python - How to extract specific contents from the HTML document I fetched


I'm new to BeautifulSoup and urllib.
I want to read data from pm25.in, a website that offers air quality data for China.

My attempt:

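    from urllib.request import urlopen  # Python 3; Python 2 had urllib.urlopen
    from bs4 import BeautifulSoup

    # Set the city name; the token is a public key that is free to use
    city = 'zhuhai'
    html_doc = urlopen("http://www.pm25.in/api/querys/co.json?city=" + city +
                       "&token=5j1znbvasnsf5xqynqyq").read().decode('utf-8')
    # 'lxml' wraps the bare JSON text in <html><body><p>, as in the result below
    soup = BeautifulSoup(html_doc, 'lxml')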
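To change the city without editing the URL by hand, the query string can be built from variables. A minimal sketch using the standard library's urllib.parse.urlencode, assuming the same endpoint and the city and token parameters shown in the URL above; build_url is a hypothetical helper name, not part of any pm25.in client:

```python
from urllib.parse import urlencode

# Endpoint copied from the question; only the query string is built here
BASE_URL = "http://www.pm25.in/api/querys/co.json"


def build_url(city, token):
    """Return the API URL for one city (hypothetical helper)."""
    # urlencode percent-escapes the values and joins the pairs with '&'
    query = urlencode({"city": city, "token": token})
    return BASE_URL + "?" + query


url = build_url("zhuhai", "5j1znbvasnsf5xqynqyq")
print(url)
# http://www.pm25.in/api/querys/co.json?city=zhuhai&token=5j1znbvasnsf5xqynqyq
```

Because urlencode escapes its input, a city name given in Chinese characters is sent as UTF-8 percent-escapes instead of breaking the URL.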

Result:

    <html>
     <body>
      <p>
       [{"aqi":29,"area":"珠海","co":0.591,"co_24h":0.955,"position_name":"吉大","primary_pollutant":null,"quality":"优","station_code":"1367a","time_point":"2016-01-07t20:00:00z"},{"aqi":51,"area":"珠海","co":0.913,"co_24h":1.059,"position_name":"前山","primary_pollutant":"颗粒物(pm10)","quality":"良","station_code":"1368a","time_point":"2016-01-07t20:00:00z"},{"aqi":35,"area":"珠海","co":0.699,"co_24h":0.885,"position_name":"唐家","primary_pollutant":null,"quality":"优","station_code":"1369a","time_point":"2016-01-07t20:00:00z"},{"aqi":52,"area":"珠海","co":0.874,"co_24h":0.949,"position_name":"斗门","primary_pollutant":"颗粒物(pm10)","quality":"良","station_code":"1370a","time_point":"2016-01-07t20:00:00z"},{"aqi":67,"area":"珠海","co":0.769,"co_24h":0.962,"position_name":null,"primary_pollutant":"臭氧8小时","quality":"良","station_code":null,"time_point":"2016-01-07t20:00:00z"}]
      </p>
     </body>
    </html>
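The <html><body><p> wrapper in this output appears to come from the HTML parser rather than from the server: the .json endpoint returns a plain JSON array. Assuming the response body is exactly that array, BeautifulSoup can be skipped and the text parsed with the standard json module. In this sketch, sample is a trimmed copy of two records from the output above, standing in for html_doc:

```python
import json

# Trimmed copy of two records from the output above, standing in for html_doc
sample = (
    '[{"aqi":29,"area":"珠海","co":0.591,"co_24h":0.955,'
    '"position_name":"吉大","primary_pollutant":null,"quality":"优",'
    '"station_code":"1367a","time_point":"2016-01-07t20:00:00z"},'
    '{"aqi":51,"area":"珠海","co":0.913,"co_24h":1.059,'
    '"position_name":"前山","primary_pollutant":"颗粒物(pm10)","quality":"良",'
    '"station_code":"1368a","time_point":"2016-01-07t20:00:00z"}]'
)

# json.loads turns the JSON array into a list of dicts; null becomes None
records = json.loads(sample)
for rec in records:
    print(rec["position_name"], rec["aqi"], rec["primary_pollutant"])
```

With the live response, json.loads(html_doc) would take the place of the BeautifulSoup(...) call.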

My target:

Here is a DataFrame I built manually as a template:

http://i4.tietuku.com/71f10394dbedd8d3.png

I want to know how to extract this useful data from the raw html_doc.
I have tried soup.find_all(), but I don't know how to set its parameters to achieve that.

You can use the json module to convert the text inside the soup into a list:

    import json
    ...
    text = soup.p.get_text()   # find the <p> tag and extract its text
    result = json.loads(text)  # convert the JSON text to a Python list
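A self-contained version of these steps, assuming BeautifulSoup 4 (bs4) is installed. Instead of fetching, it parses a hard-coded string shaped like the output above (one record only). Because the <p> tag is already present in that string, Python's built-in "html.parser" is enough here. The sketch also shows where find_all fits:

```python
import json
from bs4 import BeautifulSoup

# One record from the question's output, already wrapped in <html><body><p>
html_doc = (
    '<html><body><p>[{"aqi":29,"area":"珠海","co":0.591,"co_24h":0.955,'
    '"position_name":"吉大","primary_pollutant":null,"quality":"优",'
    '"station_code":"1367a","time_point":"2016-01-07t20:00:00z"}]</p>'
    '</body></html>'
)

soup = BeautifulSoup(html_doc, "html.parser")

# soup.p is shorthand for soup.find("p"): the first <p> tag in the document
text = soup.p.get_text()

# find_all("p") returns a list of every <p> tag; here there is only one
p_tags = soup.find_all("p")
print(len(p_tags))

result = json.loads(text)  # list of dicts, one per monitoring station
print(result[0]["position_name"], result[0]["aqi"])
```

So find_all only locates the tag. The JSON inside it is plain text that json.loads must parse.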

Now you have a list in result, and you can manipulate the data however you need.
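Since the question's target is a DataFrame, one way to finish is to pass the list straight to pandas.DataFrame, which creates one row per dict and one column per key. This sketch assumes pandas is installed and that the wanted columns are among the JSON keys. The column selection is a hypothetical choice, because the template's exact columns appear only in the linked image. Here result is rebuilt from two records of the output above so the snippet runs on its own:

```python
import pandas as pd

# Two records from the question's output, as json.loads would return them
result = [
    {"aqi": 29, "area": "珠海", "co": 0.591, "co_24h": 0.955,
     "position_name": "吉大", "primary_pollutant": None, "quality": "优",
     "station_code": "1367a", "time_point": "2016-01-07t20:00:00z"},
    {"aqi": 51, "area": "珠海", "co": 0.913, "co_24h": 1.059,
     "position_name": "前山", "primary_pollutant": "颗粒物(pm10)",
     "quality": "良", "station_code": "1368a",
     "time_point": "2016-01-07t20:00:00z"},
]

# One row per station record, one column per JSON key
df = pd.DataFrame(result)

# Hypothetical column selection; adjust to match the target template
df = df[["position_name", "station_code", "aqi", "co", "co_24h", "quality"]]
print(df)
```

Selecting columns with a list also fixes their order, which helps if the template lists them in a specific sequence.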
