python - How to extract specific contents from the HTML doc I grabbed

August 15, 2011

I'm new to BeautifulSoup and urllib. I want to read data from pm25.in, a website that offers atmospheric quality data for China.

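My attempt:

    # set the city name; the token is the public key, free to use
    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    city = 'zhuhai'
    url = "http://www.pm25.in/api/querys/co.json?city={}&token=5j1znbvasnsf5xqynqyq".format(city)
    html_doc = urlopen(url).read().decode('utf-8')
    soup = BeautifulSoup(html_doc, 'lxml')  # lxml wraps the raw text in <html><body><p>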

Result:

    <html>
     <body>
      <p>
       [{"aqi":29,"area":"珠海","co":0.591,"co_24h":0.955,"position_name":"吉大","primary_pollutant":null,"quality":"优","station_code":"1367a","time_point":"2016-01-07t20:00:00z"},
        {"aqi":51,"area":"珠海","co":0.913,"co_24h":1.059,"position_name":"前山","primary_pollutant":"颗粒物(pm10)","quality":"良","station_code":"1368a","time_point":"2016-01-07t20:00:00z"},
        {"aqi":35,"area":"珠海","co":0.699,"co_24h":0.885,"position_name":"唐家","primary_pollutant":null,"quality":"优","station_code":"1369a","time_point":"2016-01-07t20:00:00z"},
        {"aqi":52,"area":"珠海","co":0.874,"co_24h":0.949,"position_name":"斗门","primary_pollutant":"颗粒物(pm10)","quality":"良","station_code":"1370a","time_point":"2016-01-07t20:00:00z"},
        {"aqi":67,"area":"珠海","co":0.769,"co_24h":0.962,"position_name":null,"primary_pollutant":"臭氧8小时","quality":"良","station_code":null,"time_point":"2016-01-07t20:00:00z"}]
      </p>
     </body>
    </html>
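The result shows that the response body is plain JSON; the lxml parser only wrapped it in <html><body><p> tags. A minimal sketch, assuming the endpoint always returns a JSON array like the one above, is to skip BeautifulSoup and decode the body with the standard json module directly. SAMPLE_RESPONSE is a shortened copy of the response shown above (two of the five records) so the example runs offline, and parse_aqi_response is a hypothetical helper name.

```python
import json

# Shortened copy of the API response shown above (first two records only),
# so this sketch runs without network access.
SAMPLE_RESPONSE = (
    '[{"aqi":29,"area":"珠海","co":0.591,"co_24h":0.955,"position_name":"吉大",'
    '"primary_pollutant":null,"quality":"优","station_code":"1367a",'
    '"time_point":"2016-01-07t20:00:00z"},'
    '{"aqi":51,"area":"珠海","co":0.913,"co_24h":1.059,"position_name":"前山",'
    '"primary_pollutant":"颗粒物(pm10)","quality":"良","station_code":"1368a",'
    '"time_point":"2016-01-07t20:00:00z"}]'
)


def parse_aqi_response(body):
    """Hypothetical helper: decode the raw JSON body into a list of dicts.

    JSON null becomes Python None, so a missing primary_pollutant
    shows up as None rather than the string "null".
    """
    return json.loads(body)


records = parse_aqi_response(SAMPLE_RESPONSE)
for rec in records:
    # Print only ASCII fields to avoid console encoding issues.
    print(rec["station_code"], rec["aqi"])
```

In the real script you would pass html_doc from the attempt above instead of SAMPLE_RESPONSE.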

My target:

Here is a DataFrame I edited manually as a template:

http://i4.tietuku.com/71f10394dbedd8d3.png

I want to know how to extract this useful data from the raw html_doc. I have tried soup.find_all(), but I don't know how to set the parameters to achieve that.
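On the find_all() question: the JSON text sits in the only <p> tag, so find_all only needs the tag name. A minimal sketch, assuming BeautifulSoup 4 (bs4) is installed; it uses the built-in html.parser on a shortened, hand-written stand-in for the parsed document so it runs offline. soup.find_all('p') returns a list of every matching tag, while soup.find('p') and the soup.p shorthand return only the first one.

```python
from bs4 import BeautifulSoup

# Shortened, hand-written stand-in for the parsed document shown in the result.
html_doc = (
    '<html><body><p>[{"aqi":29,"position_name":"吉大","station_code":"1367a"},'
    '{"aqi":51,"position_name":"前山","station_code":"1368a"}]</p></body></html>'
)
soup = BeautifulSoup(html_doc, "html.parser")

# find_all("p") returns a list of every <p> tag; here there is exactly one.
paragraphs = soup.find_all("p")
text_from_find_all = paragraphs[0].get_text()

# find("p") and the attribute shorthand soup.p both return the first <p> tag.
text_from_find = soup.find("p").get_text()
text_from_attr = soup.p.get_text()

print(len(paragraphs), text_from_find_all == text_from_find == text_from_attr)
```

All three lookups yield the same text here; the extracted string still has to be decoded as JSON to become usable data.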

You can use the json module to convert the text inside the <p> tag into a list:

    import json
    ...
    text = soup.p.get_text()   # find the <p> tag and extract its text
    result = json.loads(text)  # convert the JSON text into a Python list

Now you have the list result and can manipulate the data however you need.
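To get from result to a DataFrame like the target template, one option, assuming pandas is installed, is to pass the list of dicts straight to pandas.DataFrame: each dict becomes a row and each key becomes a column. The column subset below is an illustrative guess, since the template image is not reproduced here; adjust it to the columns you actually want. The sample list is a shortened copy of what json.loads returns for the response above.

```python
import pandas as pd

# Shortened copy of `result` (the json.loads output) from the answer above.
result = [
    {"aqi": 29, "area": "珠海", "co": 0.591, "co_24h": 0.955,
     "position_name": "吉大", "primary_pollutant": None, "quality": "优",
     "station_code": "1367a", "time_point": "2016-01-07t20:00:00z"},
    {"aqi": 51, "area": "珠海", "co": 0.913, "co_24h": 1.059,
     "position_name": "前山", "primary_pollutant": "颗粒物(pm10)", "quality": "良",
     "station_code": "1368a", "time_point": "2016-01-07t20:00:00z"},
]

# One row per station record, one column per JSON key.
df = pd.DataFrame(result)

# Illustrative column subset (an assumption); pick whatever your template needs.
df = df[["station_code", "position_name", "aqi", "co", "co_24h", "quality"]]

# Print only ASCII columns to avoid console encoding issues.
print(df[["station_code", "aqi", "co"]])
```

Keys missing from some records would simply show up as NaN in the corresponding column, so the same call works even if the API omits a field for one station.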
