Most APIs today return either XML or JSON, and JSON is simpler to parse than XML.
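To make the comparison concrete, here is a minimal sketch using only the standard library. The XML layout is hypothetical (it is not Douban's actual XML format) and simply carries the same two fields as the small JSON snippet. With `json`, nested objects come back as ready-to-use dicts with typed values. With `xml.etree.ElementTree`, you walk the element tree and convert the text yourself.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads carrying the same data in both formats
json_text = '{"id": "1220562", "rating": {"max": 10}}'
xml_text = '<book><id>1220562</id><rating><max>10</max></rating></book>'

# JSON: one call yields nested dicts, and numbers are already ints
book = json.loads(json_text)
json_max = book['rating']['max']

# XML: navigate child elements and cast the text by hand
root = ET.fromstring(xml_text)
xml_max = int(root.find('rating/max').text)

print(book['id'], json_max)
print(root.findtext('id'), xml_max)
```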
Take the Douban API as an example. Requesting the following URL returns the JSON data shown after it:
https://api.douban.com/v2/book/1220562
{"id":"1220562","alt":"http://book.douban.com/book/1220562","rating":{"max":10, "average":"7.0", "numRaters":282, "min":0},"author":[{"name":"片山恭一"}, {"name":"豫人"}],"alt_title":"","image":"http://img1.douban.com/spic/s1747553.jpg","title":"满月之夜白鲸现","mobile_link":"http://m.douban.com/book/subject/1220562/","summary":"那一年,是听莫扎特、钓鲈鱼和家庭破裂的一年。说到家庭破裂,母亲怪自己当初没有找到好男人,父亲则认为当时是被狐狸精迷住了眼,失常的是母亲,但出问题的是父亲……。","attrs":{"publisher":["青岛出版社"],"pubdate":["2005-01-01"],"author":["片山恭一", "豫人"],"price":["18.00元"],"title":["满月之夜白鲸现"],"binding":["平装(无盘)"],"translator":["豫人"],"pages":["180"]},"tags":[{"count":106, "name":"片山恭一"},{"count":50, "name":"日本"},{"count":42, "name":"日本文学"},{"count":30, "name":"满月之夜白鲸现"},{"count":28, "name":"小说"},{"count":10, "name":"爱情"},{"count":7, "name":"純愛"},{"count":6, "name":"外国文学"}]}
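Python's `json` module converts this response using its standard table: objects become `dict`, arrays become `list`, numbers become `int` or `float`, and strings become `str`. The sketch below uses a trimmed local copy of the response above (only a few fields kept) so it runs without network access. In this response `average` is the string `"7.0"` while `max` is the number `10`, so `average` needs an explicit conversion before any arithmetic.

```python
import json

# Trimmed copy of the response above, kept local so no network call is needed
raw = '''{"id": "1220562",
          "rating": {"max": 10, "average": "7.0", "numRaters": 282, "min": 0},
          "author": [{"name": "片山恭一"}, {"name": "豫人"}],
          "tags": [{"count": 106, "name": "片山恭一"}, {"count": 50, "name": "日本"}]}'''

data = json.loads(raw)

# JSON object -> dict, array -> list, number -> int, string -> str
print(type(data).__name__)                       # dict
print(type(data['author']).__name__)             # list
print(type(data['rating']['max']).__name__)      # int
print(type(data['rating']['average']).__name__)  # str

# "average" is quoted in the response, so convert it before doing math
average = float(data['rating']['average'])
print(average / data['rating']['max'])           # 0.7

# Walk a nested array of objects
names = [a['name'] for a in data['author']]
print(names)
```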
Next we use Python to extract the values we want: `id`, the `max` field inside `rating`, and the `name` of the first entry in `tags`:
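import json
import urllib.request

# Fetch the book record and parse the JSON response body into a dict
with urllib.request.urlopen('https://api.douban.com/v2/book/1220562') as resp:
    hjson = json.loads(resp.read())

print(hjson['id'])               # top-level field
print(hjson['rating']['max'])    # field inside a nested object
print(hjson['tags'][0]['name'])  # field of the first element of an array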
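Indexing with `[]` raises `KeyError` or `IndexError` when a field is missing, for example when a book has no tags. A slightly more defensive variant, sketched below, wraps the lookups in a helper that falls back to `None` using `dict.get`. The name `extract_fields` is hypothetical, not part of any library. It takes an already-parsed dict, so it can be exercised against local samples instead of the live API.

```python
import json

def extract_fields(book):
    """Return (id, rating max, first tag name) from a parsed Douban book dict.

    Missing or null fields yield None instead of raising KeyError/IndexError.
    """
    book_id = book.get('id')
    rating_max = (book.get('rating') or {}).get('max')
    tags = book.get('tags') or []
    first_tag = tags[0].get('name') if tags else None
    return book_id, rating_max, first_tag

# Local samples: one shaped like the response above, one with fields missing
full = json.loads('{"id": "1220562", "rating": {"max": 10}, '
                  '"tags": [{"count": 106, "name": "片山恭一"}]}')
sparse = json.loads('{"id": "1220562", "tags": []}')

print(extract_fields(full))    # ('1220562', 10, '片山恭一')
print(extract_fields(sparse))  # ('1220562', None, None)
```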
Result: for the response shown above, the script prints:
1220562
10
片山恭一