
Python in Practice: Scraping Ximalaya Album Information

import json
import urllib.request

from lxml import etree

# AJAX endpoint that returns JSON; its 'html' field holds the album list markup
url = 'http://www.ximalaya.com/dq/8.ajax'
# Send a browser User-Agent with the request
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
}
req = urllib.request.Request(url, headers=headers)
with urllib.request.urlopen(req) as response:
    jsonobj = json.loads(response.read().decode('utf-8'))
html = jsonobj['html']
tree = etree.HTML(html)
# Each album sits in its own div with class "discoverAlbum_item"
node_list = tree.xpath('//div[@class="discoverAlbum_item"]')
for node in node_list:
    img = node.xpath('.//img/@src')    # cover image URL
    title = node.xpath('.//img/@alt')  # album title
    href = node.xpath('./a/@href')     # album link
    # Print one album per line, fields separated by tabs
    print(img[0], title[0], href[0], sep='\t')

The HTML is parsed with XPath.
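To see what the three path expressions select without touching the network, here is a minimal sketch run against a made-up fragment. It assumes the markup in the `html` field is shaped like the sample below; the sample itself, its URLs, and the `extract_albums` name are hypothetical. Note that it swaps lxml for the standard library's `xml.etree.ElementTree`, which only supports a subset of XPath and needs well-formed input, so attribute steps such as `/@src` are replaced by `.get('src')`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample standing in for jsonobj['html']
SAMPLE = """
<div>
  <div class="discoverAlbum_item">
    <a href="/album/1/"><img src="http://example.com/a.jpg" alt="Album A"/></a>
  </div>
  <div class="discoverAlbum_item">
    <a href="/album/2/"><img src="http://example.com/b.jpg" alt="Album B"/></a>
  </div>
  <div class="other">ignored</div>
</div>
"""


def extract_albums(markup):
    """Return (image URL, title, link) tuples for every album item."""
    root = ET.fromstring(markup)
    albums = []
    # Same selection as //div[@class="discoverAlbum_item"] in the script
    for node in root.findall(".//div[@class='discoverAlbum_item']"):
        img = node.find(".//img")   # first <img> anywhere below the item
        link = node.find("./a")     # <a> that is a direct child of the item
        if img is None or link is None:
            continue                # skip items missing either element
        albums.append((img.get("src"), img.get("alt"), link.get("href")))
    return albums


for src, title, href in extract_albums(SAMPLE):
    print(src, title, href, sep="\t")
```

Skipping items that lack an `<img>` or `<a>` avoids the `IndexError` that indexing an empty result with `[0]` would raise if the page layout changes.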

Ambition is what gives you drive!

A salute to every software engineer!

by wujf

mail:921252375@qq.com


Original article: https://www.cnblogs.com/wujf/p/8056314.html