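1. Open https://www.v2ex.com/ in a browser and view the page source.
2. Open the PyCharm IDE and create a new project named c3-11. Create a new Python file named v2ex.py, and also a new file named v2ex.html. Copy the page source of https://www.v2ex.com/ into v2ex.html.
The code in v2ex.py is as follows: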
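from pyquery import PyQuery

if __name__ == '__main__':
    # Parse the saved page and print the contents of its <title> element
    with open('v2ex.html', encoding='utf-8') as f:
        q = PyQuery(f.read())
    print(q('title').html())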
Running the script prints the contents of the page's <title> element.
CSS selectors:
Demo code:
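# -*- coding: utf-8 -*-
from pyquery import PyQuery

if __name__ == '__main__':
    with open('v2ex.html', encoding='utf-8') as f:
        q = PyQuery(f.read())

    # Element selector: contents of <title>
    print(q('title').html())

    # Class selector: <a> children of <div class="inner"> whose href contains 'tab'
    for each in q('div.inner>a').items():
        href = each.attr.href
        if href and 'tab' in href:
            print(1, href)

    # ID selector: <a> children of the element with id="Tabs"
    for each in q('#Tabs>a').items():
        print(2, each.attr.href)

    # Combined selectors with an attribute prefix match (href starts with "/go/")
    # '>' matches direct children only
    for each in q('.cell>a[href^="/go/"]').items():
        print(3, each.attr.href)
    # A space matches descendants at any depth
    for each in q('.cell a[href^="/go/"]').items():
        print(4, each.attr.href)

    # Topic titles: <a> children of <span class="item_title">
    for each in q('span.item_title>a').items():
        print(5, each.html())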