Importing pyquery
from pyquery import PyQuery as pq
Basic CSS selectors
from pyquery import PyQuery as pq

html = '''
<div id="wrap">
    <ul class="s_from">
        asdasd
        <link href="http://asda.com">asdadasdad12312</link>
        <link href="http://asda1.com">asdadasdad12312</link>
        <link href="http://asda2.com">asdadasdad12312</link>
    </ul>
</div>
'''
doc = pq(html)
# Select every link element inside .s_from, which is itself inside #wrap
print(doc("#wrap .s_from link"))
Output
<link href="http://asda.com">asdadasdad12312</link>
<link href="http://asda1.com">asdadasdad12312</link>
<link href="http://asda2.com">asdadasdad12312</link>
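A leading # selects an element by id, a leading . selects by class, and a bare name such as link selects by tag name. A space between two selectors means the element on the right is nested somewhere inside the element on the left.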
Iterating over the results
from pyquery import PyQuery as pq

html = '''
<div href="wrap">
    hello nihao
    <ul class="s_from">
        asdasd
        <link class='active1 a123' href="http://asda.com">asdadasdad12312</link>
        <link class='active2' href="http://asda1.com">asdadasdad12312</link>
        <link class='movie1' href="http://asda2.com">asdadasdad12312</link>
    </ul>
</div>
'''
doc = pq(html)
# items() yields each matched element wrapped as its own PyQuery object
its = doc("link").items()
for it in its:
    print(it)
Output
<link class="active1 a123" href="http://asda.com">asdadasdad12312</link>
<link class="active2" href="http://asda1.com">asdadasdad12312</link>
<link class="movie1" href="http://asda2.com">asdadasdad12312</link>
Getting attribute values
from pyquery import PyQuery as pq

html = '''
<div href="wrap">
    hello nihao
    <ul class="s_from">
        asdasd
        <link class='active1 a123' href="http://asda.com">asdadasdad12312</link>
        <link class='active2' href="http://asda1.com">asdadasdad12312</link>
        <link class='movie1' href="http://asda2.com">asdadasdad12312</link>
    </ul>
</div>
'''
doc = pq(html)
its = doc("link").items()
for it in its:
    # Two equivalent ways to read the href attribute
    print(it.attr('href'))
    print(it.attr.href)
Output
http://asda.com
http://asda.com
http://asda1.com
http://asda1.com
http://asda2.com
http://asda2.com
Getting text
from pyquery import PyQuery as pq

html = '''
<div href="wrap">
    hello nihao
    <ul class="s_from">
        asdasd
        <link class='active1 a123' href="http://asda.com">asdadasdad12312</link>
        <link class='active2' href="http://asda1.com">asdadasdad12312</link>
        <link class='movie1' href="http://asda2.com">asdadasdad12312</link>
    </ul>
</div>
'''
doc = pq(html)
its = doc("link").items()
for it in its:
    # text() returns the text content of the element
    print(it.text())
Output
asdadasdad12312
asdadasdad12312
asdadasdad12312