Using starts-with: a convenient way to grab the content of several similar tags whose attribute values share a common prefix
from lxml import etree

html = '''
<li class="tag-1">content 1</li>
<li class="tag-1">content 2</li>
<li class="tag-1">content 3</li>
'''

# Parse the HTML string into an element tree
selector = etree.HTML(html)
# Select the text of every <li> whose class attribute begins with "tag"
contents = selector.xpath('//li[starts-with(@class,"tag")]/text()')
for content in contents:
    print(content)
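To make the prefix matching more visible, here is a minimal sketch that compares starts-with with the standard XPath function contains. The class names tag-2, other, and my-tag-3 are made up for illustration and do not appear in the example above.

```python
from lxml import etree

# Hypothetical markup: class names differ so the prefix match is visible
html = '''
<ul>
  <li class="tag-1">content 1</li>
  <li class="tag-2">content 2</li>
  <li class="other">unrelated</li>
  <li class="my-tag-3">content 3</li>
</ul>
'''

selector = etree.HTML(html)

# starts-with: the class attribute must begin with "tag"
prefix_matches = selector.xpath('//li[starts-with(@class, "tag")]/text()')

# contains: "tag" may appear anywhere in the class attribute
substring_matches = selector.xpath('//li[contains(@class, "tag")]/text()')

print(prefix_matches)     # ['content 1', 'content 2']
print(substring_matches)  # ['content 1', 'content 2', 'content 3']
```

starts-with is the stricter of the two: my-tag-3 contains "tag" but does not begin with it, so only contains picks it up.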
Using string(.): when tags are nested inside other tags, string(.) returns all the text inside the element, including the text of its descendants
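from lxml import etree

html = '''
<div class="red">content 1
    <h1>content 2</h1>
</div>
'''

# Parse the HTML string into an element tree
selector = etree.HTML(html)
# Take the first <div class="red"> element
content1 = selector.xpath('//div[@class="red"]')[0]
# string(.) concatenates the text of the element and all of its descendants
content2 = content1.xpath('string(.)')
print(content2)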
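As a rough comparison, the sketch below contrasts string(.) with a plain text() query on the same element, and uses the standard XPath function normalize-space to collapse the whitespace that string(.) keeps. The variable names are illustrative only.

```python
from lxml import etree

html = '''
<div class="red">content 1
    <h1>content 2</h1>
</div>
'''

selector = etree.HTML(html)
div = selector.xpath('//div[@class="red"]')[0]

# text() returns only the text nodes that are direct children of the div,
# so the text inside <h1> is missing
direct_text = div.xpath('text()')

# string(.) returns one string with the text of the div and all descendants,
# including the original line breaks and indentation
all_text = div.xpath('string(.)')

# normalize-space(.) does the same but trims and collapses whitespace
clean_text = div.xpath('normalize-space(.)')

print(direct_text)
print(repr(all_text))
print(clean_text)  # content 1 content 2
```

If the raw whitespace matters, string(.) is the one to use; for display or comparison, normalize-space(.) is usually easier to work with.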