XPath: get a tag pair itself along with its content (getting the HTML)

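Usually, after locating a tag with XPath, we append /text() or //text() to get the text between the opening and closing tags.

In some cases, however, we need the tag itself together with its text. Here is how: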

The file is HTML, and the tag structure looks like this:

<table id='lh'>
  <tr>
    <td>Row value 1</td>
    <td>Row value 2</td>
  </tr>
</table>
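For contrast, here is a minimal offline sketch of the usual /text() and //text() approach applied to the sample table. The HTML is embedded as a string instead of being fetched, and the helper names cell_texts and all_texts are illustrative. Both return plain strings: the <table>, <tr> and <td> tags themselves are not part of the result, which is why a different approach is needed when you want the markup.

```python
from lxml import etree

# The sample table shown above, embedded as a string (no network access needed).
SAMPLE = """<table id='lh'>
  <tr>
    <td>Row value 1</td>
    <td>Row value 2</td>
  </tr>
</table>"""


def cell_texts(html: str) -> list:
    """Text directly inside each <td> under #lh, via /text()."""
    tree = etree.HTML(html)
    return [str(t) for t in tree.xpath('//*[@id="lh"]//td/text()')]


def all_texts(html: str) -> list:
    """Every non-blank descendant text node under #lh, via //text()."""
    tree = etree.HTML(html)
    # //text() also returns whitespace between tags, so drop blank nodes
    return [t.strip() for t in tree.xpath('//*[@id="lh"]//text()') if t.strip()]


print(cell_texts(SAMPLE))  # ['Row value 1', 'Row value 2']
print(all_texts(SAMPLE))   # ['Row value 1', 'Row value 2']
```

Either way only the text survives; the tag structure is lost.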

The code:

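import requests
from lxml import etree
from lxml.html import tostring

url = "https://www.baidu.com/"
headers = {"User-Agent": "Mozilla/5.0"}  # example request headers; adjust as needed

ret = requests.get(url, headers=headers)
code = ret.apparent_encoding  # detect the page's encoding
ret.encoding = code
html = ret.text               # page HTML, assumed to contain the tag structure shown above

tree = etree.HTML(html)
result = tree.xpath('//*[@id="lh"]')[0]  # select the element itself, not its text()

# tostring() serializes the element including its own tag; with an encoding it returns bytes
print('Result:', tostring(result, encoding=code).decode(code))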

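Note: tostring() outputs the element located by XPath (including the element's own tag) together with everything nested under it.
    When an encoding is passed, it returns bytes, so remember to call decode() on the result.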


Original post: https://www.cnblogs.com/quzq/p/11032413.html