Python crawler: removing comments from web pages

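When writing a crawler, we run into HTML comments in web pages. They cause two problems: first, they consume memory; second, they make it harder to match the information we want when parsing the page. So how do we get rid of them?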

We can filter them out with a regular expression.

The method is as follows:

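import re

result = "page content"  # HTML text of the downloaded page

# Non-greedy match with DOTALL, so comments containing '>' or newlines are also removed
re_comment = re.compile(r'<!--.*?-->', re.DOTALL)

result_content = re_comment.sub('', result)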
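
As a minimal sketch of the same idea wrapped in a reusable function: a character class such as `[^>]*` cannot match a comment whose body contains `>` (for example, commented-out markup), while a non-greedy `.*?` combined with `re.DOTALL` can, and it also covers comments that span several lines. The names `COMMENT_RE`, `strip_comments`, and `sample` are made up for illustration and are not part of the original post.

```python
import re

# Non-greedy match from "<!--" to the nearest "-->".
# re.DOTALL lets "." match newlines, so multi-line comments are covered.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(html: str) -> str:
    """Return html with every <!-- ... --> comment removed."""
    return COMMENT_RE.sub("", html)


# Hypothetical sample page: one comment contains tags, one spans several lines.
sample = "<p>a</p><!-- <div>old</div> -->\n<!--\nmulti\nline\n--><p>b</p>"
cleaned = strip_comments(sample)
print(cleaned)
```

Note that a regular expression does not understand HTML structure, so it will also remove text that merely looks like a comment, such as `<!-- ... -->` inside a `<script>` string. For pages where that matters, an HTML parser is probably the safer choice.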

Takeaway: solve complex problems with the simplest method that works.
