Lately I've gotten a bit hooked on web scraping.
While using the bs4 module, I ran into this error: UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 9: illegal multibyte sequence
Reading a local file with bs4
from bs4 import BeautifulSoup

soup = BeautifulSoup(open('a.html'), 'html.parser')
print(soup.prettify())  # print the contents of the local file
Here, a.html contains:
<div>大家好</div>
<p>你好啊</p>
Running this raises the error above.
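The failure can be reproduced without bs4 or a file on disk. The sketch below assumes a.html was saved as UTF-8 (the post does not say so, but the byte reported in the error message fits that assumption). The helper name try_decode is made up for this illustration.

```python
# Bytes of the start of a.html, ASSUMING the file was saved as UTF-8.
data = '<div>大家好</div>'.encode('utf-8')


def try_decode(raw, encoding):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


result_gbk = try_decode(data, 'gbk')     # same codec as in the error message
result_utf8 = try_decode(data, 'utf-8')  # the codec the bytes were written with

print(hex(data[9]))      # the byte at index 9 of the UTF-8 data
print(repr(result_gbk))  # the decoding error, if any
print(result_utf8)       # the original text
```

Index 9 of the UTF-8 bytes is the middle byte of 家, which is 0xae, the same byte and position the traceback complains about.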
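The problem is the text stream. open('a.html') opens the file in text mode and decodes it with the platform's default encoding, which is gbk here (as the error message shows), but the file's bytes are not valid gbk. Byte 0xae at position 9 is what a UTF-8 encoding of 家 would put there. Opening the file in binary mode ('rb') hands the raw bytes to BeautifulSoup, which then works out the encoding itself: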
from bs4 import BeautifulSoup

soup = BeautifulSoup(open('a.html', 'rb'), 'html.parser')
print(soup.prettify())  # print the contents of the local file
Result: the script now runs without the error and prints the contents of a.html.
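Another option, if you know the file's encoding, is to stay in text mode and name the encoding explicitly. This is only a sketch: it assumes the file is UTF-8 and writes a temporary stand-in for a.html so that it runs on its own.

```python
import os
import tempfile

html = '<div>大家好</div> <p>你好啊</p>'

# Write a stand-in for a.html as UTF-8 (ASSUMED encoding of the original file).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'a.html')
with open(path, 'w', encoding='utf-8') as f:
    f.write(html)

# Text mode with an explicit encoding, so the platform default (gbk) is never used.
with open(path, encoding='utf-8') as f:
    text = f.read()

print(text)
```

The resulting string can be passed to BeautifulSoup(text, 'html.parser') just like the file object in the examples above.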