1. Install BeautifulSoup
pip install beautifulsoup4
2. Read the htm file
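from bs4 import BeautifulSoup
htmcontent = open(path, 'r').read()  # raw HTML source
soup = BeautifulSoup(htmcontent, 'html.parser')  # naming the parser avoids the "no parser specified" warning
htmcontent = soup.get_text()  # text only, tags stripped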
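The same steps can be wrapped into a small reusable helper. This is only a sketch: the function names `html_to_text` and `read_htm_text`, the UTF-8 default encoding, and the choice to drop `<script>` and `<style>` elements are assumptions added for illustration, not part of the original steps.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def html_to_text(html: str) -> str:
    """Return the text of an HTML string, one text chunk per line."""
    # "html.parser" is the parser bundled with Python, so no extra install is needed.
    soup = BeautifulSoup(html, "html.parser")
    # Remove script/style elements so their contents cannot end up in the output.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # separator joins the text chunks; strip=True trims each chunk and skips empty ones.
    return soup.get_text(separator="\n", strip=True)


def read_htm_text(path: str, encoding: str = "utf-8") -> str:
    """Read an .htm file (encoding is an assumption) and return its text."""
    html = Path(path).read_text(encoding=encoding)
    return html_to_text(html)
```

Calling `read_htm_text(path)` gives the same kind of result as the three lines above, with the file closed automatically and the encoding stated explicitly. If the file is not UTF-8, pass the real encoding (for example `encoding="gbk"`).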