Introduction to BeautifulSoup
BeautifulSoup is a powerful third-party Python library that parses HTML and extracts information from it.
Installing BeautifulSoup
- Open a terminal and run:
pip3 install beautifulsoup4
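To confirm the installation worked, one quick check is to import the package and print its version. This is a minimal sketch, assuming the package was installed into the same Python interpreter that runs the check:

```python
# Sanity check: the package installs as "beautifulsoup4" but is imported as "bs4".
import bs4

print(bs4.__version__)  # prints the installed version string
```

If the import raises ModuleNotFoundError, pip3 most likely installed the package for a different interpreter.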
A Quick Test of BeautifulSoup
- The HTML page used in this test: http://python123.io/ws/demo.html
- View its source code:
- Use the requests library to fetch the source code (stored in the variable demo):
>>> import requests
>>> r = requests.get("http://python123.io/ws/demo.html")
>>> r.text
'<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body></html>'
>>> demo = r.text
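The session above assumes the request succeeds. Outside an interactive session it may be worth guarding against timeouts and HTTP errors. The sketch below is one possible way to do that; `fetch_html` is a hypothetical helper name, and the 10-second timeout is an arbitrary assumption, not something the original session uses:

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page text, or an empty string if the request fails.

    fetch_html is a hypothetical helper for illustration only.
    """
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()              # raise for 4xx/5xx responses
        r.encoding = r.apparent_encoding  # guess the encoding from the body
        return r.text
    except requests.RequestException:
        # Covers connection errors, timeouts, bad URLs, and HTTP error statuses.
        return ""
```

With this helper, `demo = fetch_html("http://python123.io/ws/demo.html")` would play the role of `r.text` in the session above.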
- Import the BeautifulSoup library
>>> from bs4 import BeautifulSoup
>>>
- Parse the HTML with BeautifulSoup
>>> demo = r.text
>>> soup = BeautifulSoup(demo,'html.parser')
>>> print(soup.prettify())
<html>
 <head>
  <title>
   This is a python demo page
  </title>
 </head>
 <body>
  <p class="title">
   <b>
    The demo python introduces several python courses.
   </b>
  </p>
  <p class="course">
   Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
   <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">
    Basic Python
   </a>
   and
   <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">
    Advanced Python
   </a>
   .
  </p>
 </body>
</html>
>>>
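Once the page is parsed, the soup object can be queried for specific tags. The following is a minimal sketch of extracting information from the same demo markup. The HTML is inlined as a string (an assumption made so the example runs without network access), and `title` and `links` are illustrative variable names:

```python
from bs4 import BeautifulSoup

# The demo page's markup, inlined so the example runs without network access.
demo = """<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body></html>"""

soup = BeautifulSoup(demo, "html.parser")

title = soup.title.string                            # text of the <title> tag
links = [a.get("href") for a in soup.find_all("a")]  # href of every <a> tag

print(title)
print(links)
```

Here `soup.title` returns the first `<title>` tag and `find_all("a")` returns every `<a>` tag in document order, so `links` holds the two course URLs.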
How to Use BeautifulSoup
- Basic code skeleton:
from bs4 import BeautifulSoup
soup = BeautifulSoup('<p>data</p>','html.parser')
- The two arguments to BeautifulSoup:
  - The first is the HTML-formatted content to parse.
  - The second is the parser to use.
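The two arguments can be illustrated with a short sketch. The first argument may be a string or a file-like object, and the second names the parser. Here `io.StringIO` stands in for an open file, which is an assumption made to keep the example self-contained. Other parsers such as "lxml" or "html5lib" can be named instead of "html.parser", but they must be installed separately:

```python
import io

from bs4 import BeautifulSoup

markup = "<p>data</p>"

# First argument as a string, second argument naming the built-in parser.
soup_from_str = BeautifulSoup(markup, "html.parser")

# First argument as a file-like object; io.StringIO stands in for an open file.
soup_from_file = BeautifulSoup(io.StringIO(markup), "html.parser")

print(soup_from_str.p.string)
print(soup_from_file.p.string)
```

Both calls produce the same parse tree, so either form can be used depending on where the HTML comes from.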