bs4 中 BeautifulSoup 常用命令: BeautifulSoup,prettify,find,find_all,get,get_text
示例:
scenery.html如下:

1 <html lang="en"> 2 <head> 3 <meta charset="UTF-8"> 4 <title>武汉旅游景点</title> 5 <meta name="description" content="武汉旅游景点 精简版" /> 6 <meta name="authot" content="hstking" > 7 </head> 8 <body> 9 <div id="content"> 10 <div class="title"> 11 <h3>武汉景点</h3> 12 </div> 13 <ul class="table"> 14 <li>景点<a>门票价格</a></li> 15 </ul> 16 <ul class="content"> 17 <li nu="1">东湖 <a class="price">60</a></li> 18 <li nu="2">磨山 <a class="price">60</a></li> 19 <li nu="3">欢乐谷 <a class="price">108</a></li> 20 <li nu="4">海洋世界 <a class="price">150</a></li> 21 <li nu="5">水上乐园 <a class="price">150</a></li> 22 </ul> 23 </div> 24 </body> 25 </html>
操作代码如下:

1 from bs4 import BeautifulSoup 2 soup = BeautifulSoup(open('scenery.html',encoding='utf-8'),'lxml') #常用lxml解析 3 print("soup类型: ",end="");print(type(soup)) 4 5 l = soup.find('li') #find()可以指定属性attrs,只返回一个结果 6 #l = soup.find('li',attrs={'nu':'3'}) 7 l_all = soup.find_all('li') #find_all()返回所有结果 8 #l_all = soup.find_all('li',attrs={'nu':'3'}) 9 print("find()的内容: ",end="");print(l) 10 print("find()返回类型: ",end="");print(type(l)) 11 print("find_all()的内容: ",end="");print(l_all) 12 print("find_all()返回类型: ",end="");print(type(l_all)) #find_all()返回类型可隐式变换为list 13 print("find_all()中第3个元素的值: ",end="");print(l_all[2]) 14 print("find_all()中元素类型: ",end="");print(type(l_all[2])) 15 print("find()可获取下级标签的内容: ",end="");print(l.find('a')) 16 print("get()可获取属性的值: ",end="");print(l.get('nu')) 17 print("get()返回类型: ",end="");print(type(l.get('nu'))) 18 print("get_text()可获取标签包含的字符串: ",end="");print(l.a.get_text()) 19 print("get_text()返回类型: ",end="");print(type(l.a.get_text()))
结果如下: