Using Beautiful Soup (Part 5): select

Original article: http://www.bugingcode.com/blog/beautiful_soup_select.html

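select serves the same purpose as find and find_all: it picks out specific tags. Its matching rules are based on CSS, so the string you pass in is called a CSS selector. If you have used jQuery before, you will notice that select's rules look a lot like jQuery's.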

Selecting by tag name

To filter by tag name, write the name with no decoration:
```
from bs4 import BeautifulSoup

# Sample document used by every example in this article
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title">The Dormouse's story</p>
<p>Once upon a time there were three little sisters; and their names were
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body>
</html>
"""

soup = BeautifulSoup(html, "lxml")
# A bare tag name selects every tag with that name
print(soup.select('p'))
```
The result is:
```
[<p class="title">The Dormouse's story</p>, <p>Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>]
```
The result shows that select returns a list. So what are the elements of that list?
```
print(type(soup.select('p')[0]))
```
The result is:
```
<class 'bs4.element.Tag'>
```
So each element is a bs4.element.Tag, just as with find_all. select('p') returns every tag whose name is p.
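To make the comparison with find_all concrete, here is a minimal sketch that runs both calls on the same document. It assumes the built-in "html.parser" backend instead of "lxml" so that it needs no extra parser, and the names by_select and by_find_all are only illustrative.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = """
<html><body>
<p class="title">The Dormouse's story</p>
<p>Once upon a time there were three little sisters; and their names were
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>
"""

# Assumption: "html.parser" is used here; the article itself uses "lxml".
soup = BeautifulSoup(html, "html.parser")

by_select = soup.select("p")      # CSS selector
by_find_all = soup.find_all("p")  # tag-name filter

# Every element returned by select is a Tag object.
print(all(isinstance(tag, Tag) for tag in by_select))

# Both calls find the same tags in the tree, in the same order.
print(len(by_select) == len(by_find_all))
print(all(a is b for a, b in zip(by_select, by_find_all)))
```

For a plain tag name the two approaches are interchangeable; select starts to pay off once the condition is easier to write as a CSS selector.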

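Selecting by class name and id

To filter by class, put a dot in front of the class name; to filter by id, put # in front of the id: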
```
print(soup.select('.title'))
print(soup.select('#link2'))
```
The result is:
```
[<p class="title">The Dormouse's story</p>]
[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
```
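A class selector can match more than one tag, while an id normally identifies a single one. The sketch below reads attributes from the matches and uses select_one for the single-result case. It assumes the "html.parser" backend, and the id no-such-id is a made-up value used only to show the no-match case.

```python
from bs4 import BeautifulSoup

html = """
<html><body>
<p class="title">The Dormouse's story</p>
<p>Once upon a time there were three little sisters; and their names were
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>
"""

# Assumption: "html.parser" is used here; the article itself uses "lxml".
soup = BeautifulSoup(html, "html.parser")

# .sister matches both links, so collect the id of each match.
sister_ids = [a["id"] for a in soup.select(".sister")]
print(sister_ids)  # ['link2', 'link3']

# select_one returns the first matching Tag instead of a list.
first = soup.select_one("#link2")
print(first.get_text())  # Lacie

# When nothing matches, select_one returns None (select returns an empty list).
missing = soup.select_one("#no-such-id")
print(missing)  # None
```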
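Selecting by attribute

What if the thing you want to match is neither an id nor a class name? Any attribute can be matched by writing it in square brackets: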
```
print(soup.select('[href="http://example.com/lacie"]'))
```
This selects the tag whose href is http://example.com/lacie.
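Exact matching is only one form of attribute selector. As a hedged illustration that goes slightly beyond the article, the sketch below uses the standard CSS operators for "attribute present", "starts with", "ends with" and "contains". It assumes the "html.parser" backend, and the helper ids() is a hypothetical convenience function, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

html = """
<html><body>
<p class="title">The Dormouse's story</p>
<p>Once upon a time there were three little sisters; and their names were
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>
"""

# Assumption: "html.parser" is used here; the article itself uses "lxml".
soup = BeautifulSoup(html, "html.parser")


def ids(selector):
    """Hypothetical helper: return the id of every tag matching the selector."""
    return [tag["id"] for tag in soup.select(selector)]


print(ids("a[href]"))                         # has an href attribute at all
print(ids('a[href^="http://example.com/"]'))  # href starts with the prefix
print(ids('a[href$="tillie"]'))               # href ends with "tillie"
print(ids('a[href*="lac"]'))                  # href contains "lac"
```

With this document the first two selectors match both links, the third matches only link3, and the last matches only link2.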

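Combined selectors

Selectors can be combined in two ways: by putting two conditions on a single tag, or by walking down the tree one level at a time.

The first case looks like this: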
```
print(soup.select('a#link2'))
```
This selects the tag whose name is a and whose id is link2.

The output is:
```
[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
```
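The second case:

Start from body, find every p inside it, and then, inside those p tags, find the tag whose name is a and whose id is link2. Walking down the tree level by level like this is very common when analyzing the structure of an HTML page. The levels are separated by spaces, and a space matches descendants at any depth, not only direct children.
```
print(soup.select('body p a#link2'))
```
The result is: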
```
[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
```
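The space used above is the CSS descendant combinator. For comparison, this small sketch contrasts it with the child combinator >, which only matches direct children. It assumes the "html.parser" backend, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

html = """
<html><body>
<p class="title">The Dormouse's story</p>
<p>Once upon a time there were three little sisters; and their names were
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>
"""

# Assumption: "html.parser" is used here; the article itself uses "lxml".
soup = BeautifulSoup(html, "html.parser")

# Space: any <a> somewhere below <body>, at any depth.
descendants = [a["id"] for a in soup.select("body a")]
print(descendants)  # ['link2', 'link3']

# ">": only <a> tags that are direct children of <body>; there are none,
# because both links sit inside a <p>.
direct_of_body = soup.select("body > a")
print(direct_of_body)  # []

# The links are direct children of the second <p>, so this matches both.
direct_of_p = [a["id"] for a in soup.select("p > a")]
print(direct_of_p)  # ['link2', 'link3']
```

Use the space when the exact nesting depth does not matter, and > when you want to pin a tag to its immediate parent.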
When reposting, please credit the source: http://www.bugingcode.com/

Original post: https://www.cnblogs.com/bugingcode/p/8522161.html
