  • Reading the official BeautifulSoup docs: searching the HTML tree (2)

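    Besides find() and find_all(), Beautiful Soup provides a whole family of similar methods. I won't go through them one by one, because their parameters and usage are almost the same: the plural forms accept limit and return a list, and the singular forms return only the first match. What differs is where they look. find_parents()/find_parent() search the tag's ancestors, the *_sibling(s) methods search only its siblings, and in the last four, "next" and "previous" are meant in the sense of .next_element/.previous_element, i.e. everything parsed after (or before) the tag, in document order.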
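
    To make the next/previous distinction concrete, here is a small sketch on the "three sisters" document from the official docs (slightly trimmed; choosing the "html.parser" parser is my own assumption). find_next_sibling() only looks at siblings, while find_next() and find_previous() walk .next_elements/.previous_elements in parse order:

```python
from bs4 import BeautifulSoup

# The "three sisters" document from the Beautiful Soup docs (slightly trimmed).
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>"""

soup = BeautifulSoup(html_doc, "html.parser")

title_b = soup.find("b")  # the <b> inside the first <p>
# <b> has no <p> siblings, so a sibling search finds nothing...
print(title_b.find_next_sibling("p"))      # None
# ...but find_next() follows parse order and reaches the next <p>.
print(title_b.find_next("p")["class"])     # ['story']

link1 = soup.find("a", id="link1")
print(link1.find_next_sibling("a")["id"])  # link2
print(link1.find_next("p").get_text())     # ...
print(link1.find_previous("b").get_text()) # The Dormouse's story
```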

    Signature: find_parents(name, attrs, string, limit, **kwargs)

    Signature: find_parent(name, attrs, string, **kwargs)

    Signature: find_next_siblings(name, attrs, string, limit, **kwargs)

    Signature: find_next_sibling(name, attrs, string, **kwargs)

    Signature: find_previous_siblings(name, attrs, string, limit, **kwargs)

    Signature: find_previous_sibling(name, attrs, string, **kwargs)

    Signature: find_all_next(name, attrs, string, limit, **kwargs)

    Signature: find_next(name, attrs, string, **kwargs)

    Signature: find_all_previous(name, attrs, string, limit, **kwargs)

    Signature: find_previous(name, attrs, string, **kwargs)
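
    A quick sketch of the singular/plural pairs and the limit parameter. The markup and ids below are made up purely for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup invented for this example.
html = """<div id="outer"><div id="inner">
<ul><li id="a">A</li><li id="b">B</li><li id="c">C</li><li id="d">D</li></ul>
</div></div>"""
soup = BeautifulSoup(html, "html.parser")
first = soup.find("li", id="a")

# Plural form: returns a list of matches and honours `limit`.
siblings = first.find_next_siblings("li", limit=2)
print([li["id"] for li in siblings])                 # ['b', 'c']

# Singular form: returns the first match, or None if there is none.
print(first.find_next_sibling("li")["id"])           # b
print(first.find_previous_sibling("li"))             # None

# find_parents() walks up the ancestors, nearest first;
# name/attrs/kwargs filter the same way as in find_all().
print([d["id"] for d in first.find_parents("div")])  # ['inner', 'outer']
print(first.find_parent("div", id="outer")["id"])    # outer
```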

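    Beautiful Soup also supports CSS selectors through .select(), and the syntax is essentially the same as ordinary CSS selectors. My CSS is only at a beginner level, so I won't explain much here: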

        from bs4 import BeautifulSoup

        # The "three sisters" example document used throughout the official docs
        html_doc = """
        <html><head><title>The Dormouse's story</title></head>
        <body>
        <p class="title"><b>The Dormouse's story</b></p>

        <p class="story">Once upon a time there were three little sisters; and their names were
        <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
        <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
        <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
        and they lived at the bottom of a well.</p>

        <p class="story">...</p>
        """
        soup = BeautifulSoup(html_doc, "html.parser")

        # Select by tag name
        soup.select("title")
        # [<title>The Dormouse's story</title>]

        # The third <p> among its siblings (note the colon before nth-of-type)
        soup.select("p:nth-of-type(3)")
        # [<p class="story">...</p>]

        soup.select("body a")
        # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
        #  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
        #  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

        soup.select("html head title")
        # [<title>The Dormouse's story</title>]

        soup.select("head > title")
        # [<title>The Dormouse's story</title>]

        soup.select("p > a")
        # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
        #  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
        #  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

        soup.select("p > a:nth-of-type(2)")
        # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

        soup.select("p > #link1")
        # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

        soup.select("body > a")
        # []

        # As far as I can tell from the above: ">" requires the match to be a direct
        # child, while a space matches any descendant. The <a> tags are children of
        # <p>, not of <body>, which is why "body > a" finds nothing.

        # "~" matches all later siblings; "+" matches only the immediately next one
        soup.select("#link1 ~ .sister")
        # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
        #  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

        soup.select("#link1 + .sister")
        # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

        # Find by CSS class
        soup.select(".sister")
        # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
        #  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
        #  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

        # Find by id
        soup.select("#link1")
        # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

        soup.select("a#link2")
        # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

        # Match any of several selectors (a comma means "or")
        soup.select("#link1,#link2")
        # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
        #  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

        # You can of course match on attribute values too; exact match:
        soup.select('a[href="http://example.com/elsie"]')
        # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

        # ^= : value starts with
        soup.select('a[href^="http://example.com/"]')
        # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
        #  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
        #  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

        # $= : value ends with
        soup.select('a[href$="tillie"]')
        # [<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

        # *= : value contains
        soup.select('a[href*=".com/el"]')
        # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

        # This one puzzled me at first: [lang|=en] matches a lang attribute that is
        # exactly "en" or starts with "en-" (a language code plus a subtag)
        multilingual_markup = """
         <p lang="en">Hello</p>
         <p lang="en-us">Howdy, y'all</p>
         <p lang="en-gb">Pip-pip, old fruit</p>
         <p lang="fr">Bonjour mes amis</p>
        """
        multilingual_soup = BeautifulSoup(multilingual_markup, "html.parser")
        multilingual_soup.select('p[lang|=en]')
        # [<p lang="en">Hello</p>,
        #  <p lang="en-us">Howdy, y'all</p>,
        #  <p lang="en-gb">Pip-pip, old fruit</p>]

        # To get only the first match, use select_one()
        soup.select_one(".sister")
        # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
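
    To double-check the combinator and attribute-operator rules noted above, here is a self-contained sketch on made-up markup (the ids and URLs are invented for illustration). It assumes a Beautiful Soup version whose select() is backed by the Soup Sieve package, which is the case from 4.7.0 on:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <a> is a direct child of <body>, the rest sit inside <p>.
html = """<body>
<a id="top" href="https://example.org/top">top</a>
<p><a id="x" href="https://example.org/x.pdf">x</a><a id="y" href="https://example.org/y.html">y</a><a id="z" href="/z.pdf">z</a></p>
</body>"""
soup = BeautifulSoup(html, "html.parser")

def ids(tags):
    """Collect the id attribute of each matched tag, in document order."""
    return [t["id"] for t in tags]

print(ids(soup.select("body a")))                # descendant (space): ['top', 'x', 'y', 'z']
print(ids(soup.select("body > a")))              # direct child (>):    ['top']
print(ids(soup.select("#x ~ a")))                # later siblings (~):  ['y', 'z']
print(ids(soup.select("#x + a")))                # next sibling (+):    ['y']
print(ids(soup.select('a[href^="https://"]')))   # starts with:         ['top', 'x', 'y']
print(ids(soup.select('a[href$=".pdf"]')))       # ends with:           ['x', 'z']
print(ids(soup.select('a[href*="example"]')))    # contains:            ['top', 'x', 'y']
print(soup.select_one("p > span"))               # no match:            None
```

    Note that select() always returns a list (possibly empty), while select_one() returns a single tag or None.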
  • Original article: https://www.cnblogs.com/nzhl/p/5591765.html