Python Web Scraping for Beginners (12): Finding Tags by id and class

    This chapter shows how to find tags by their id and class attributes. Suppose we have the following HTML document:

    <html>
    <head>
    <title>A simple example page</title>
    </head>
    <body>
    <div>
    <p class="inner-text first-item" id="first">
    First paragraph.
    </p>
    <p class="inner-text">
    Second paragraph.
    </p>
    </div>
    <p class="outer-text first-item" id="second">
    <b>
    First outer paragraph.
    </b>
    </p>
    <p class="outer-text">
    <b>
    Second outer paragraph.
    </b>
    </p>
    </body>
    </html>

    The document above is available at https://kevinhwu.github.io/demo/python-scraping/simple2.html. Let's first download the page and create a BeautifulSoup object:

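    import requests
    from bs4 import BeautifulSoup
    
    # Download the example page (requires network access)
    page = requests.get("https://kevinhwu.github.io/demo/python-scraping/simple2.html")
    # Build a parse tree with Python's built-in html.parser
    soup = BeautifulSoup(page.content, 'html.parser')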
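If the demo URL is unreachable, you can parse the same document from a string instead. The sketch below assumes the page content matches the HTML shown above, which is copied here verbatim. It uses the same `'html.parser'` backend, so the later examples should behave the same way on this `soup`.

```python
from bs4 import BeautifulSoup

# Verbatim copy of the chapter's example document (assumed to match the
# page served at the demo URL), so this runs without network access.
HTML = """
<html>
<head>
<title>A simple example page</title>
</head>
<body>
<div>
<p class="inner-text first-item" id="first">
First paragraph.
</p>
<p class="inner-text">
Second paragraph.
</p>
</div>
<p class="outer-text first-item" id="second">
<b>
First outer paragraph.
</b>
</p>
<p class="outer-text">
<b>
Second outer paragraph.
</b>
</p>
</body>
</html>
"""

# Same parser as the requests-based version
soup = BeautifulSoup(HTML, "html.parser")

print(soup.title.string)        # A simple example page
print(len(soup.find_all("p")))  # 4
```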

    Finding tags by class

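    To find tags by id or class, we still use the find_all method. Because class is a reserved word in Python, the keyword argument is spelled class_ with a trailing underscore. The following example finds the p tags whose class is outer-text: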

    soup.find_all('p', class_='outer-text')

    Output:

    [<p class="outer-text first-item" id="second">
    <b>
    First outer paragraph.
    </b>
    </p>, <p class="outer-text">
    <b>
    Second outer paragraph.
    </b>
    </p>]
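Both paragraphs match even though the first one's class attribute is "outer-text first-item". BeautifulSoup treats class as a multi-valued attribute and matches against each value separately. The sketch below shows this, along with an equivalent CSS selector query using `select`. It parses a condensed copy of the page body (an assumption for brevity), so `get_text(strip=True)` is used to ignore whitespace differences.

```python
from bs4 import BeautifulSoup

# Condensed copy of the example page's <body>, whitespace trimmed
HTML = """
<div>
<p class="inner-text first-item" id="first">First paragraph.</p>
<p class="inner-text">Second paragraph.</p>
</div>
<p class="outer-text first-item" id="second"><b>First outer paragraph.</b></p>
<p class="outer-text"><b>Second outer paragraph.</b></p>
"""
soup = BeautifulSoup(HTML, "html.parser")

# class is stored as a list of values, so "outer-text" matches both paragraphs
first = soup.find_all("p", class_="outer-text")[0]
print(list(first["class"]))  # ['outer-text', 'first-item']

# CSS selector equivalent of find_all('p', class_='outer-text')
via_css = soup.select("p.outer-text")
texts = [p.get_text(strip=True) for p in via_css]
print(texts)  # ['First outer paragraph.', 'Second outer paragraph.']
```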

    The next example finds any tag, regardless of its name, whose class is outer-text:

    soup.find_all(class_="outer-text")

    Output:

    [<p class="outer-text first-item" id="second">
    <b>
    First outer paragraph.
    </b>
    </p>, <p class="outer-text">
    <b>
    Second outer paragraph.
    </b>
    </p>]
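Searching by class without a tag name also works for classes shared across different elements. The sketch below uses the same condensed copy of the page body as an assumption. It finds every tag carrying first-item, then contrasts two queries. Passing a string containing a space to `class_` matches the attribute value exactly as written. The CSS selector `.first-item.outer-text` requires both classes in any order.

```python
from bs4 import BeautifulSoup

# Condensed copy of the example page's <body>, whitespace trimmed
HTML = """
<div>
<p class="inner-text first-item" id="first">First paragraph.</p>
<p class="inner-text">Second paragraph.</p>
</div>
<p class="outer-text first-item" id="second"><b>First outer paragraph.</b></p>
<p class="outer-text"><b>Second outer paragraph.</b></p>
"""
soup = BeautifulSoup(HTML, "html.parser")

# Any tag that has first-item among its classes
ids = [tag["id"] for tag in soup.find_all(class_="first-item")]
print(ids)  # ['first', 'second']

# A value containing a space matches the whole attribute string exactly
exact = soup.find_all(class_="outer-text first-item")
print(len(exact))  # 1

# To require both classes in any order, use a CSS selector
both = soup.select(".first-item.outer-text")
print(both[0]["id"])  # second
```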

    Finding tags by id

    We can also search for tags by id:

    soup.find_all(id="first")

    Output:

    [<p class="inner-text first-item" id="first">
    First paragraph.
    </p>]
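An id should be unique within a page, so it is often more convenient to use `find`. It returns the first matching tag, or `None` if nothing matches, rather than a list. The sketch below shows `find`, the CSS equivalent `select_one`, and a `find_all` call that combines a class filter with an id filter. It uses the same condensed copy of the page body as an assumption.

```python
from bs4 import BeautifulSoup

# Condensed copy of the example page's <body>, whitespace trimmed
HTML = """
<div>
<p class="inner-text first-item" id="first">First paragraph.</p>
<p class="inner-text">Second paragraph.</p>
</div>
<p class="outer-text first-item" id="second"><b>First outer paragraph.</b></p>
<p class="outer-text"><b>Second outer paragraph.</b></p>
"""
soup = BeautifulSoup(HTML, "html.parser")

# find() returns a single Tag instead of a list
first = soup.find(id="first")
print(first.get_text(strip=True))  # First paragraph.

# CSS equivalent: '#second' selects by id
second_text = soup.select_one("#second").b.get_text(strip=True)
print(second_text)  # First outer paragraph.

# Filters can be combined: tag name, class and id must all match
combined = soup.find_all("p", class_="inner-text", id="first")
print(len(combined))  # 1

# A missing id gives None, so check before using the result
print(soup.find(id="missing"))  # None
```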
Original article: https://www.cnblogs.com/huanghanyu/p/13175821.html