zoukankan      html  css  js  c++  java
  • 7.01-beautiful_soup3

    # pip install beautifulsoup4
    
    from bs4 import BeautifulSoup
    
    html_doc = """
    <html><head>
    <title id="one">The Dormouse's story</title>
    </head>
    <body>
    <p class="story"><!--...--></p>
    <p class="title">
        p标签的内容
        <b>The Dormouse's story</b>
    </p>
    
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    """
    
    # 1.转类型 bs4.BeautifulSoup'
    soup = BeautifulSoup(html_doc, 'lxml')
    
    # 2.通用解析方法
    
    #  find--返回符合查询条件的 第一个标签对象
    result = soup.find(name="p")
    result = soup.find(attrs={"class": "title"})
    result = soup.find(text="Tillie")
    result = soup.find(
        name='p',
        attrs={"class": "story"},
    )
    
    # find_all--list(标签对象)
    result = soup.find_all('a')
    result = soup.find_all("a", limit=1)[0]
    result = soup.find_all(attrs={"class": "sister"})
    
    # select_one---css选择器
    result = soup.select_one('.sister')
    
    # select----css选择器---list
    result = soup.select('.sister')
    result = soup.select('#one')
    result = soup.select('head title')
    result = soup.select('title,.title')
    result = soup.select('a[id="link3"]')
    
    # 标签包裹的内容---list
    result = soup.select('.title')[0].get_text()
    
    
    # 标签的属性
    # result = soup.select('#link1')[0].get('href')
    print(result)
  • 相关阅读:
    21. 斐波那契数列
    22. 旋转数组的最小数字
    php图像处理链接
    FileOprSer.class.php(文件上传与下载类)
    SqlHelper.class.php+分页类方法
    file写入方式 和copy
    file 读取方式
    pytest+allure+jenkins 持续集成平台生成allure报告
    memcached cas操作
    php-fpm
  • 原文地址:https://www.cnblogs.com/hankleo/p/10626395.html
Copyright © 2011-2022 走看看