  • Python frameworks: using BeautifulSoup

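      Beautiful Soup is a Python library for extracting data from HTML or XML files. It works with the parser of your choice to provide idiomatic ways of navigating, searching, and modifying a document. Everyone should have at least one dream, one reason to stay strong. If the heart has no place to rest, you are a wanderer wherever you go.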
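The three operations named above (navigating, searching, modifying) can be sketched on a tiny document. This is a minimal illustration, assuming the library is already installed as described in the next section; the HTML snippet and the `greeting` class name are made up for the example.

```python
from bs4 import BeautifulSoup

# A tiny hypothetical document, used only for illustration.
soup = BeautifulSoup('<div><p id="intro">Hello <b>world</b></p></div>', "html.parser")

# Navigate: step through the tree by tag name.
bold = soup.div.p.b
print(bold.get_text())      # world
print(bold.parent.name)     # p

# Search: look a tag up by name and attribute.
intro = soup.find("p", id="intro")
print(intro.get_text())     # Hello world

# Modify: change text and attributes in place.
bold.string = "there"
intro["class"] = "greeting"
print(intro.get_text())     # Hello there
```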

    Installing and using BeautifulSoup

     On Windows, install it with pip: pip install beautifulsoup4
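One quick way to confirm the installation worked is to import the package and parse a trivial string. This is only a sanity check under the assumption that pip installed into the same Python interpreter you are running; the `<p>hello</p>` snippet is made up for the example.

```python
import bs4
from bs4 import BeautifulSoup

# Print the installed version; the exact value depends on your environment.
print(bs4.__version__)

# Parse a one-line document with Python's built-in parser.
check = BeautifulSoup("<p>hello</p>", "html.parser")
print(check.p.get_text())  # hello
```

If the import fails with ModuleNotFoundError, pip most likely installed the package for a different interpreter.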

    1. Basic usage of beautifulsoup4

    from bs4 import BeautifulSoup
    import re
    
    html_doc = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    
    <p class="story">...</p>
    """
    
    soup = BeautifulSoup(html_doc, 'html.parser')
    # Get all <a> links (find_all is the current name; findAll is a legacy alias)
    links = soup.find_all('a')
    for link in links:
        print(link.name, link['href'], link.get_text())
    
    # Get one specific <a> link by its exact href
    link_node = soup.find('a', href='http://example.com/tillie')
    print(link_node.get_text(), link_node['id'])
    
    # Match the href against a regular expression
    link_re_node = soup.find('a', href=re.compile('cie'))
    print(link_re_node.get_text(), link_re_node['id'])
    
    # Get content by CSS class (class_ avoids clashing with the Python keyword)
    p_node_class = soup.find('p', class_='title')
    print(p_node_class.get_text())

    Running it produces the following output:

    a http://example.com/elsie Elsie
    a http://example.com/lacie Lacie
    a http://example.com/tillie Tillie
    Tillie link3
    Lacie link2
    The Dormouse's story
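The lookups above all succeed, so indexing the result directly is safe there. When a match might be absent, find() returns None and indexing a missing attribute raises KeyError. The following is a minimal sketch of more defensive lookups, plus select() as a CSS-selector alternative to find_all; the shortened document, the `nobody` URL, and the `"no title"` default are made up for illustration.

```python
from bs4 import BeautifulSoup

html_doc = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find() returns None when nothing matches, so check before indexing.
missing = soup.find("a", href="http://example.com/nobody")
missing_id = missing["id"] if missing is not None else None
print(missing_id)   # None

# Tag.get() returns a default instead of raising KeyError
# when the attribute does not exist.
first = soup.find("a")
title_attr = first.get("title", "no title")
print(title_attr)   # no title

# select() takes a CSS selector and returns a list of matching tags.
ids = [a["id"] for a in soup.select("p.story a.sister")]
print(ids)          # ['link1', 'link2']
```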

  • Original article: https://www.cnblogs.com/huhx/p/baseusepythonbeautifulsoup.html