  • BeautifulSoup: Installation and Usage

    Linux environment

    1. Installation

    Method 1:

    Download: http://www.crummy.com/software/BeautifulSoup/bs4/download/4.2/

    Extract: tar -xzvf beautifulsoup4-4.2.0.tar.gz

    Install: change into the extracted directory, then run

    python setup.py build
    sudo python setup.py install

    Method 2 (quick install)

    (Ubuntu) sudo apt-get install python-bs4
    or
    pip install beautifulsoup4 or easy_install beautifulsoup4
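
After either installation method, it can help to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata); the helper name bs4_version is made up for illustration and is not part of BeautifulSoup.

```python
import importlib.util
from importlib import metadata


def bs4_version():
    """Return the installed beautifulsoup4 version, or None if bs4 cannot be imported.

    bs4_version is a hypothetical helper written for this example.
    """
    if importlib.util.find_spec("bs4") is None:
        return None
    try:
        return metadata.version("beautifulsoup4")
    except metadata.PackageNotFoundError:
        # The module is importable but was installed without distribution metadata.
        return "unknown"


version = bs4_version()
print("beautifulsoup4 not found" if version is None else "beautifulsoup4 " + version)
```

If this reports that the package is not found, the install probably went to a different Python interpreter than the one running the check.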

    2. Import (in a Python session)

    from bs4 import BeautifulSoup

    3. Usage

    Example

    html_doc = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    
    <p class="story">...</p>
    """
    

    Getting started

    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html_doc)
    >>> soup.head()
    [<title>The Dormouse's story</title>]
    >>> soup.title
    <title>The Dormouse's story</title>
    >>> soup.title.string
    u"The Dormouse's story"
    >>> soup.body.b
    <b>The Dormouse's story</b>
    >>> soup.body.b.string
    u"The Dormouse's story"
    >>> soup.a
    <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
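
The attribute-style lookups in the session above are shorthand for find() and find_all(). Here is a small sketch of that equivalence, written for Python 3 (the session above is Python 2) and using a shortened copy of the example document. Passing "html.parser" explicitly is a choice made for this sketch so that the result does not depend on which optional parsers are installed.

```python
from bs4 import BeautifulSoup

# Shortened copy of the example document, for illustration only.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story"><a href="http://example.com/elsie" class="sister" id="link1">Elsie</a></p>
</body></html>
"""

# Name the parser explicitly so every machine builds the same tree.
soup = BeautifulSoup(html_doc, "html.parser")

# soup.<name> returns the first tag with that name, like find(<name>).
same_title = soup.title == soup.find("title")
same_link = soup.a == soup.find("a")

# Calling a tag is shorthand for find_all() on that tag.
same_children = soup.head() == soup.head.find_all()

# A tag that is not in the document yields None instead of raising.
missing = soup.table

title_text = soup.title.string
print(same_title, same_link, same_children, missing)
print(title_text)
```

Because a missing tag comes back as None, chained lookups such as soup.body.b fail with AttributeError when an intermediate tag is absent, so a None check is worthwhile on untrusted input.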

    Find all <a> tags

    >>> soup.find_all('a')
    [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
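
find_all() also accepts filters, which is often more useful than fetching every tag. Below is a minimal Python 3 sketch using the three links from the example document; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# The three links from the example document.
html_doc = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# Filter by CSS class; the keyword is class_ because class is reserved in Python.
sisters = soup.find_all("a", class_="sister")

# Stop after the first two matches.
first_two = soup.find_all("a", limit=2)

# find() returns a single tag (or None); here it is looked up by id.
lacie = soup.find(id="link2")

names = [a.get_text() for a in sisters]
print(names)
print(lacie["href"])
```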

    Print the class and href of each <a> tag

    >>> for key in soup.find_all('a'):
    ...     print key.get('class'), key.get("href")
    ... 
    ['sister'] http://example.com/elsie
    ['sister'] http://example.com/lacie
    ['sister'] http://example.com/tillie
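
The same loop can be sketched in Python 3, where print is a function, collecting the attributes into a list instead of only printing them. The fourth link below, which has no class and no href, is a hypothetical addition that is not in the original document. It is there to show that get() returns None (or a supplied default) for a missing attribute, whereas indexing with tag["href"] would raise KeyError.

```python
from bs4 import BeautifulSoup

# The last <a> is a made-up extra link with no class and no href.
html_doc = """
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
<a id="top">Top</a>
"""
soup = BeautifulSoup(html_doc, "html.parser")

links = []
for a in soup.find_all("a"):
    links.append({
        # class is multi-valued, so it comes back as a list of strings.
        "classes": a.get("class", []),
        # None when the tag has no href attribute.
        "href": a.get("href"),
    })

for link in links:
    print(link["classes"], link["href"])
```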

    Reference

    http://www.crummy.com/software/BeautifulSoup/bs4/doc/

  • Original article: https://www.cnblogs.com/kaituorensheng/p/3722913.html