  • Beautiful Soup Library Basics

    1. Installation

    From a command prompt (cmd), run: pip install beautifulsoup4
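    The package is installed under the name beautifulsoup4 but imported as bs4. A quick way to confirm the install worked is to import it and print its version; this is only a minimal check, not part of the original notes.

```python
# Confirm that beautifulsoup4 is importable and report which version was installed.
import bs4

print(bs4.__version__)  # prints the installed version string
```

    If the import fails with ModuleNotFoundError, pip most likely installed the package into a different Python environment than the one running the script.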

    2. Testing the installation

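    import requests                        # import the requests library
    from bs4 import BeautifulSoup          # import the Beautiful Soup library
    
    r = requests.get("http://python123.io/ws/demo.html")
    print(r.status_code)                  # check that the request succeeded (200 means OK)
    # print(r.text)                       # the full response text
    
    demo = r.text                              # keep the text in a variable for later processing
    
    soup = BeautifulSoup(demo,"html.parser")   # make the soup: demo is the content to parse, html.parser is the parser
    
    print(soup.prettify())                     # pretty-print the result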
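    The test above assumes the request always succeeds. A slightly more defensive version might wrap the download in a small helper; fetch_soup and its timeout default are hypothetical names chosen for this sketch, not something from the original notes.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10):
    """Download url and return it parsed with the built-in html.parser.

    fetch_soup is a hypothetical helper written for this sketch.
    """
    r = requests.get(url, timeout=timeout)   # give up instead of waiting forever
    r.raise_for_status()                     # raise requests.HTTPError on 4xx/5xx
    r.encoding = r.apparent_encoding         # guess the encoding from the body
    return BeautifulSoup(r.text, "html.parser")


# Example call (needs network access):
# soup = fetch_soup("http://python123.io/ws/demo.html")
# print(soup.prettify())
```

    Checking the status with raise_for_status() replaces the manual print(r.status_code), so a failed download stops the script instead of quietly parsing an error page.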

    The soup-making process can be summarized as:

    from bs4 import BeautifulSoup                         # note the capital B and S
    soup = BeautifulSoup("<p>date</p>","html.parser")     # "<p>date</p>" is the content to parse; "html.parser" is the parser
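    Because the first argument is just markup, the two-line recipe can be tried without any network access. The sketch below uses a made-up HTML string (an assumption, not the demo page) to show what comes back.

```python
from bs4 import BeautifulSoup

# The first argument is the markup to parse (here a plain string);
# the second names the parser. "html.parser" ships with Python itself.
markup = "<html><body><p>Hello, soup</p></body></html>"  # made-up sample markup
soup = BeautifulSoup(markup, "html.parser")

print(soup.p)         # <p>Hello, soup</p>
print(soup.p.string)  # Hello, soup
```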

    3. Basic elements of BeautifulSoup

    import requests
    r = requests.get("http://python123.io/ws/demo.html")
    demo = r.text
    
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(demo,"html.parser")
    
    # tag
    print(soup.a)
    print(soup.p)
    print(soup.a.prettify())   # pretty-print the tag's content
    print(soup.p.prettify())   # pretty-print the tag's content
    
    # name
    print(soup.a.name)
    print(soup.p.name)
    
    # string
    print(soup.a.string)
    print(soup.p.string)
    
    # attributes
    print(soup.a.attrs)
    print(soup.p.attrs)
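    To see what kind of object each of these four elements is, the types can be checked directly. This is a sketch on a stand-in HTML string (an assumption, so that it runs offline); only the first link is taken from the demo page output shown in these notes.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Stand-in markup so the example runs without network access.
html = ('<p class="title"><b>Demo page</b></p>'
        '<a class="py1" href="http://www.icourse163.org/course/BIT-268001" '
        'id="link1">Basic Python</a>')
soup = BeautifulSoup(html, "html.parser")

tag = soup.a
print(isinstance(tag, Tag))                     # True: soup.a is a Tag object
print(isinstance(tag.name, str))                # True: the name is a plain string
print(isinstance(tag.attrs, dict))              # True: attributes behave like a dict
print(isinstance(tag.string, NavigableString))  # True: the text is a NavigableString
```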

    3.1 Tag

    import requests
    from bs4 import BeautifulSoup

    r = requests.get("http://python123.io/ws/demo.html")
    demo = r.text
    
    soup = BeautifulSoup(demo,"html.parser")
    tag = soup.a
    print(tag)      # the first <a> tag: <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>
    # A tag is the most basic unit for organizing information; its start and end are marked with <> and </> respectively.
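    One thing worth noting is that soup.a only returns the first matching tag. The sketch below, on a stand-in string with two links (the second URL is made up for illustration), contrasts it with find_all, which returns every match.

```python
from bs4 import BeautifulSoup

# Stand-in markup; the second href is a hypothetical URL.
html = ('<p><a id="link1" href="http://www.icourse163.org/course/BIT-268001">Basic Python</a> and '
        '<a id="link2" href="http://example.com/advanced">Advanced Python</a></p>')
soup = BeautifulSoup(html, "html.parser")

first = soup.a                  # same as soup.find("a"): the first match only
all_links = soup.find_all("a")  # every <a> tag in the document

print(first["id"])      # link1
print(len(all_links))   # 2
print(soup.table)       # None, because the document has no <table> tag
```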

    3.2 Tag name
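    Every tag has a .name, and so do its parents, which makes it possible to walk upward through the document. A small sketch on a stand-in string (an assumption, not the full demo page):

```python
from bs4 import BeautifulSoup

# Stand-in markup so the example runs offline.
html = '<p class="course"><a href="http://www.icourse163.org/course/BIT-268001">Basic Python</a></p>'
soup = BeautifulSoup(html, "html.parser")

tag = soup.a
print(tag.name)                # a
print(tag.parent.name)         # p
print(tag.parent.parent.name)  # [document], the name of the soup object itself
```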

    3.3 Tag attributes (I don't yet understand what these are useful for)
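    One common use of attributes is pulling the link target out of an <a> tag. The sketch below uses the first link from the demo page output plus a hypothetical second link without an href, to show dictionary-style access and the safer .get().

```python
from bs4 import BeautifulSoup

# The first tag matches the demo output in these notes; the second is made up.
html = ('<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>'
        '<a id="nolink">No target</a>')
soup = BeautifulSoup(html, "html.parser")

tag = soup.a
print(tag.attrs["href"])        # http://www.icourse163.org/course/BIT-268001
print(tag["id"])                # link1 (tag[...] is shorthand for tag.attrs[...])
print(tag.get("title"))         # None: .get() avoids a KeyError for missing attributes
print(tag["class"] == ["py1"])  # True: class can hold several values, so it is a list

# Collect the href of every link that actually has one.
links = [a["href"] for a in soup.find_all("a", href=True)]
print(len(links))               # 1
```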

    3.4 Tag string

    print(soup.a.string)                                              # Basic Python
    print(soup.p.string)                                              # The demo python introduces several python courses.
    print(type(soup.p.string))                                        # <class 'bs4.element.NavigableString'>
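    Note that .string only works when a tag contains a single piece of text (possibly nested inside one child tag); otherwise it returns None and get_text() is the usual fallback. A sketch on stand-in markup, assuming the first <p> wraps its text in a <b> tag:

```python
from bs4 import BeautifulSoup

# Stand-in markup; the nesting of <b> inside the first <p> is an assumption.
html = ('<p class="title"><b>The demo python introduces several python courses.</b></p>'
        '<p class="course">Learn <a href="#">Basic Python</a> first.</p>')
soup = BeautifulSoup(html, "html.parser")

title, course = soup.find_all("p")
print(title.string)       # The demo python introduces several python courses.
print(course.string)      # None: this <p> has more than one child
print(course.get_text())  # Learn Basic Python first.
```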

    3.5 Comments

    demo,"html.parser"
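    An HTML comment inside a tag also comes back through .string, with the comment markers stripped, so it looks like ordinary text when printed. The type has to be checked to tell the two apart. A minimal sketch on a made-up string:

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Made-up markup: one tag holding a comment, one holding normal text.
html = "<b><!--This is a comment--></b><p>This is not a comment</p>"
soup = BeautifulSoup(html, "html.parser")

print(soup.b.string)                       # This is a comment
print(soup.p.string)                       # This is not a comment
print(isinstance(soup.b.string, Comment))  # True
print(isinstance(soup.p.string, Comment))  # False
```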
  • Original article: https://www.cnblogs.com/hanbb/p/7223276.html