  • Installing and Using the BeautifulSoup Library on macOS

    Introduction to BeautifulSoup


    BeautifulSoup is a powerful third-party Python library that parses HTML documents and extracts information from them.

    Installing BeautifulSoup


    • Open a terminal and run:
    pip3 install beautifulsoup4
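The package is installed under the name beautifulsoup4 but imported as bs4. To confirm the install worked without triggering an ImportError, a minimal check can ask Python's import system whether bs4 is visible to the current interpreter. The helper name bs4_installed is hypothetical; importlib.util.find_spec is part of the standard library.

```python
import importlib.util


def bs4_installed() -> bool:
    """Return True if the bs4 package (installed as 'beautifulsoup4') is importable.

    Hypothetical helper for illustration: it only locates the package on the
    import path and does not actually import it.
    """
    return importlib.util.find_spec("bs4") is not None


if __name__ == "__main__":
    print("bs4 installed:", bs4_installed())
```

Run it with python3. If it reports False right after a successful pip3 install, pip3 most likely installed into a different Python than the one running the script.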
    

    A Quick Test of BeautifulSoup


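    • The test input is the demo page at http://python123.io/ws/demo.html; its source code is shown below.

    • Fetch the source with the requests library (install it with pip3 install requests if needed) and store it in the variable demo: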
    >>> import requests
    >>> r = requests.get("http://python123.io/ws/demo.html")
    >>> r.text
    '<html><head><title>This is a python demo page</title></head>
    <body>
    <p class="title"><b>The demo python introduces several python courses.</b></p>
    <p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
    <a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
    </body></html>'
    >>> demo = r.text
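The interactive session assumes the request succeeds. In a script it is safer to set a timeout and check the HTTP status before using the text. The sketch below wraps this in a hypothetical helper, fetch_html, that returns an empty string on failure; requests.get, raise_for_status, apparent_encoding and requests.RequestException are all part of the requests library.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Return the page text, or "" if the request fails.

    Hypothetical helper name; the requests calls it uses are real.
    """
    try:
        r = requests.get(url, timeout=timeout)  # give up if the server is slow
        r.raise_for_status()                    # turn 4xx/5xx responses into exceptions
        r.encoding = r.apparent_encoding        # guess the encoding from the body
        return r.text
    except requests.RequestException:           # bad URL, connection error, HTTP error...
        return ""
```

Usage: demo = fetch_html("http://python123.io/ws/demo.html"). Returning "" keeps the sketch short; a real script might prefer to let the exception propagate.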
    
    • Import the BeautifulSoup library:
    >>> from bs4 import BeautifulSoup
    >>> 
    
    • Parse the HTML with BeautifulSoup and print it with prettify(), which re-indents the document with one tag per line:
    >>> soup = BeautifulSoup(demo, 'html.parser')
    >>> print(soup.prettify())
    <html>
     <head>
      <title>
       This is a python demo page
      </title>
     </head>
     <body>
      <p class="title">
       <b>
        The demo python introduces several python courses.
       </b>
      </p>
      <p class="course">
       Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
       <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">
        Basic Python
       </a>
       and
       <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">
        Advanced Python
       </a>
       .
      </p>
     </body>
    </html>
    >>> 
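Because the demo page is small, the same parse can be reproduced offline by pasting its HTML into a string, which is handy when the site is unreachable. The sketch below uses a hypothetical constant, DEMO_HTML, copied from the page source shown above, and shows how the parsed tree is then used to extract information: the title text, a tag attribute, and every link.

```python
from bs4 import BeautifulSoup

# Hypothetical constant: a copy of the demo page's source, so no network is needed.
DEMO_HTML = (
    '<html><head><title>This is a python demo page</title></head>\n'
    '<body>\n'
    '<p class="title"><b>The demo python introduces several python courses.</b></p>\n'
    '<p class="course">Python is a wonderful general-purpose programming language. '
    'You can learn Python from novice to professional by tracking the following courses:\n'
    '<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a>'
    ' and '
    '<a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>\n'
    '</body></html>'
)

soup = BeautifulSoup(DEMO_HTML, "html.parser")

# Attribute-style access (soup.title, soup.a) returns the first matching tag.
title_text = soup.title.string   # text inside <title>
first_link = soup.a              # first <a> tag in the document

# find_all returns every matching tag; collect each link's text and href.
links = [(a.get_text(), a["href"]) for a in soup.find_all("a")]

if __name__ == "__main__":
    print(title_text)
    print(first_link["id"])
    for text, href in links:
        print(text, href)
```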
    

    How to Use BeautifulSoup

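    • Code skeleton:
    from bs4 import BeautifulSoup
    soup = BeautifulSoup('<p>data</p>', 'html.parser')
    
    • The two arguments passed to BeautifulSoup are:
      • The first is the HTML-formatted content to parse.
      • The second is the name of the parser to use; 'html.parser' is the HTML parser built into Python's standard library.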
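The two arguments can be illustrated with the skeleton's own markup. Besides a string, the first argument may be a file-like object (anything with a read() method), so the sketch below parses the same markup both ways and checks that the results match. Other parser names such as 'lxml' or 'html5lib' can be passed as the second argument, but those are separate packages that must be installed first; this sketch sticks to the built-in 'html.parser'.

```python
import io

from bs4 import BeautifulSoup

markup = "<p>data</p>"

# First argument: the markup to parse, here a plain string.
from_string = BeautifulSoup(markup, "html.parser")

# The first argument may also be a file-like object; BeautifulSoup calls its read().
from_file = BeautifulSoup(io.StringIO(markup), "html.parser")

# Second argument: the parser name. "html.parser" needs no extra install.
if __name__ == "__main__":
    print(from_string.p.string)                  # text inside the <p> tag
    print(str(from_string) == str(from_file))    # both inputs give the same tree
```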
  • Original article: https://www.cnblogs.com/031602523liu/p/9824907.html