  • Installing and Using the BeautifulSoup Library on macOS

    Introduction to BeautifulSoup


    BeautifulSoup is a powerful third-party Python library that parses HTML and extracts information from it.

    Installing BeautifulSoup


    • Open a terminal and enter the command:
    pip3 install beautifulsoup4
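
One way to confirm the installation worked is to import the package and print its version. This is a minimal sketch; it assumes pip3 installed into the same Python 3 interpreter that runs the snippet.

```python
# The PyPI package is named beautifulsoup4, but the importable module is bs4.
import bs4

version = bs4.__version__
print("beautifulsoup4 version:", version)
```

If the import raises ModuleNotFoundError, the package was most likely installed for a different interpreter.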
    

    A quick test of BeautifulSoup


    • View the demo page's source code by fetching it with the requests library (the result is stored in the variable demo):
    >>> import requests
    >>> r = requests.get("http://python123.io/ws/demo.html")
    >>> r.text
    '<html><head><title>This is a python demo page</title></head>
    <body>
    <p class="title"><b>The demo python introduces several python courses.</b></p>
    <p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
    <a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
    </body></html>'
    >>> demo = r.text
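
Outside the interactive prompt, the same fetch could be wrapped in a small function that also checks for HTTP errors. This is only a sketch: `fetch_html` and its `timeout` default are hypothetical names chosen for illustration, not part of the original post.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the decoded HTML of `url` (hypothetical helper, not from the post)."""
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on a 4xx/5xx status instead of parsing an error page.
    response.raise_for_status()
    # Guess the encoding from the body in case the server's header is missing or wrong.
    response.encoding = response.apparent_encoding
    return response.text


# Example call (requires network access):
# demo = fetch_html("http://python123.io/ws/demo.html")
```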
    
    • Import the BeautifulSoup library
    >>> from bs4 import BeautifulSoup
    >>> 
    
    • Parse the HTML with BeautifulSoup
    >>> demo = r.text
    >>> soup = BeautifulSoup(demo,'html.parser')
    >>> print(soup.prettify())
    <html>
     <head>
      <title>
       This is a python demo page
      </title>
     </head>
     <body>
      <p class="title">
       <b>
        The demo python introduces several python courses.
       </b>
      </p>
      <p class="course">
       Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
       <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">
        Basic Python
       </a>
       and
       <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">
        Advanced Python
       </a>
       .
      </p>
     </body>
    </html>
    >>> 
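
A point worth noting about this step: prettify is a method, so it has to be called with parentheses to obtain the indented markup. Printing the attribute without them shows only a description of the bound method. The sketch below illustrates the difference on a small made-up fragment rather than on the demo page.

```python
from bs4 import BeautifulSoup

# A tiny made-up fragment, used only for illustration.
soup = BeautifulSoup('<p class="title"><b>data</b></p>', "html.parser")

method = soup.prettify      # no parentheses: just a reference to the method
pretty = soup.prettify()    # with parentheses: the indented markup as a str

print(callable(method))
print(pretty)
```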
    

    How to use the BeautifulSoup library

    • Code skeleton:
    from bs4 import BeautifulSoup
    soup = BeautifulSoup('<p>data</p>','html.parser')
    
    • The two arguments to BeautifulSoup:
      • The first is the HTML markup to parse.
      • The second is the name of the parser to use.
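
To tie the two arguments to the "extract information" part of the introduction, here is a minimal sketch that parses an inline copy of the demo page and pulls out the title and the links. The markup is pasted in as a string so the example runs without network access; the variable names `title_text` and `links` are illustrative choices, not from the original post.

```python
from bs4 import BeautifulSoup

# First argument: the HTML markup to parse (an inline copy of the demo page).
demo = """<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body></html>"""

# Second argument: the parser name; "html.parser" is the one bundled with Python.
soup = BeautifulSoup(demo, "html.parser")

# Text of the <title> tag.
title_text = soup.title.string

# Link text and target of every <a> tag, in document order.
links = [(a.get_text(), a.get("href")) for a in soup.find_all("a")]

print(title_text)
for text, href in links:
    print(text, href)
```

Other parsers can be named in the second argument as well, but they are separate packages that must be installed first.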
  • Original post: https://www.cnblogs.com/031602523liu/p/9824907.html