  • The BeautifulSoup Module in Python

    # The package has been renamed to bs4
    from bs4 import BeautifulSoup

     Help documentation

    Beautiful Soup parses a (possibly invalid) XML or HTML document into a
    tree representation. It provides methods and Pythonic idioms that make
    it easy to navigate, search, and modify the tree.
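
As a minimal sketch of the navigate, search, and modify operations mentioned above: the markup, the variable names, and the choice of the standard-library `html.parser` backend are illustrative assumptions, not part of the quoted help text.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = (
    "<html><body>"
    "<p class='intro'>Hello, <b>world</b></p>"
    "<a href='/a'>A</a><a href='/b'>B</a>"
    "</body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Navigate: walk down the tree by tag name.
first_p = soup.body.p
bold_text = str(first_p.b.string)

# Search: collect the href attribute of every <a> tag.
links = [a["href"] for a in soup.find_all("a")]

# Modify: replace the text inside the <b> tag.
first_p.b.string = "soup"
modified_text = first_p.get_text()
```

Passing the parser name explicitly keeps the result from depending on which optional parsers happen to be installed.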

    A well-formed XML/HTML document yields a well-formed data
    structure. An ill-formed XML/HTML document yields a correspondingly
    ill-formed data structure. If your document is only locally
    well-formed, you can use this library to find and process the
    well-formed part of it.
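
The idea of processing only the well-formed part can be sketched as follows. The broken markup here is a made-up example, and how unclosed tags get repaired depends on the parser backend; this sketch assumes `html.parser`.

```python
from bs4 import BeautifulSoup

# Hypothetical document: the second <p> and the <div> are never closed.
broken = "<div><p>well-formed part</p><p>never closed"
soup = BeautifulSoup(broken, "html.parser")

# Both paragraphs still appear in the tree; the open tags are closed
# at the end of the input.
paragraphs = soup.find_all("p")
first_text = paragraphs[0].get_text()
second_text = paragraphs[1].get_text()
```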

    Beautiful Soup works with Python 2.2 and up. It has no external
    dependencies, but you'll have more success at converting data to UTF-8
    if you also install these three packages:

    * chardet, for auto-detecting character encodings
      http://chardet.feedparser.org/
    * cjkcodecs and iconv_codec, which add more encodings to the ones supported
      by stock Python.
      http://cjkpython.i18n.org/

    Beautiful Soup defines classes for two main parsing strategies:

     * BeautifulStoneSoup, for parsing XML, SGML, or your domain-specific
       language that kind of looks like XML.

     * BeautifulSoup, for parsing run-of-the-mill HTML code, be it valid
       or invalid. This class has web browser-like heuristics for
       obtaining a sensible parse tree in the face of common HTML errors.
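
Note that the help text above describes the older Beautiful Soup 3 class layout. With the `bs4` package imported at the top of this post, the same HTML/XML split is usually made by passing a parser name to the single `BeautifulSoup` class. The sketch below assumes that usage; the sample documents are hypothetical, and the `"xml"` feature needs the third-party `lxml` package, so the code falls back gracefully when it is missing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# HTML: the stdlib-backed "html.parser" builder tolerates the unclosed tags.
html_soup = BeautifulSoup("<p>Unclosed <b>bold", "html.parser")
html_text = html_soup.get_text()

# XML: bs4 selects an XML parser via the "xml" feature, which needs lxml.
xml_doc = "<catalog><book id='1'><title>Dune</title></book></catalog>"
try:
    xml_soup = BeautifulSoup(xml_doc, "xml")
    xml_title = xml_soup.find("title").get_text()
except FeatureNotFound:
    xml_title = None  # lxml is not installed in this environment
```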

    Beautiful Soup also defines a class (UnicodeDammit) for autodetecting
    the encoding of an HTML or XML document, and converting it to
    Unicode. Much of this code is taken from Mark Pilgrim's Universal Feed Parser.
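
A small sketch of using `UnicodeDammit` on its own, assuming the `bs4` version of the class. The byte string and the suggested encoding are illustrative; without a hint, detection quality depends on whether an optional detector such as chardet is installed.

```python
from bs4 import UnicodeDammit

# Hypothetical input: Latin-1 encoded bytes of unknown origin.
raw = "Sacré bleu!".encode("latin-1")

# Suggest candidate encodings to try first; these are hints, not guarantees.
dammit = UnicodeDammit(raw, ["latin-1"])

decoded = dammit.unicode_markup        # the text converted to Unicode
detected = dammit.original_encoding    # the encoding that was used
```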

    For more than you ever wanted to know about Beautiful Soup, see the
    documentation:
    http://www.crummy.com/software/BeautifulSoup/documentation.html

    Here, have some legalese:

    Copyright (c) 2004-2010, Leonard Richardson

    All rights reserved.

  • Original article: https://www.cnblogs.com/jinhh/p/8032286.html
Copyright © 2011-2022 走看看