  • Basic usage of BeautifulSoup

    Original article:

    http://www.cnblogs.com/yupeng/p/3362031.html

    This article also covers the topic thoroughly:

    http://www.cnblogs.com/twinsclover/archive/2012/04/26/2471704.html

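    I spent a little time with the bs4 library and everything I ran worked fine. It parses the various structures of an HTML document, much like the ElementTree library does for XML, and using it feels about the same.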
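
    To make the ElementTree comparison concrete, here is a minimal sketch that extracts the same links with both libraries. The snippet and the variable names are made up for illustration, and the snippet is deliberately well-formed so that ElementTree can parse it too; real-world HTML often is not.

```python
import xml.etree.ElementTree as ET
from bs4 import BeautifulSoup

# Hypothetical snippet, well-formed so it is valid input for both parsers
snippet = '<div><a href="/one">One</a><a href="/two">Two</a></div>'

# ElementTree: walk matching elements with iter(), read attributes with .get()
root = ET.fromstring(snippet)
et_links = [(a.get('href'), a.text) for a in root.iter('a')]

# BeautifulSoup: find_all() plays a similar role, attributes use dict-style access
soup = BeautifulSoup(snippet, 'html.parser')
bs_links = [(a['href'], a.get_text()) for a in soup.find_all('a')]

print(et_links)
print(bs_links)
```

    Both lists hold the same (href, text) pairs. The main practical difference is that BeautifulSoup tolerates broken markup, while ET.fromstring raises an error on it.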

    You can run and debug the script below directly to get familiar with its usage.

    # coding=utf-8
    from bs4 import BeautifulSoup

    html_doc = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    """

    soup = BeautifulSoup(html_doc, 'html.parser')
    # print(soup.title)
    # print(soup.title.name)
    # print(soup.title.string)
    # print(soup.p)
    # print(soup.a)
    # print(soup.find_all('a'))
    # a = soup.find_all('a')
    # print(len(a))
    # print(soup.find_all('p'))  # returns a list-like structure
    # p = soup.find_all('p')
    # print(len(p))
    # print(soup.find(id='link3'))

    # print(soup.get_text())  # returns all the text in the document
    # print(soup.p.get_text())  # returns only the text of the selected node
    # for i in soup.find_all('p'):
    #     print(i.get_text())
    #     print(i.contents)
    # print(soup.a['href'], soup.a['class'], soup.a['id'], soup.a.text)  # every part of a single node is accessible
    # print(soup.html, soup.head, soup.body)  # whole document, head, body: the full structure
    # print(soup.p.contents, soup.head.contents)  # child nodes returned as a list
    # for i in list(soup.head.children):  # iterate over children without knowing their names
    #     print(i, end=' ')
    # print(soup.a.parent)  # go up to the parent; .parents iterates over all ancestors
    # for i in soup.html.parents:
    #     print(i, len(i))
    # print(soup.a.parent)
    # print(soup.find_all(class_="sister"))
    print(soup.find_all('a', limit=1))  # limit the number of results
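
    As a rough summary of what the lookups in the script return, the sketch below runs a few of them on a shortened copy of the sample document and collects the results. The shortened document and the variable names are chosen here for illustration only.

```python
from bs4 import BeautifulSoup

# Shortened, illustrative copy of the sample document
html_doc = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

links = soup.find_all('a')                 # list-like result holding Tag objects
first_only = soup.find_all('a', limit=1)   # still list-like, with at most one Tag
by_class = soup.find_all(class_='sister')  # class_ avoids the reserved word "class"
link3 = soup.find(id='link3')              # a single Tag
missing = soup.find(id='no-such-id')       # None when nothing matches

hrefs = [a['href'] for a in links]         # dict-style attribute access
first_class = soup.a['class']              # class can hold several values, so it is a list

print(len(links), len(first_only), len(by_class))
print(link3.get_text(), missing)
print(hrefs)
print(first_class)
```

    Note that find_all always gives back something list-like, even with limit=1, whereas find gives back one Tag or None, so the result of find should be checked before accessing attributes on it.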
  • Source of this post: https://www.cnblogs.com/dahu-daqing/p/6558812.html