  • Basic usage of BeautifulSoup

    Original article:

    http://www.cnblogs.com/yupeng/p/3362031.html

    This article also covers the topic quite thoroughly:

    http://www.cnblogs.com/twinsclover/archive/2012/04/26/2471704.html

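    I spent a little time with the bs4 library, and everything I ran worked fine. Its job is to parse the various structures of an HTML document. It is similar to the ElementTree library for parsing XML, and using it feels much the same.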
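To make the ElementTree comparison concrete, here is a small sketch that uses only the standard library. The XML snippet is a made-up, well-formed cut of the story markup (ElementTree requires well-formed input), and the variable names are my own. The comments naming bs4 counterparts are rough analogies, not exact equivalents.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample; ElementTree raises ParseError on malformed markup.
xml_doc = """<html><body>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p>
</body></html>"""

root = ET.fromstring(xml_doc)

links = root.findall(".//a")            # roughly like soup.find_all('a')
first = root.find(".//a")               # roughly like soup.find('a')
ids = [a.get("id") for a in links]      # attribute access, like tag['id'] in bs4
texts = [a.text for a in links]         # text content, like tag.string in bs4
by_id = root.find(".//a[@id='link2']")  # roughly like soup.find(id='link2')

print(ids, texts, by_id.get("href"))
```

The main difference in day-to-day use is that ElementTree queries go through path expressions, while bs4 takes tag names and attributes as arguments.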

    The script can be run and debugged directly to get familiar with how the library is used.

    from bs4 import BeautifulSoup

    html_doc = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    """

    soup = BeautifulSoup(html_doc, 'html.parser')

    # Uncomment any of the lines below to try it out.
    # print(soup.title)
    # print(soup.title.name)
    # print(soup.title.string)
    # print(soup.p)
    # print(soup.a)
    # print(soup.find_all('a'))
    # a = soup.find_all('a')
    # print(len(a))
    # print(soup.find_all('p'))  # returns a list-like structure
    # p = soup.find_all('p')
    # print(len(p))
    # print(soup.find(id='link3'))

    # print(soup.get_text())  # returns the text of the whole document
    # print(soup.p.get_text())  # returns the text of the selected node only
    # for i in soup.find_all('p'):
    #     print(i.get_text())
    #     print(i.contents)
    # print(soup.a['href'], soup.a['class'], soup.a['id'], soup.a.text)  # every part of a single node is accessible
    # print(soup.html, soup.head, soup.body)  # whole document, head, and body, each with its full structure
    # print(soup.p.contents, soup.head.contents)  # child nodes returned as a list
    # for i in soup.head.children:  # iterate over children without knowing their names
    #     print(i, end=' ')
    # print(soup.a.parent)  # look one level up; .parents iterates over all ancestors
    # for i in soup.html.parents:
    #     print(i, len(i))
    # print(soup.find_all(class_="sister"))
    print(soup.find_all('a', limit=1))  # limit the number of results
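For anyone who would rather see a few of these calls running together, here is a minimal Python 3 sketch. It assumes bs4 is installed and uses a shortened copy of the story markup. The variable names (`title_text`, `hrefs`, `ancestor_names`, and so on) are my own and do not come from the library.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical copy of the sample document used in this post.
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">Once upon a time there were three little sisters:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>.</p>
</body></html>"""

soup = BeautifulSoup(html_doc, "html.parser")

title_text = soup.title.string               # text inside <title>
links = soup.find_all("a")                   # every <a> tag, as a list-like ResultSet
hrefs = [a["href"] for a in links]           # attributes are read like dict keys
classes = soup.a["class"]                    # class is multi-valued, so this is a list
link3 = soup.find(id="link3")                # first tag whose id is "link3"
limited = soup.find_all("a", limit=1)        # stop after the first match
sisters = soup.find_all(class_="sister")     # class_ avoids the Python keyword
ancestor_names = [t.name for t in soup.a.parents]  # walk up to the document root

print(title_text)
print(hrefs)
print(classes, link3.get_text(), len(limited), len(sisters))
print(ancestor_names)
```

Note the difference between `.parent`, which gives only the enclosing tag, and `.parents`, which yields every ancestor up to the `BeautifulSoup` object itself (its name is `[document]`).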
  • Original source: https://www.cnblogs.com/dahu-daqing/p/6558812.html