  • BeautifulSoup Study Notes

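    # BeautifulSoup 3 under Python 2 (note the print statement below)
    from BeautifulSoup import BeautifulSoup
    import re
    
    # Deliberately sloppy HTML: the <p> and <body> tags are never closed
    doc = ['<html><head><title>Page title</title></head>',
           '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
           '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
           '</html>']
    # Parse the joined fragments and print an indented view of the tree
    soup = BeautifulSoup(''.join(doc))
    print soup.prettify()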
    

     The output is:

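    # The first top-level node of the soup is the <html> tag
    print soup.contents[0].name
    # html
    # Its first child is the <head> tag
    print soup.contents[0].contents[0].name
    # head
    
    # Print the name of every direct child of <html>
    for child in soup.contents[0].contents:
        print child.name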
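
The snippets above are written for BeautifulSoup 3 and Python 2. A rough Python 3 equivalent is sketched below. It assumes the `beautifulsoup4` package is installed and picks the standard-library `html.parser` backend; both are assumptions of this sketch, not something the original notes use.

```python
# Assumption: Python 3 with beautifulsoup4 installed (pip install beautifulsoup4).
from bs4 import BeautifulSoup

doc = ['<html><head><title>Page title</title></head>',
       '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
       '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
       '</html>']

# Name the parser explicitly; "html.parser" ships with Python itself.
soup = BeautifulSoup(''.join(doc), 'html.parser')

# The first top-level node is the <html> tag.
root = soup.contents[0]
print(root.name)

# Names of the direct children of <html>.
child_names = [child.name for child in root.contents]
print(child_names)
```

In bs4 the method `findAll` is spelled `find_all`; the old name is kept only as a compatibility alias.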
    

     

    titleTag = soup.html.head.title
    titleTag
    # <title>Page title</title>
    
    titleTag.string
    # u'Page title'
    
    len(soup('p'))
    # 2
    
    soup.findAll('p', align="center")
    # [<p id="firstpara" align="center">This is paragraph <b>one</b>. </p>]
    
    soup.find('p', align="center")
    # <p id="firstpara" align="center">This is paragraph <b>one</b>. </p>
    
    soup('p', align="center")[0]['id']
    # u'firstpara'
    
    soup.find('p', align=re.compile('^b.*'))['id']
    # u'secondpara'
    
    soup.find('p').b.string
    # u'one'
    
    soup('p')[1].b.string
    # u'two'
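
The `align=re.compile('^b.*')` filter above selects tags whose `align` value matches the pattern. The sketch below mimics that check with plain `re` so it can be tried without BeautifulSoup. The `align_values` dict is a hypothetical stand-in for the two parsed `<p>` tags, and using `search` here reflects my understanding of how the library applies a compiled pattern, not a quote from its source.

```python
import re

# Same pattern as in the notes: an attribute value that starts with "b".
pattern = re.compile('^b.*')

# Hypothetical stand-in for the parsed tags: paragraph id -> align value.
align_values = {'firstpara': 'center', 'secondpara': 'blah'}

# Keep only the ids whose align value matches the pattern.
matching_ids = [pid for pid, align in align_values.items()
                if pattern.search(align)]
print(matching_ids)  # ['secondpara']
```

Because the pattern is anchored with `^`, "center" is rejected and only "blah" passes, which is why the lookup returns the second paragraph.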
    
  • Original article: https://www.cnblogs.com/rollenholt/p/2271298.html