  • BeautifulSoup Study Notes

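    # BeautifulSoup 3 under Python 2 (note the print statements).
    from BeautifulSoup import BeautifulSoup
    import re
    
    # A small sample document; the <p> and <body> tags are left unclosed on purpose.
    doc = ['<html><head><title>Page title</title></head>',
           '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
           '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
           '</html>']
    # Parse the joined string, then print the tree with one tag per line.
    soup = BeautifulSoup(''.join(doc))
    print soup.prettify()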
    

     Running this prints the parsed document as an indented tree, with each tag on its own line.

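    # The top-level node of the parse tree is the <html> tag.
    print soup.contents[0].name
    # html
    print soup.contents[0].contents[0].name
    # head
    
    # Print the name of each direct child of <html>.
    for child in soup.contents[0].contents:
        print child.name
    # head
    # body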
    

     

    titleTag = soup.html.head.title
    titleTag
    # <title>Page title</title>
    
    titleTag.string
    # u'Page title'
    
    len(soup('p'))
    # 2
    
    soup.findAll('p', align="center")
    # [<p id="firstpara" align="center">This is paragraph <b>one</b>. </p>]
    
    soup.find('p', align="center")
    # <p id="firstpara" align="center">This is paragraph <b>one</b>. </p>
    
    soup('p', align="center")[0]['id']
    # u'firstpara'
    
    soup.find('p', align=re.compile('^b.*'))['id']
    # u'secondpara'
    
    soup.find('p').b.string
    # u'one'
    
    soup('p')[1].b.string
    # u'two'
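
The lookup with `align=re.compile('^b.*')` returns the second paragraph because only its `align` value starts with `b`. Here is a minimal sketch of that check using only the standard `re` module. The `align_values` dictionary is a hypothetical stand-in for the attributes in the sample document; it illustrates the pattern only, not BeautifulSoup's internals.

```python
import re

# Same pattern as in the find() call: values that start with "b".
pattern = re.compile('^b.*')

# Hypothetical stand-in for the align attributes of the two sample paragraphs.
align_values = {'firstpara': 'center', 'secondpara': 'blah'}

# Keep the ids whose align value matches the pattern.
matching_ids = sorted(
    para_id for para_id, align in align_values.items()
    if pattern.search(align)
)
print(matching_ids)
```

Since the pattern is anchored with `^`, the trailing `.*` is optional: `re.compile('^b')` selects the same values.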
    
  • Original article: https://www.cnblogs.com/rollenholt/p/2271298.html