  • BeautifulSoup

    from bs4 import BeautifulSoup
    import urllib2
    
    html = urllib2.urlopen('http://tieba.baidu.com/p/5058456989')
    bsobj = BeautifulSoup(html.read(), "html.parser")  # omitting "html.parser" triggers a warning
    print bsobj.title
    underline = '-'*100
    
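    def get_title(url):
        # Despite its name, this returns the whole parsed page, or None on failure.
        try:
            html = urllib2.urlopen(url)
        except urllib2.HTTPError as e:
            # The server answered with an error status: report it and give up.
            print e
            return None
        try:
            bsobj = BeautifulSoup(html.read(), "html.parser")
            title = bsobj
        except AttributeError as e:
            print e
            return None
        return title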
    
    url = 'http://tieba.baidu.com/p/4420237089?see_lz=1'
    title = get_title(url)
    if title is None:
        print 'title is none'
    else:
        print underline
        # print title
        # Post bodies; the trailing space in the class string is kept from the original.
        tmp = title.findAll("div", {"class": "d_post_content j_d_post_content "})
        vmp = title.findAll("span", {"class": "tail-info"})
        # for v in vmp.tr.next_siblings:
        #     print v
        # Pair each post body with every third tail-info span, starting from the second.
        for val, f in zip(tmp, vmp[1:-1:3]):
            print val.get_text()
            print f.get_text(), underline
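
The slice `vmp[1:-1:3]` in the loop above is easy to misread, so here is a minimal sketch of what it selects. It uses plain lists in place of the BeautifulSoup result sets. The sample strings, and the assumption that every post contributes exactly three `tail-info` spans, are illustrative and not taken from the page.

```python
# Hypothetical stand-ins for the two result sets; in the script above they
# come from findAll() and hold Tag objects rather than strings.
posts = ["first post", "second post", "third post"]
tail_info = [
    "client A", "floor 1", "time 1",
    "client B", "floor 2", "time 2",
    "client C", "floor 3", "time 3",
]

# Start at index 1, step by 3, and stop before the last element.
picked = tail_info[1:-1:3]

# zip() stops at the shorter sequence, so surplus items on either side are dropped.
pairs = list(zip(posts, picked))

for body, info in pairs:
    print("%s | %s" % (body, info))
```

If a post carries a different number of `tail-info` spans, the step of 3 drifts out of alignment, so the pairing is only as reliable as that assumption.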
  • Original article: https://www.cnblogs.com/cmm2016/p/6709199.html