  • Scraping Qiushibaike's hot jokes and hot-image links

    # -*- coding:utf-8 -*-
    import urllib
    import urllib2
    from bs4 import BeautifulSoup
    import re
    import os
    
    
    # Crawl hot-list pages 1 through 9.
    page = 1
    while page < 10:
    
        url = 'http://www.qiushibaike.com/hot/page/' + str(page)
        user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
        headers = { 'User-Agent' : user_agent }
        try:
            request = urllib2.Request(url,headers = headers)
            response = urllib2.urlopen(request)
    
            qiubai_html = response.read()
            #print qiubai_html
            soup = BeautifulSoup(qiubai_html,"html.parser")
            #print soup.find("a",class_="contentHerf")
            #print soup.find("a",class_="contentHerf").span.text
    
            file = open('imgsrc.txt','a')
    
            qiubailist = soup.find_all("a",class_="contentHerf")
            print 'this is page ',page
            for x in qiubailist:
                print x.span.text
                file.write(x.span.text.encode('utf-8')+'\n')
                print '\n'
    
            imgSrclist = soup.find_all("div",class_="thumb")
            for x in imgSrclist:
                file.write(x.img['src'].encode('utf-8')+'\n')
            file.close()
    
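            # Print the first image URL on this page, if there is one.
            first_thumb = soup.find("div",class_="thumb")
            if first_thumb is not None and first_thumb.img is not None:
                print first_thumb.img['src']
    
        except urllib2.URLError, e:
            # HTTPError carries a status code; every URLError carries a reason.
            if hasattr(e,"code"):
                print e.code
            if hasattr(e,"reason"):
                print e.reason
    
        # Advance even after a failed request, so one bad page cannot loop forever.
        page = page + 1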
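
The listing above targets Python 2 (`urllib2`, `print` statements). For readers on Python 3, the request and error-handling step could look roughly like the sketch below. It is not the original author's code: the helper names `build_request` and `fetch_page` and the `opener` parameter are hypothetical, introduced only so the logic can be exercised without network access. The URL prefix and User-Agent string are taken from the listing.

```python
import io
import urllib.error
import urllib.request

BASE_URL = 'http://www.qiushibaike.com/hot/page/'
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'


def build_request(page):
    """Build the request for one hot-list page with a browser-like User-Agent."""
    return urllib.request.Request(BASE_URL + str(page),
                                  headers={'User-Agent': USER_AGENT})


def fetch_page(page, opener=urllib.request.urlopen):
    """Return the page body as bytes, or None if the request fails.

    `opener` is a hypothetical hook (an assumption of this sketch) that
    lets a stub replace urlopen during testing.
    """
    try:
        with opener(build_request(page)) as response:
            return response.read()
    except urllib.error.URLError as e:
        # HTTPError (a URLError subclass) also has a status code.
        code = getattr(e, 'code', None)
        if code is not None:
            print(code)
        print(e.reason)
        return None


if __name__ == '__main__':
    # Stub opener, so this demo never touches the network.
    def fake_opener(request):
        return io.BytesIO(b'<html>' + request.full_url.encode('utf-8') + b'</html>')

    for page in range(1, 4):
        body = fetch_page(page, opener=fake_opener)
        print(page, len(body))
```

Calling `fetch_page(page)` without the `opener` argument performs a real request. The returned bytes can be passed to `BeautifulSoup(body, "html.parser")` exactly as in the listing.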

     

  • Original source: https://www.cnblogs.com/lovely7/p/6119532.html