  • Scraping the hot jokes from Qiushibaike, along with the hot-image links

    # -*- coding:utf-8 -*-
    import urllib2
    from bs4 import BeautifulSoup
    
    
    page = 1
    while page<10 :
    
        url = 'http://www.qiushibaike.com/hot/page/' + str(page)
        user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
        headers = { 'User-Agent' : user_agent }
        try:
            request = urllib2.Request(url,headers = headers)
            response = urllib2.urlopen(request)
    
            qiubai_html = response.read()
            #print qiubai_html
            soup = BeautifulSoup(qiubai_html,"html.parser")
            #print soup.find("a",class_="contentHerf")
            #print soup.find("a",class_="contentHerf").span.text
    
            file = open('imgsrc.txt','a')
    
            qiubailist = soup.find_all("a",class_="contentHerf")
            print 'this is page ',page
            for x in qiubailist:
                print x.span.text
                file.write(x.span.text.encode('utf-8')+'\n')
                print '\n'
    
            imgSrclist = soup.find_all("div",class_="thumb")
            for x in imgSrclist:
                file.write(x.img['src'].encode('utf-8')+'\n')
            file.close()
    
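            # find() returns None on a page without thumbnails, so guard before indexing
            first_thumb = soup.find("div",class_="thumb")
            if first_thumb is not None and first_thumb.img is not None:
                print first_thumb.img['src']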
    
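        except urllib2.URLError, e:
            # HTTPError carries a status code; a plain URLError only has a reason
            if hasattr(e,"code"):
                print e.code
            if hasattr(e,"reason"):
                print e.reason
        # advance even after a failure so one bad page cannot stall the loop
        page = page + 1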
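
The listing above targets Python 2 (urllib2, print statements). For readers on Python 3, here is a minimal sketch of the same extraction using only the standard library. The names QiubaiParser, parse_page and fetch_page are hypothetical helpers introduced for illustration; the class names contentHerf and thumb and the URL pattern are taken from the original script, and whether they still match the live site is not verified here.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class QiubaiParser(HTMLParser):
    """Collect the text of the first <span> in each <a class="contentHerf">
    and the first <img src> in each <div class="thumb">."""

    def __init__(self):
        super().__init__()
        self.jokes = []
        self.images = []
        self._in_link = False    # inside <a class="contentHerf">
        self._span_done = False  # first span of the current link already read
        self._span_depth = 0     # > 0 while inside that first span
        self._in_thumb = False   # inside <div class="thumb">
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "contentHerf" in classes:
            self._in_link, self._span_done = True, False
        elif tag == "span" and self._in_link and not self._span_done:
            self._span_depth += 1
        elif tag == "div" and "thumb" in classes:
            self._in_thumb = True
        elif tag == "img" and self._in_thumb and attrs.get("src"):
            self.images.append(attrs["src"])
            self._in_thumb = False  # keep only the first image per thumb

    def handle_endtag(self, tag):
        if tag == "span" and self._span_depth:
            self._span_depth -= 1
            if self._span_depth == 0:
                self.jokes.append("".join(self._buf).strip())
                self._buf = []
                self._span_done = True
        elif tag == "a":
            self._in_link = False
        elif tag == "div":
            # simplification: assumes no nested <div> inside the thumb block
            self._in_thumb = False

    def handle_data(self, data):
        if self._span_depth:
            self._buf.append(data)


def parse_page(html):
    """Return (jokes, image_urls) extracted from one page of HTML."""
    parser = QiubaiParser()
    parser.feed(html)
    parser.close()
    return parser.jokes, parser.images


def fetch_page(page):
    """Download one hot-list page; URL and User-Agent are from the original script."""
    url = 'http://www.qiushibaike.com/hot/page/' + str(page)
    headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
    with urlopen(Request(url, headers=headers)) as response:
        # assumption: the page is served as UTF-8
        return response.read().decode('utf-8')
```

A caller would loop over the page numbers, pass fetch_page(page) to parse_page, and append the results to imgsrc.txt opened with encoding='utf-8'. Keeping the parsing separate from the download means it can be checked against a saved HTML snippet without touching the network.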

     

  • Original article: https://www.cnblogs.com/lovely7/p/6119532.html