  • Python: a small program that scrapes jokes from Qiushibaike

    import requests
    import re
    # Qiushibaike crawler class
    class QSBK:
        # Initializer: set up a few instance variables
        def __init__(self):
            self.headers={
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.79 Safari/537.36"
            }
            # Collected jokes; each element holds the jokes from one page
            self.stories = []
            # Flag indicating whether the program should keep running
            self.enable = False
    
        def getPage(self,page):
            try:
                url = 'http://www.qiushibaike.com/hot/page/' + str(page)
                print(url)
                response = requests.get(url, headers=self.headers)
                html_content = response.content.decode('UTF-8')
                # print(html_content)
                # Capture the author name (<h2>) and the joke text (<span> inside div.content).
                # The published listing shows "W+"; the backslash is assumed lost, so \W+ is used.
                regex = re.compile(r'<h2>(.*?)</h2>.*?<div class="content">\W+<span>(.*?)</span>', re.S)
                regex_content = re.findall(regex, html_content)
                print(regex_content)
                # Strip newlines and keep each joke as an (author, text) tuple;
                # list.append() takes exactly one argument, so the pair must be wrapped.
                page_stories = [(i[0].replace('\n', ''), i[1].replace('\n', '')) for i in regex_content]
                self.stories.append(page_stories)
                return page_stories
            except Exception as e:
                print('Error: %s' % e)
                return []
    
    
    
    js = QSBK()
    
    for i in range(100):
        lists = js.getPage(i)
    
        print('============================================ Page ' + str(i) + ' =============================================')
        print(lists)

    If this stops working, Qiushibaike has most likely changed its page markup again.
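
Because the regular expression is tied to the page markup, it helps to be able to check the pattern without hitting the site. Below is a minimal sketch of that idea. The names `STORY_PATTERN`, `SAMPLE_HTML` and `extract_stories` are hypothetical, and the sample markup is made up for illustration rather than copied from the real page. The `\W+` in the pattern is also an assumption about what the original regex looked like.

```python
import re

# Pattern from getPage, assuming the lost backslash belongs in front of W+.
STORY_PATTERN = re.compile(
    r'<h2>(.*?)</h2>.*?<div class="content">\W+<span>(.*?)</span>', re.S
)

# Hypothetical markup, only shaped like what the pattern expects.
SAMPLE_HTML = """
<div class="author">
<h2>
some_user
</h2>
</div>
<div class="content">
<span>
A short joke.
</span>
</div>
"""


def extract_stories(html):
    """Return a list of (author, text) tuples with newlines removed."""
    return [
        (author.replace('\n', ''), text.replace('\n', ''))
        for author, text in STORY_PATTERN.findall(html)
    ]


if __name__ == '__main__':
    print(extract_stories(SAMPLE_HTML))
```

If the site changes, one option is to save a fresh copy of the page, paste a fragment into `SAMPLE_HTML`, and adjust `STORY_PATTERN` until `extract_stories` returns the expected pairs again.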

  • Original source: https://www.cnblogs.com/youmingkuang/p/7569146.html