  • Coroutines: a crawler example

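    # Monkey-patch before importing anything else, so that gevent can detect
    # blocking I/O and switch between coroutines while a request is waiting.
    from gevent import monkey; monkey.patch_all()
    import re
    import time
    
    import gevent
    import requests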
    
    
    def get_info(url):  # download one page and return its HTML
        res = requests.get(url)
        print(len(res.text))
        return res.text
    
    
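    def prase(res):  # parse the page data
        # \S+ is greedy, so each match runs past the closing quote of the title;
        # the split('"') below cuts it back to the title itself.
        res_name = re.findall(r'title="(?P<name>\S+\s?\S*?)"', res)
        move_name = []
        for i in res_name[::2]:  # keep every second match
            move_name.append(i.split('"')[0])
        # Alternative pattern that was tried: (?P<name>主演:\S+)/?\n{0,1}
        move_actor = re.findall(r'主演:\S+', res)  # '主演' means 'starring'
        # Open the file once; zip() avoids an IndexError if the lists differ in length.
        with open('movie_info.txt', 'a', encoding='utf-8') as f:
            for name, actor in zip(move_name, move_actor):
                # '电影名' means 'movie title'
                f.write('电影名:%s , %s\n' % (name, actor.split('<')[0]))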
    
    urls = [
        'http://maoyan.com/board/7',
        'http://maoyan.com/board/6',
        'http://maoyan.com/board/1',
        'http://maoyan.com/board/2',
        'http://maoyan.com/board/4',
    ]
    
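    def crawl(url):  # fetch one page, then parse it, inside a single greenlet
        prase(get_info(url))
    
    
    if __name__ == '__main__':
        start = time.time()
        # Pass the function and its argument separately. Writing
        # gevent.spawn(prase, get_info(url)) would call get_info(url) right here
        # in the main greenlet, so the downloads would run one after another.
        g_l = [gevent.spawn(crawl, url) for url in urls]
        gevent.joinall(g_l)  # wait for every greenlet, not only the last one
        print('Parsing finished', time.time() - start)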
    There is not much to explain; just read the script.
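
    To see what the two regular expressions in prase pick up, here is a small sketch that runs them (with \S and \s written with their backslashes) on a made-up HTML fragment. The fragment is an assumption shaped by what the patterns imply, not a copy of the real Maoyan markup, and extract_movies is a hypothetical helper name.

```python
import re

# Hypothetical HTML fragment. Each title is assumed to appear twice per movie,
# which is why the script keeps only every second match.
SAMPLE_HTML = (
    '<a href="/films/1" title="霸王别姬" class="image-link"></a>'
    '<p class="name"><a href="/films/1" title="霸王别姬">霸王别姬</a></p>'
    '<p class="star">主演:张国荣,张丰毅,巩俐</p>'
    '<a href="/films/2" title="罗马假日" class="image-link"></a>'
    '<p class="name"><a href="/films/2" title="罗马假日">罗马假日</a></p>'
    '<p class="star">主演:格利高里·派克,奥黛丽·赫本</p>'
)


def extract_movies(html):
    """Return (title, actors) pairs using the same patterns as prase."""
    # The greedy \S+ overshoots the closing quote, so trim at the first '"'.
    raw_titles = re.findall(r'title="(?P<name>\S+\s?\S*?)"', html)
    titles = [t.split('"')[0] for t in raw_titles[::2]]
    # Each actor match runs on into the next tag, so trim at the first '<'.
    actors = [a.split('<')[0] for a in re.findall(r'主演:\S+', html)]
    return list(zip(titles, actors))


movies = extract_movies(SAMPLE_HTML)
for title, actors in movies:
    print(title, actors)
```

    Returning pairs instead of writing to a file keeps the parsing step easy to check on its own. On a real page the patterns may well need adjusting, since they depend on where whitespace happens to fall in the markup.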

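    Coroutines do make this kind of I/O-bound job fast: they are lightweight, use little CPU and memory, and can sustain a high level of concurrency. It is pseudo-concurrency, though, because all the coroutines share one thread and only switch while one of them is waiting on I/O.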
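
    The "pseudo" part can be shown without any network access. The sketch below is not gevent: it uses the standard library's asyncio so that it runs with no extra packages, and asyncio.sleep stands in for the time spent waiting on a request. The names fake_fetch and main and the 0.2 second delay are made up for illustration.

```python
import asyncio
import time


async def fake_fetch(name, delay):
    # asyncio.sleep stands in for waiting on the network; while one coroutine
    # waits, the event loop runs the others on the same thread.
    await asyncio.sleep(delay)
    return name


async def main():
    start = time.perf_counter()
    # Start three waits together; gather returns results in argument order.
    results = await asyncio.gather(
        fake_fetch('a', 0.2),
        fake_fetch('b', 0.2),
        fake_fetch('c', 0.2),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(main())
print(results, round(elapsed, 2))
```

    The three waits overlap, so the total is close to the longest single wait rather than the sum of all three. Nothing here runs in parallel: if fake_fetch did CPU-heavy work instead of waiting, the coroutines would simply take turns.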

  • Original article: https://www.cnblogs.com/fenglin0826/p/7458689.html