  • Using XPath, a Powerful Scraping Tool (Part 3)

    A multithreaded crawler with XPath

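    # encoding=utf-8
    '''
    pool = Pool(4)   # 4 workers; the CPU here has 4 cores
    results = pool.map(crawl_function, url_list)
    '''
    from multiprocessing.dummy import Pool as ThreadPool
    import requests
    import time

    def getsource(url):
        # Fetch one page and return the response so pool.map can collect it
        html = requests.get(url)
        return html

    urls = []

    # Build the URLs for pages 1 to 20 of the thread
    for i in range(1, 21):
        newpage = 'http://tieba.baidu.com/p/3522395718?pn=' + str(i)
        urls.append(newpage)

    # Single-threaded: fetch the pages one after another
    time1 = time.time()
    for i in urls:
        print(i)
        getsource(i)
    time2 = time.time()
    print('Single-threaded time: ' + str(time2 - time1))

    # Parallel: spread the same requests over 4 threads
    pool = ThreadPool(4)
    time3 = time.time()
    results = pool.map(getsource, urls)
    pool.close()
    pool.join()
    time4 = time.time()
    print('Parallel time: ' + str(time4 - time3))

    Output:

    Single-threaded time: 12.0818030834
    Parallel time: 3.58480286598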
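
The speed-up most likely comes from overlapping the time each request spends waiting on the network, since the threads do very little CPU work. The sketch below illustrates this without touching the network. It assumes a hypothetical `fake_fetch` function that sleeps for 50 ms in place of `requests.get`, and the `example.com` URLs are placeholders that are never actually requested. It also shows that `pool.map` returns results in the same order as the input list.

```python
from multiprocessing.dummy import Pool as ThreadPool
import time


def fake_fetch(url):
    """Hypothetical stand-in for requests.get: sleep to mimic network latency."""
    time.sleep(0.05)
    return "page for " + url


# Placeholder URLs (assumption); nothing is downloaded.
urls = ["http://example.com/?pn=" + str(i) for i in range(1, 9)]

# Sequential: each call waits for the previous one to finish.
start = time.time()
sequential = [fake_fetch(u) for u in urls]
sequential_time = time.time() - start

# Threaded: 4 workers wait on their "requests" at the same time.
start = time.time()
pool = ThreadPool(4)
parallel = pool.map(fake_fetch, urls)
pool.close()
pool.join()
parallel_time = time.time() - start

print("sequential: %.2fs, parallel: %.2fs" % (sequential_time, parallel_time))
```

Because the work is mostly waiting, the thread count here does not have to equal the number of CPU cores; 4 is simply the value used in the original script.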

  • Original article: https://www.cnblogs.com/gide/p/5246809.html