  • Python multiprocessing with multiprocessing.Pool()

    1. The multiprocessing.pool.Pool class

    class multiprocessing.pool.Pool([processes[, initializer[, initargs[, maxtasksperchild[, context]]]]])
    Purpose: A process pool object which controls a pool of worker processes to which jobs can be submitted. It supports asynchronous results with timeouts and callbacks and has a parallel map implementation.
    Parameters:
    processes is the number of worker processes to use. If processes is None then the number returned by os.cpu_count() is used.

    If initializer is not None then each worker process will call initializer(*initargs) when it starts.

    maxtasksperchild is the number of tasks a worker process can complete before it will exit and be replaced with a fresh worker process, to enable unused resources to be freed. The default maxtasksperchild is None, which means worker processes will live as long as the pool.

    context can be used to specify the context used for starting the worker processes. Usually a pool is created using the function multiprocessing.Pool() or the Pool() method of a context object. In both cases context is set appropriately.

    Note that the methods of the pool object should only be called by the process which created the pool.
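The parameters above can be combined as in the following minimal sketch. The names `init_worker`, `tag`, `_prefix` and the value "job" are hypothetical and chosen only for illustration. `chunksize=1` is passed so that each input item counts as one task toward `maxtasksperchild`.

```python
import os
from multiprocessing import Pool

_prefix = None  # per-worker global, filled in by the initializer (hypothetical name)


def init_worker(prefix):
    """Runs once in every worker process when that worker starts."""
    global _prefix
    _prefix = prefix


def tag(x):
    """Return a label built from the worker-local prefix, plus the worker's PID."""
    return f"{_prefix}:{x}", os.getpid()


if __name__ == "__main__":
    # Two workers; each one exits and is replaced after completing 3 tasks.
    with Pool(processes=2, initializer=init_worker, initargs=("job",),
              maxtasksperchild=3) as pool:
        results = pool.map(tag, range(8), chunksize=1)

    labels = [label for label, _ in results]
    pids = {pid for _, pid in results}
    print(labels)
    print(len(pids), "distinct worker PIDs handled the tasks")
```

Because workers are recycled, the number of distinct PIDs reported is usually larger than `processes`; the exact count depends on scheduling.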
    For a Chinese translation of the Pool() documentation, see: http://www.cnblogs.com/congbo/archive/2012/08/23/2652490.html

    About multiprocessing:
    multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.
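    Note in particular that Pool creates multiple processes, not threads. For CPU-bound work, setting the first argument (processes) higher than the number of CPU cores can therefore make things slower rather than faster. Measure it on your own machine!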
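One way to act on this advice is to cap the requested worker count at the number of CPUs. The sketch below assumes a CPU-bound workload; `cpu_bound` and `pick_pool_size` are hypothetical helpers, not part of the multiprocessing API, and whether oversubscription actually hurts depends on the workload.

```python
import os
from multiprocessing import Pool


def cpu_bound(n):
    """A purely computational task: sum of squares below n."""
    return sum(i * i for i in range(n))


def pick_pool_size(requested):
    """Cap the requested worker count at the CPU count (hypothetical helper)."""
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(requested, cpus))


if __name__ == "__main__":
    size = pick_pool_size(64)  # asks for 64, gets at most one worker per CPU
    with Pool(processes=size) as pool:
        totals = pool.map(cpu_bound, [200_000] * 8)
    print("workers:", size, "tasks completed:", len(totals))
```

Timing the same `pool.map` call with different `processes` values is the simplest way to find the best setting for a given machine.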

    For more information about multiprocessing, see the official Python documentation.

    2. Examples and walkthrough

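    This section focuses on the map function, which takes care of iterating over the sequence, passing the arguments, and collecting the results in a single call.
    First, import the library: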

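    from multiprocessing.dummy import Pool  # thread-backed pool with the same API as multiprocessing.Pool
    pool=Pool(4) 
    results=pool.map(crawl_function, url_list)  # placeholders: your fetch function and your list of URLs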

    The simple example below shows how to use map and how it compares with an ordinary loop.

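    import time
    import requests  # third-party HTTP library; this import was missing from the original listing
    from multiprocessing.dummy import Pool  # thread-backed Pool
    
    def getsource(url):
        # Download the page; the response is discarded because only the timing matters here
        html=requests.get(url)
    
    urls=[]
    for i in range(1,21):
        newpage='http://tieba.baidu.com/p/3522395718?pn='+str(i)
        urls.append(newpage)
    
    timex=time.time()  # Test 1: fetch the pages one after another
    for i in urls:
        getsource(i)
    print (time.time()-timex)
    
    # Output from the author's run:
    # 10.2820000648 
    
    
    time1=time.time()  # Test 2: fetch the pages through a pool of 4 workers
    pool=Pool(4)
    results=pool.map(getsource,urls)
    pool.close()
    pool.join()
    print (time.time()-time1)
    
    # Output from the author's run:
    # 3.23600006104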

    Comparing the two approaches, Test 2 is clearly much faster than Test 1 (about 3.2 s versus 10.3 s in the run above).
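The same comparison can be reproduced without network access. The sketch below is a simplified stand-in, not the author's benchmark: `fake_fetch` replaces `requests.get` with a 0.1 s sleep, and the `example.invalid` URLs, `run_sequential` and `run_pooled` are hypothetical names introduced for illustration.

```python
import time
from multiprocessing.dummy import Pool  # thread-backed pool


def fake_fetch(url):
    """Stand-in for requests.get(url): sleep to mimic network latency."""
    time.sleep(0.1)
    return len(url)


urls = ['http://example.invalid/page?pn=' + str(i) for i in range(1, 9)]


def run_sequential():
    """Call fake_fetch on each URL in turn and time the whole loop."""
    start = time.time()
    results = [fake_fetch(u) for u in urls]
    return results, time.time() - start


def run_pooled(workers=4):
    """Call fake_fetch through pool.map and time it, including pool shutdown."""
    start = time.time()
    pool = Pool(workers)
    results = pool.map(fake_fetch, urls)
    pool.close()
    pool.join()
    return results, time.time() - start


if __name__ == "__main__":
    seq_results, seq_time = run_sequential()
    pool_results, pool_time = run_pooled()
    print("same results:", seq_results == pool_results)
    print("sequential: %.2f s, pooled: %.2f s" % (seq_time, pool_time))
```

The speedup here comes from overlapping the waiting time of several requests, which is why a thread-backed pool is sufficient for this kind of I/O-bound task.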

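    An explanation of the program:
    In Test 1:
    for i in urls:
        getsource(i) # walks through the URLs in the urls list, calling getsource on each one in turn

    In Test 2:
    pool=Pool(4) # creates a pool of 4 workers. Because Pool comes from multiprocessing.dummy, these are threads, not processes. The author suggests choosing the number based on your CPU count; for I/O-bound work such as downloading pages, a larger number can also pay off.
    results=pool.map(getsource,urls) # calls map with the name of the user-defined function and an iterable of its arguments (a list here)
    pool.close() # closes the pool so that it accepts no further tasks
    pool.join() # waits until all 4 worker threads have finished
    print (time.time()-time1) # prints the elapsed time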
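The close()/join() pair can also be replaced by a with statement. The sketch below shows both forms side by side; `shout`, `map_with_context_manager` and `map_with_close_join` are hypothetical names. Note that leaving the with block calls `pool.terminate()` rather than `close()` and `join()`, which is safe here only because `map` blocks until every result is available.

```python
from multiprocessing.dummy import Pool  # thread-backed pool


def shout(word):
    """Trivial task used for illustration."""
    return word.upper()


def map_with_context_manager(words, workers=4):
    # Exiting the with block terminates the pool; map() has already
    # returned all results by then, so no work is lost.
    with Pool(workers) as pool:
        return pool.map(shout, words)


def map_with_close_join(words, workers=4):
    pool = Pool(workers)
    try:
        return pool.map(shout, words)
    finally:
        pool.close()  # accept no further tasks
        pool.join()   # wait for the workers to exit


if __name__ == "__main__":
    print(map_with_context_manager(["a", "b", "c"]))
    print(map_with_close_join(["a", "b", "c"]))
```

When asynchronous calls such as apply_async are still pending, the explicit close()/join() form is the safer choice, since terminate() stops workers without waiting for outstanding tasks.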

    Other commonly used Pool methods:

    from multiprocessing import Pool
    
    def f(x):  # a simple user-defined function
        return x*x
    
    if __name__ == '__main__':
        pool = Pool(processes=4)              # start 4 worker processes
    
        result = pool.apply_async(f, (10,))   # evaluate "f(10)" asynchronously
        print(result.get(timeout=1))          # wait at most 1 second for the result; prints "100" unless your computer is *very* slow
    
        print(pool.map(f, range(10)))         # prints "[0, 1, 4,..., 81]"
    
        it = pool.imap(f, range(10))          # apply f lazily with imap
        print(next(it))                       # prints "0"; next() fetches the results one at a time
        print(next(it))                       # prints "1"
        print(it.next(timeout=1))             # prints "4" unless your computer is *very* slow
    
        import time
        result = pool.apply_async(time.sleep, (10,))
        print(result.get(timeout=1))          # raises multiprocessing.TimeoutError
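The callbacks mentioned in the class description can be attached to apply_async through its `callback` and `error_callback` parameters. The following is a minimal sketch; `reciprocal` and `collect` are hypothetical helpers written for this illustration.

```python
from multiprocessing import Pool


def reciprocal(x):
    """Return 1/x; raises ZeroDivisionError when x == 0."""
    return 1 / x


def collect(pool, values):
    """Submit each value with apply_async and gather outcomes via callbacks."""
    successes, failures = [], []
    handles = [
        pool.apply_async(reciprocal, (v,),
                         callback=successes.append,       # called with the result
                         error_callback=failures.append)  # called with the exception
        for v in values
    ]
    for handle in handles:
        handle.wait()  # block until this task has finished
    return sorted(successes), failures


if __name__ == "__main__":
    with Pool(processes=2) as pool:
        ok, errors = collect(pool, [1, 2, 0, 4])
        print(ok)
        print([type(e).__name__ for e in errors])
        # imap_unordered yields results in completion order, so sort for display
        print(sorted(pool.imap_unordered(reciprocal, [1, 2, 4])))
```

The callbacks run in the process that owns the pool, so they can append to ordinary lists; they should return quickly, because a slow callback delays the handling of other results.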

    Example reference: http://blog.csdn.net/winterto1990/article/details/47976105

  • Original article: https://www.cnblogs.com/zswbky/p/8454105.html