Parallel Programming Speed in Python

      This post records a small point I ran into today: how fast is parallel programming in Python?

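      I wanted to parallelize AutoTool, mainly to speed up running many tasks. My first idea was to run the different module tasks in separate threads. While collecting material on multithreaded programming in Python, though, I found something surprising: Python threads are not truly parallel. The cause is the GIL (Global Interpreter Lock; the article "Python's Hardest Problem" explains it). The default CPython interpreter lets only one thread execute Python bytecode at a time, so in practice the threads take turns on a single core.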
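The following toy model in Python 3 makes the idea concrete. It is not how CPython actually implements the GIL. A shared `threading.Lock`, hypothetically named `toy_gil`, stands in for the GIL. Sleeping while holding the lock stands in for executing Python bytecode, which needs the GIL. Sleeping without it stands in for a blocking I/O call, which releases the GIL. The worker count and step length are arbitrary assumptions.

```python
import threading
import time

# Toy model only: a shared lock standing in for CPython's GIL.
toy_gil = threading.Lock()


def worker(step, hold_lock):
    """Hypothetical worker: one 'step' of work, with or without the toy GIL."""
    if hold_lock:
        # "Bytecode execution": only the lock holder may proceed, so workers take turns
        with toy_gil:
            time.sleep(step)
    else:
        # "Blocking I/O": the lock is not held, so all workers wait at the same time
        time.sleep(step)


def run(n_workers, step, hold_lock):
    """Start n_workers threads, wait for all of them, and return the wall time."""
    threads = [threading.Thread(target=worker, args=(step, hold_lock))
               for _ in range(n_workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


cpu_like = run(4, 0.1, hold_lock=True)   # serialized by the lock: at least 4 x 0.1 s
io_like = run(4, 0.1, hold_lock=False)   # overlapping waits: close to 0.1 s
print("serialized: %.2fs, overlapping: %.2fs" % (cpu_like, io_like))
```

With the lock held, the four steps cannot overlap. Steps that do not hold it run side by side. The real measurements follow the same pattern.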

      Here is an example I wrote to test this:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # @File    : batch_swig_runner.py
    # @Time    : 2019/7/8 18:09
    # @Author  : KuLiuheng
    # @Email   : liuheng.klh@alibaba-inc.com

    # from swig_runner import SwigRunner  # project-specific module, not needed for this benchmark

    import time
    import logging
    from threading import Thread
    from multiprocessing import Pool


    class TestRunner(Thread):
        """Thread that runs the simulated workload for one module."""

        def __init__(self, name, path):
            super(TestRunner, self).__init__()
            self.name = name
            self.path = path

        def run(self):
            logging.warning("Message from the thread-%s START" % self.name)
            for i in range(10000000):   # simulate a time-consuming (CPU-bound) operation
                j = int(i) * 10.1
            # time.sleep(1)             # use instead of the loop to simulate an I/O wait
            logging.warning("Message from the thread-%s END" % self.name)
            return self.path


    def multi_process(mname, mpath):
        """Same workload, executed in a worker process of the Pool."""
        logging.warning("Message from the thread-%s START" % mname)
        for i in range(10000000):   # simulate a time-consuming (CPU-bound) operation
            j = int(i) * 10.1
        # time.sleep(1)             # use instead of the loop to simulate an I/O wait
        logging.warning("Message from the thread-%s END" % mname)


    class BatchSwigRunner(object):
        def __init__(self, modules=None):
            """
            Initialize with a dict of module info (project name: project path)
            :param modules: {project name: project path}
            """
            if modules is not None:
                self._modules = modules
            else:
                self._modules = dict()

        def add_module_info(self, name, path):
            self._modules[name] = path

        def start(self):
            """
            Start the batch tasks and return the errors raised during execution
            :return: list of (project index, project name) for the failed projects;
                     error collection is not implemented in this test version,
                     so it currently returns None
            """
            runners = list()
            for (project_name, project_path) in self._modules.items():
                # logging.warning('BatchSwigRunner.start() [%s][%s]' % (project_name, project_path))
                sub_runner = TestRunner(project_name, project_path)
                sub_runner.daemon = True
                sub_runner.start()
                runners.append(sub_runner)

            # Wait until every thread has finished
            for runner in runners:
                runner.join()


    if __name__ == '__main__':
        # Test 1: four threads (threading.Thread)
        batch_runner = BatchSwigRunner()
        batch_runner.add_module_info('name1', 'path1')
        batch_runner.add_module_info('name2', 'path2')
        batch_runner.add_module_info('name3', 'path3')
        batch_runner.add_module_info('name4', 'path4')
        start_time = time.time()
        batch_runner.start()

        print('Total time consumed = %.2fs' % (time.time() - start_time))

        print('========================================')
        # Test 2: the same four workloads run one after another ("normal")
        start_time = time.time()

        for index in range(4):
            logging.warning("Message from the times-%d START" % index)
            for i in range(10000000):       # simulate a time-consuming (CPU-bound) operation
                j = int(i) * 10.1
            # time.sleep(1)                 # use instead of the loop to simulate an I/O wait
            logging.warning("Message from the times-%d END" % index)

        print('>>Total time consumed = %.2fs' % (time.time() - start_time))

        print('----------------------------------------------')
        # Test 3: four worker processes (multiprocessing.Pool)
        start_time = time.time()

        pool = Pool(processes=4)
        for i in range(4):
            pool.apply_async(multi_process, ('name++%d' % i, 'path++%d' % i))
        pool.close()
        pool.join()
        print('>>>> Total time consumed = %.2fs' % (time.time() - start_time))

       The results (CPU-bound workload) lead to a surprising conclusion:

    C:\Python27\python.exe E:/VirtualShare/gitLab/GBL-310/GBL/AutoJNI/autoTool/common/batch_swig_runner.py
    WARNING:root:Message from the thread-name4 START
    WARNING:root:Message from the thread-name2 START
    WARNING:root:Message from the thread-name3 START
    WARNING:root:Message from the thread-name1 START
    WARNING:root:Message from the thread-name2 END
    WARNING:root:Message from the thread-name4 END
    WARNING:root:Message from the thread-name3 END
    Total time consumed = 15.92s
    ========================================
    WARNING:root:Message from the thread-name1 END
    WARNING:root:Message from the times-0 START
    WARNING:root:Message from the times-0 END
    WARNING:root:Message from the times-1 START
    WARNING:root:Message from the times-1 END
    WARNING:root:Message from the times-2 START
    WARNING:root:Message from the times-2 END
    WARNING:root:Message from the times-3 START
    WARNING:root:Message from the times-3 END
    >>Total time consumed = 11.59s
    ----------------------------------------------
    WARNING:root:Message from the thread-name++0 START
    WARNING:root:Message from the thread-name++1 START
    WARNING:root:Message from the thread-name++2 START
    WARNING:root:Message from the thread-name++3 START
    WARNING:root:Message from the thread-name++1 END
    WARNING:root:Message from the thread-name++0 END
    WARNING:root:Message from the thread-name++2 END
    WARNING:root:Message from the thread-name++3 END
    >>>> Total time consumed = 5.69s
    
    Process finished with exit code 0

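      Ranking by speed (CPU-bound): multiprocessing (5.69s) > normal sequential loop (11.59s) > threading.Thread (15.92s). With four threads, the CPU-bound work was even slower than running the tasks one after another. The "thread-name1 END" line printed after "Total time consumed" comes only from console interleaving. logging writes to stderr while print writes to stdout, and join() guarantees that every thread has finished before the time is printed.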

      Note that a continuous computation simulates the time-consuming operation here:

    for i in range(10000000):   # simulate a time-consuming operation
        j = int(i) * 10.1
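Thread switching likely adds part of the threaded version's overhead on this CPU-bound loop. In CPython 3.2 and later, the interpreter asks the thread holding the GIL to release it after a fixed time slice. `sys.getswitchinterval()` reports that slice, which defaults to 0.005 s. The Python 2.7 interpreter used in the test above counts bytecode instructions instead (`sys.getcheckinterval()`, 100 by default). The following is a minimal Python 3 sketch for inspecting the interval and changing it temporarily:

```python
import sys

# Time slice (in seconds) after which CPython asks the running thread
# to drop the GIL so that a waiting thread can take it.
original = sys.getswitchinterval()
print("default switch interval: %s s" % original)

# A shorter interval means more frequent hand-offs (more switching overhead).
# On a standard (GIL) build it never lets two threads run bytecode at once.
sys.setswitchinterval(0.001)
print("shortened switch interval: %.4f s" % sys.getswitchinterval())

sys.setswitchinterval(original)  # restore the interpreter-wide setting
```

The setting affects the whole interpreter. Restore it after experimenting.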

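      Suppose the loop is commented out and time.sleep(1) is used in all three places instead, an idle wait that behaves like waiting on I/O. The ranking then becomes (I/O-bound): threading.Thread (1.01s) > multiprocessing (1.73s) > normal (4.00s):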

    C:\Python27\python.exe E:/VirtualShare/gitLab/GBL-310/GBL/AutoJNI/autoTool/common/batch_swig_runner.py
    WARNING:root:Message from the thread-name4 START
    WARNING:root:Message from the thread-name2 START
    WARNING:root:Message from the thread-name3 START
    WARNING:root:Message from the thread-name1 START
    WARNING:root:Message from the thread-name3 END
    WARNING:root:Message from the thread-name4 END
    WARNING:root:Message from the thread-name2 END
    WARNING:root:Message from the thread-name1 END
    WARNING:root:Message from the times-0 START
    Total time consumed = 1.01s
    ========================================
    WARNING:root:Message from the times-0 END
    WARNING:root:Message from the times-1 START
    WARNING:root:Message from the times-1 END
    WARNING:root:Message from the times-2 START
    WARNING:root:Message from the times-2 END
    WARNING:root:Message from the times-3 START
    WARNING:root:Message from the times-3 END
    >>Total time consumed = 4.00s
    ----------------------------------------------
    WARNING:root:Message from the thread-name++0 START
    WARNING:root:Message from the thread-name++1 START
    WARNING:root:Message from the thread-name++2 START
    WARNING:root:Message from the thread-name++3 START
    WARNING:root:Message from the thread-name++0 END
    WARNING:root:Message from the thread-name++1 END
    WARNING:root:Message from the thread-name++2 END
    WARNING:root:Message from the thread-name++3 END
    >>>> Total time consumed = 1.73s
    
    Process finished with exit code 0
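The I/O-bound case can be reproduced in Python 3 with `concurrent.futures.ThreadPoolExecutor` from the standard library. This is a substitute for the `threading.Thread` subclass used above, not the author's code. `fake_io` is a hypothetical stand-in for a blocking call. The 0.2 s delay is an arbitrary choice that keeps the run short.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(name, delay=0.2):
    """Hypothetical stand-in for a blocking I/O call."""
    # time.sleep releases the GIL while waiting, as a blocking read would
    time.sleep(delay)
    return name


def run_io_tasks(names, delay=0.2):
    """Run one fake_io call per name in a thread pool; return results and wall time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # map() passes one item from each iterable per call and keeps input order
        results = list(pool.map(fake_io, names, [delay] * len(names)))
    return results, time.perf_counter() - start


names = ["name1", "name2", "name3", "name4"]
results, elapsed = run_io_tasks(names)
print(results, "%.2fs" % elapsed)
```

The four waits overlap, so the wall time stays close to a single delay rather than four of them. This is the same effect as the 1.01s versus 4.00s gap measured above.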

       Why do we get these results?

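    (1) With threading, the GIL acts as a global lock that serializes the threads, so the computation only ever uses one CPU core. In the CPU-bound test, the extra time over the sequential run most likely comes from the threads contending for the GIL and switching between each other. A blocking call such as sleep releases the GIL, so the interpreter can switch to another thread right away and the four waits overlap. Switching threads is cheap, cheaper than starting worker processes with multiprocessing. It is also far faster than the normal version, which blocks on each wait in turn.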

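    (2) The multiprocessing Pool genuinely uses multiple CPUs, because each worker process has its own interpreter and its own GIL. For CPU-bound work such as the for loop above, several cores are naturally faster than one, which explains the result of the first test. The sleep test has nothing to compute. The pool's extra time over threading there (1.73s vs 1.01s) is therefore mostly the cost of starting the worker processes.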
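The two observations combine into a rule of thumb: use threads for I/O-bound work and processes for CPU-bound work. The following is a minimal Python 3 sketch of that rule using `concurrent.futures`. `make_executor` and its `"io"`/`"cpu"` labels are hypothetical names chosen for illustration. The default of four workers simply mirrors the four tasks above.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def make_executor(workload: str, workers: int = 4) -> Executor:
    """Hypothetical helper: pick a pool type from the kind of workload."""
    if workload == "io":
        # Blocking I/O releases the GIL, so cheap threads can overlap the waits.
        return ThreadPoolExecutor(max_workers=workers)
    if workload == "cpu":
        # Pure-Python computation holds the GIL; each process has its own GIL.
        return ProcessPoolExecutor(max_workers=workers)
    raise ValueError("unknown workload kind: %r" % workload)
```

Both executors share the same `submit()`/`map()` interface, so calling code does not change when the pool type does. When submitting work to the `"cpu"` executor, keep the call under `if __name__ == '__main__':`, as the benchmark above does. On Windows each worker process re-imports the main module.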

Original post: https://www.cnblogs.com/kuliuheng/p/11154481.html