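I recently needed to process large amounts of data with multithreading in Python 3, so I took the opportunity to look into how Python 3's threading model works. Brief notes follow.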
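Advantages of multithreading:
- Long-running tasks can be moved into the background;
- A user interface can stay responsive, because the work does not block it;
- The program can run faster, mainly when threads spend much of their time waiting (e.g. on I/O);
- Multiple CPU cores can be put to work (in CPython this is limited by the GIL, discussed below);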
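Kernel threads: created and destroyed by the operating system kernel;
User threads: threads implemented inside the user program, without kernel support;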
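Which kind does Python use? In CPython, each threading.Thread is backed by a native OS thread, i.e. a kernel thread. A minimal sketch to check this, assuming Python 3.8+ (where threading.get_native_id() exists): each worker records the ID the kernel assigned to it, and a Barrier keeps all workers alive at the same time so their IDs cannot be recycled. The helper name collect_native_ids is hypothetical.

```python
import threading

def collect_native_ids(n_threads=4):
    """Start n_threads threads and return the kernel-assigned ID of each one."""
    ids = []
    lock = threading.Lock()                # protects the shared list
    barrier = threading.Barrier(n_threads)

    def worker():
        with lock:
            ids.append(threading.get_native_id())
        barrier.wait()  # stay alive until every thread has recorded its ID

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids

if __name__ == '__main__':
    print('main thread:', threading.get_native_id())
    print('worker threads:', collect_native_ids())
```

Distinct IDs (also distinct from the main thread's) indicate one kernel thread per Python thread.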
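Multithreading in Python 3:
- _thread provides a set of low-level primitives for writing multithreaded programs;
- threading provides a more convenient, higher-level interface (it is built on top of _thread);
- Both are built-in modules of the Python 3 standard library.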

```python
#!/usr/bin/env python3
import _thread

def print_time(threadName, delay):
    # Print the thread name, then spin forever to keep the CPU busy
    # (delay is accepted but unused in this demo)
    print(threadName)
    count = 0
    while True:
        count += 1

try:
    # Start 15 low-level threads: Thread-1 gets delay=1, the others delay=2
    for i in range(1, 16):
        _thread.start_new_thread(print_time, ("Thread-%d" % i, 1 if i == 1 else 2))
except RuntimeError:  # _thread.error is an alias of RuntimeError
    print("Error: can't start thread!")

# Keep the main thread alive: on most systems the other threads are killed
# when the main thread exits. Watch CPU usage with top/htop, stop with Ctrl+C.
while True:
    pass
```

```python
#!/usr/bin/env python3
import threading
import time

exitFlag = 0  # set to 1 to make the workers stop early

class myThread(threading.Thread):
    def __init__(self, threadID, name, counter):
        super().__init__()
        self.threadID = threadID
        self.name = name        # also sets the Thread's own name
        self.counter = counter  # used as the sleep delay in seconds

    def run(self):
        # run() executes in the new thread once start() is called
        print("start " + self.name)
        print_time(self.name, self.counter, 5)
        print("exit " + self.name)

def print_time(threadName, delay, counter):
    # Print the current time `counter` times, sleeping `delay` seconds in between
    while counter:
        if exitFlag:
            return  # returning from run() ends the thread (a str has no exit())
        time.sleep(delay)
        print("%s: %s" % (threadName, time.ctime(time.time())))
        counter -= 1

thread1 = myThread(1, "Thread-1", 1)
thread2 = myThread(2, "Thread-2", 2)
thread1.start()
thread2.start()
thread1.join()  # wait for both threads to finish
thread2.join()
print("exit!")
```
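The example above stops its threads through a global exitFlag. A common alternative, sketched here rather than taken from the original code, is threading.Event: the main thread calls set(), and the worker's wait(timeout) doubles as an interruptible sleep. The names ticker and stop_event are illustrative.

```python
import threading
import time

def ticker(name, delay, stop_event, log):
    # wait() returns False after `delay` seconds, or True as soon as the event
    # is set, so the worker reacts without sleeping out the full delay
    while not stop_event.wait(delay):
        log.append("%s: %s" % (name, time.ctime(time.time())))

if __name__ == '__main__':
    stop_event = threading.Event()
    log = []
    t = threading.Thread(target=ticker, args=("Thread-1", 0.1, stop_event, log))
    t.start()
    time.sleep(0.35)
    stop_event.set()  # ask the worker to exit
    t.join()
    print(log)
    print("exit!")
```

Unlike a plain global flag checked between sleeps, the event wakes the worker immediately.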
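Python's threading is sometimes less effective than one would hope. The main reason is a necessary part of the design of the standard CPython interpreter: the Global Interpreter Lock (GIL), which means Python can still execute only one thread's code at a time:
Although Python fully supports multithreaded programming, parts of the interpreter's C implementation are not thread-safe under fully parallel execution. In fact, the interpreter is protected by a global interpreter lock, which ensures that only one Python thread executes at any given time. The biggest consequence of the GIL is that multithreaded Python programs cannot take advantage of multi-core CPUs (for example, a compute-intensive program that uses several threads still effectively runs on only one CPU at a time). To use multiple processes instead, use Python's multiprocessing package.
The GIL only affects programs that depend heavily on the CPU (e.g. compute-bound ones). If your program mostly does I/O, such as network communication, multithreading is a good fit, because the threads spend most of their time waiting, and blocking I/O calls release the GIL while they wait.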
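To see why I/O-bound work suits threads, a minimal sketch can simulate I/O with time.sleep(), which, like blocking socket or file I/O, releases the GIL while it waits. The function fake_io and the 0.5 s delay are assumptions standing in for a real network request. Four waits run one after another take about 2 s; four threads overlap them and should finish in roughly 0.5 s.

```python
import threading
import time

def fake_io(delay):
    # Stand-in for a blocking I/O call; time.sleep() releases the GIL while waiting
    time.sleep(delay)

def sequential(n=4, delay=0.5):
    start = time.perf_counter()
    for _ in range(n):
        fake_io(delay)
    return time.perf_counter() - start

def threaded(n=4, delay=0.5):
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == '__main__':
    print('sequential: %.2f s' % sequential())
    print('threaded:   %.2f s' % threaded())
```

Replacing fake_io with a CPU-bound loop removes the speedup, which is the GIL effect described above.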

```python
import threading
from queue import Queue
import copy
import time

def job(l, q):
    # Each thread sums its list and pushes the partial result into the queue
    res = sum(l)
    q.put(res)

def multithreading(l):
    q = Queue()  # queue.Queue is thread-safe
    threads = []
    for i in range(4):
        t = threading.Thread(target=job, args=(copy.copy(l), q), name='T%i' % i)
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    total = 0
    for _ in range(4):
        total += q.get()
    print(total)

def normal(l):
    total = sum(l)
    print(total)

if __name__ == '__main__':
    l = list(range(1000000))
    # Same amount of work both ways: summing the list 4 times. Because of the
    # GIL, the threaded version is not faster for this CPU-bound work.
    s_t = time.time()
    normal(l * 4)
    print('normal: ', time.time() - s_t)
    s_t = time.time()
    multithreading(l)
    print('multithreading: ', time.time() - s_t)
```

```python
#!/usr/bin/env python3
import multiprocessing as mp
import threading as td

def job(a, b):
    # Pure CPU busy loop that never ends: watch CPU usage with top/htop,
    # then stop the program with Ctrl+C
    while True:
        pass

if __name__ == '__main__':  # required for multiprocessing on Windows/macOS
    N = 16
    # Threads: because of the GIL, total CPU usage stays at roughly one core
    workers = [td.Thread(target=job, args=(1, 2)) for _ in range(N)]
    # Processes: comment out the line above and uncomment the line below;
    # each process has its own interpreter and GIL, so all cores can be busy
    # workers = [mp.Process(target=job, args=(1, 2)) for _ in range(N)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```
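Python's multiprocessing package can exploit the parallel processing power of multi-core CPUs:
- The multiprocessing interface mirrors the one used by threading (Process(target=..., args=...), start(), join());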
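Because the interfaces match, the multithreading() function from the earlier benchmark can be ported to processes almost line by line: threading.Thread becomes mp.Process and queue.Queue becomes mp.Queue. This is a sketch, not a measured result; the name multicore is hypothetical, and the list is sent to every child process, so for small inputs the process start-up cost can outweigh the gain.

```python
import multiprocessing as mp

def job(l, q):
    # Each worker process sums the list and reports its result via the queue
    q.put(sum(l))

def multicore(l, n_workers=4):
    q = mp.Queue()
    procs = [mp.Process(target=job, args=(l, q)) for _ in range(n_workers)]
    for p in procs:
        p.start()
    # Drain the queue before join(): a child that has put data on a queue
    # may not exit until that data has been consumed
    total = sum(q.get() for _ in range(n_workers))
    for p in procs:
        p.join()
    return total

if __name__ == '__main__':
    l = list(range(1000000))
    print(multicore(l))  # 4 * sum(range(1000000)) = 1999998000000
```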
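Concurrency control:
- Process lock: lock = mp.Lock(), then lock.acquire() / lock.release() (or simply with lock:)
- Thread lock: lock = td.Lock(), then lock.acquire() / lock.release() (or simply with lock:)
- For usability, Python keeps most interfaces in multiprocessing and threading the same, which is convenient;
- Processes (possibly running on different CPUs) communicate through shared memory, e.g. mp.Value('i', 0), a shared C int initialized to 0;
- Output queue: processes use mp.Queue(); threads already share memory, so they can simply use the standard queue module (from queue import Queue);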
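The points above can be combined in one sketch: the same worker function add_many (a hypothetical name) runs under td.Thread with td.Lock(), and under mp.Process with mp.Lock() plus a shared mp.Value('i', 0). The lock is needed because counter.value += 1 is a read-modify-write that is not atomic. The Counter class is an illustrative stand-in, since threads can share an ordinary Python object.

```python
import multiprocessing as mp
import threading as td

class Counter:
    """Hypothetical holder: threads share memory, so a plain object works."""
    def __init__(self):
        self.value = 0

def add_many(counter, lock, n):
    # counter.value += 1 reads, adds, then writes back, so guard it with the lock
    for _ in range(n):
        with lock:  # same as lock.acquire() ... lock.release()
            counter.value += 1

def run(worker_cls, counter, lock, n_workers=4, n=10000):
    # worker_cls is td.Thread or mp.Process: both take target/args, start(), join()
    workers = [worker_cls(target=add_many, args=(counter, lock, n))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value

if __name__ == '__main__':
    print(run(td.Thread, Counter(), td.Lock()))          # 40000
    # mp.Value('i', 0): a C int in shared memory that child processes can update
    print(run(mp.Process, mp.Value('i', 0), mp.Lock()))  # 40000
```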
This post will be kept up to date. Please credit the source when reposting; for more, see cnblogs.com/xuyaowen.
References:
https://morvanzhou.github.io/tutorials/python-basic/threading/5-GIL/
https://python3-cookbook.readthedocs.io/zh_CN/latest/c12/p09_dealing_with_gil_stop_worring_about_it.html (Python Cookbook 3rd Edition Documentation)
https://morvanzhou.github.io/tutorials/python-basic/multiprocessing/2-add/