在python多进程编程过程中出现如下问题:
from multiprocessing import Process,Queue,set_start_method,get_context def download_from_web(q): data = [11,22,33,44] for temp in data: q.put(temp) def analysis_data(q): """处理数据""" watting_analysis_data = list() while True: data = q.get() watting_analysis_data.append(data) if q.empty(): break print(watting_analysis_data) def main(): q = Queue() p1 = Process(target=download_from_web,args=(q,)) p2 = Process(target=analysis_data,args=(q,)) p1.start() p2.start() if __name__ == "__main__": main()
运行会报错:
Traceback (most recent call last): File "<string>", line 1, in <module> File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main exitcode = _main(fd, parent_sentinel) File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 126, in _main self = reduction.pickle.load(from_parent) File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/synchronize.py", line 110, in __setstate__ self._semlock = _multiprocessing.SemLock._rebuild(*state) FileNotFoundError: [Errno 2] No such file or directory Traceback (most recent call last): File "<string>", line 1, in <module> File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main exitcode = _main(fd, parent_sentinel) File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 126, in _main self = reduction.pickle.load(from_parent) File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/synchronize.py", line 110, in __setstate__ self._semlock = _multiprocessing.SemLock._rebuild(*state) FileNotFoundError: [Errno 2] No such file or directory
实际上,Python 创建的子进程执行的内容,和启动该进程的方式有关。而根据不同的平台,启动进程的方式大致可分为以下 3 种:
- spawn:使用此方式启动的进程,只会执行和 target 参数或者 run() 方法相关的代码。Windows 平台只能使用此方法,事实上该平台默认使用的也是该启动方式。相比其他两种方式,此方式启动进程的效率最低。
- fork:使用此方式启动的进程,基本等同于主进程(即主进程拥有的资源,该子进程全都有)。因此,该子进程会从创建位置起,和主进程一样执行程序中的代码。注意,此启动方式仅适用于 UNIX 平台,os.fork() 创建的进程就是采用此方式启动的。
- forserver:使用此方式,程序将会启动一个服务器进程。即当程序每次请求启动新进程时,父进程都会连接到该服务器进程,请求由服务器进程来创建新进程。通过这种方式启动的进程不需要从父进程继承资源。注意,此启动方式只在 UNIX 平台上有效。
原因是MAC电脑默认启动进程的方式是fork,而python默认的方式是spawn,所以需要将python启动进程的方式做修改:
from multiprocessing import Process,Queue,set_start_method,get_context def download_from_web(q): data = [11,22,33,44] for temp in data: q.put(temp) def analysis_data(q): """处理数据""" watting_analysis_data = list() while True: data = q.get() watting_analysis_data.append(data) if q.empty(): break print(watting_analysis_data) def main():
set_start_method('fork') q = Queue() p1 = Process(target=download_from_web,args=(q,)) p2 = Process(target=analysis_data,args=(q,)) p1.start() p2.start() if __name__ == "__main__": main()
也可以使用 get_context() 来获取上下文对象。上下文对象与多处理模块具有相同的API,并允许在同一程序中使用多个启动方法,如下:
from multiprocessing import Process,Queue,set_start_method,get_context def download_from_web(q): data = [11,22,33,44] for temp in data: q.put(temp) def analysis_data(q): """处理数据""" watting_analysis_data = list() while True: data = q.get() watting_analysis_data.append(data) if q.empty(): break print(watting_analysis_data) def main(): q = Queue() ctx = get_context('fork') p1 = ctx.Process(target=download_from_web,args=(q,)) p2 = ctx.Process(target=analysis_data,args=(q,)) p1.start() p2.start() if __name__ == "__main__": main()
这样就成功获取到我们的结果了:
[11, 22, 33, 44]
-------------------------------------------------------------------------------------------------------------------------------------------
另外如下代码也可以正常得到结果:
import multiprocess def download_from_web(q): data = [11,22,33,44] for temp in data: q.put(temp) def analysis_data(q): """处理数据""" watting_analysis_data = list() while True: data = q.get() watting_analysis_data.append(data) if q.empty(): break print(watting_analysis_data) def main(): q = multiprocess.Queue() p1 = multiprocess.Process(target=download_from_web,args=(q,)) p2 = multiprocess.Process(target=analysis_data,args=(q,)) p1.start() p2.start() if __name__ == "__main__": main()