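In standard CPython, threads cannot run Python code on several cores at once because of the global interpreter lock (GIL), so multithreading does not speed up CPU-bound work. To make full use of a multi-core CPU (os.cpu_count() reports the number of logical CPUs), a Python program in most cases needs to use multiple processes.
Python provides the multiprocessing module for this. It starts child processes and runs tasks we define (for example, a function) inside them, and its programming interface is similar to that of the threading module.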
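To illustrate how close the two interfaces are, the sketch below runs the same function once in a thread and once in a child process. The function name report and its labels are made up for this example; os.cpu_count(), threading.Thread and multiprocessing.Process are the standard-library pieces mentioned above.

```python
import multiprocessing
import os
import threading


def report(label):
    # Runs either in a thread of the parent process or in a child process.
    print(label, "pid =", os.getpid())
    return os.getpid()


def main():
    print("logical CPUs:", os.cpu_count())

    # Same constructor arguments (target, args) and the same start()/join() calls.
    t = threading.Thread(target=report, args=("thread ",))
    t.start()
    t.join()

    p = multiprocessing.Process(target=report, args=("process",))
    p.start()
    p.join()


if __name__ == "__main__":
    main()
```

The thread reports the pid of the parent process, while the child process reports a pid of its own.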
A simple multi-process program:

import multiprocessing  # import the module


def task(arg):
    print(arg)


def run():
    for i in range(10):  # create ten processes in a loop
        p = multiprocessing.Process(target=task, args=(i,))
        p.start()  # start the child process


if __name__ == "__main__":
    run()
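Common features:
join(): with an argument, waits at most that many seconds for the child process and then continues; with no argument, waits until the child process has finished before continuing.
daemon: an attribute (not a method) that defaults to False. If you set p.daemon = True before calling start(), the main process does not wait for the child; a daemonic child is terminated when the main process exits.
name: the name of the process, set with Process(name=...) or p.name. multiprocessing.current_process() returns the current Process object, and its name attribute gives the name of the process (not of a thread).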
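The three features above can be tried together in a small sketch. The worker slow_task and the process name "worker-1" are hypothetical and exist only for this example.

```python
import multiprocessing
import time


def slow_task(seconds):
    # Hypothetical worker: sleeps, then reports the name of its own process.
    time.sleep(seconds)
    print("finished in", multiprocessing.current_process().name)


def main():
    p = multiprocessing.Process(target=slow_task, args=(1,), name="worker-1")
    p.daemon = True   # must be set before start()
    p.start()

    p.join(0.2)       # wait at most 0.2 s, then carry on
    print(p.name, "alive after join(0.2):", p.is_alive())

    p.join()          # no argument: wait until the child has finished
    print(p.name, "alive after join():", p.is_alive())


if __name__ == "__main__":
    main()
```

If the final join() were removed, the main process would exit right away and the daemonic child would be terminated before it could print anything.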
Creating a process (two ways):
1 By subclassing Process:

import multiprocessing


class MyProcess(multiprocessing.Process):
    def run(self):
        print("Current process:", multiprocessing.current_process())


def run():
    p1 = MyProcess()  # first process
    p1.start()  # start() calls the class's run() method in the child process
    p2 = MyProcess()  # second process
    p2.start()


if __name__ == "__main__":
    run()
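A subclass often needs its own constructor arguments. The sketch below shows one way to do that; the class name Greeter and its greeting attribute are invented for illustration. The point to note is that the base class initializer has to be called.

```python
import multiprocessing


class Greeter(multiprocessing.Process):
    def __init__(self, greeting):
        # The base initializer must run before the object is used as a process.
        super().__init__()
        self.greeting = greeting

    def run(self):
        # Executed in the child process after start() is called.
        print(self.greeting, "from", self.name)


def main():
    workers = [Greeter("hello"), Greeter("goodbye")]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


if __name__ == "__main__":
    main()
```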
2 By passing a target function:

import multiprocessing


def task():
    print("Current process:", multiprocessing.current_process())


def run():
    for i in range(2):
        p = multiprocessing.Process(target=task)
        p.start()


if __name__ == "__main__":
    run()
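Sharing data:
1 Queue:

import multiprocessing


def task(arg, q):
    q.put(arg)


if __name__ == "__main__":
    q = multiprocessing.Queue()
    for i in range(10):
        p = multiprocessing.Process(target=task, args=(i, q))
        p.start()
    # Read exactly one value per child process; an unbounded "while True"
    # loop would block forever on q.get() once the queue is empty.
    for _ in range(10):
        v = q.get()
        print(v)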
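
# Variant: the queue is created at module level and the loop is wrapped in run()
import multiprocessing

q = multiprocessing.Queue()


def task(arg, q):
    q.put(arg)


def run():
    for i in range(10):
        p = multiprocessing.Process(target=task, args=(i, q))
        p.start()
    for _ in range(10):  # one result per child process
        v = q.get()
        print(v)


if __name__ == "__main__":  # needed where child processes re-import this module (e.g. on Windows)
    run()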
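When the reader does not know in advance how many values will arrive, a common approach is to send an end-of-stream marker. The following is a minimal sketch of that idea; SENTINEL, producer and drain are names chosen for this example, not part of multiprocessing.

```python
import multiprocessing

SENTINEL = None  # hypothetical end-of-stream marker chosen for this sketch


def producer(items, q):
    for item in items:
        q.put(item)
    q.put(SENTINEL)  # tell the consumer that nothing more will arrive


def drain(q):
    # Read until the sentinel shows up instead of looping forever.
    received = []
    while True:
        v = q.get()
        if v is SENTINEL:
            break
        received.append(v)
    return received


def main():
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=producer, args=(range(5), q))
    p.start()
    print(drain(q))
    p.join()


if __name__ == "__main__":
    main()
```

With several producers, the consumer would have to count one sentinel per producer before stopping.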
2 Manager:

import multiprocessing
import time


def func(arg, dic):
    time.sleep(2)
    dic[arg] = 100


if __name__ == "__main__":
    m = multiprocessing.Manager()
    dic = m.dict()
    process_list = []
    for i in range(10):
        p = multiprocessing.Process(target=func, args=(i, dic))
        p.start()
        process_list.append(p)
    # Poll until every child process has exited
    while True:
        count = 0
        for p in process_list:
            if not p.is_alive():
                count += 1
        if count == len(process_list):
            break
        time.sleep(0.1)  # avoid a busy loop while waiting
    print(dic)
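Queue and Manager are needed because each process works on its own memory. The sketch below contrasts an ordinary dict with a Manager dict; the function record and the variable names are assumptions made for this example.

```python
import multiprocessing


def record(key, store):
    # Works on whatever mapping it is given: a plain dict or a Manager proxy.
    store[key] = key * key


def main():
    plain = {}
    with multiprocessing.Manager() as manager:
        shared = manager.dict()
        workers = []
        for store in (plain, shared):
            for key in range(3):
                p = multiprocessing.Process(target=record, args=(key, store))
                p.start()
                workers.append(p)
        for p in workers:
            p.join()
        print("plain dict :", plain)         # children changed only their own copies
        print("shared dict:", dict(shared))  # changes made in the children are visible


if __name__ == "__main__":
    main()
```

The plain dict is still empty in the parent after all children have finished, while the Manager dict contains the three entries.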
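Process locks: used the same way as thread locks.

import time
import multiprocessing


def task(arg, lock):
    print('The enemy is coming')
    lock.acquire()
    time.sleep(2)
    print(arg)
    lock.release()


if __name__ == '__main__':
    # Create the lock once in the parent and pass it to each child. A
    # module-level lock is re-created in every child when the module is
    # re-imported (spawn start method), so it would not be shared.
    lock = multiprocessing.RLock()
    p1 = multiprocessing.Process(target=task, args=(1, lock))
    p1.start()
    p2 = multiprocessing.Process(target=task, args=(2, lock))
    p2.start()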
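As with thread locks, a process lock can also be used in a with statement, which releases the lock even when the protected code raises. This sketch uses multiprocessing.Lock rather than RLock, and the function guarded_print is a made-up name.

```python
import multiprocessing
import time


def guarded_print(arg, lock):
    # "with lock" acquires on entry and releases on exit, even if the body raises.
    with lock:
        time.sleep(0.1)
        print("holding the lock:", arg)
    return arg


def main():
    lock = multiprocessing.Lock()
    workers = [
        multiprocessing.Process(target=guarded_print, args=(i, lock))
        for i in range(3)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


if __name__ == "__main__":
    main()
```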
Process pool: limits the maximum number of processes that are created.

import time
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def task():
    print("Current process:", multiprocessing.current_process())
    time.sleep(1)


if __name__ == "__main__":
    pool = ProcessPoolExecutor(5)  # at most 5 worker processes
    for i in range(10):
        pool.submit(task)

Output:

Current process: <Process(Process-2, started)>
Current process: <Process(Process-3, started)>
Current process: <Process(Process-4, started)>
Current process: <Process(Process-1, started)>
Current process: <Process(Process-5, started)>

One second later:

Current process: <Process(Process-2, started)>
Current process: <Process(Process-3, started)>
Current process: <Process(Process-4, started)>
Current process: <Process(Process-1, started)>
Current process: <Process(Process-5, started)>
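The pool can also hand results back to the parent. The sketch below is one possible way to do it, using the executor as a context manager together with its map() method; the task square is a hypothetical stand-in for real work.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def square(n):
    # Hypothetical task; returns the worker's pid alongside the result.
    return n * n, os.getpid()


def main():
    # The with-block waits for all submitted work and then shuts the pool down.
    with ProcessPoolExecutor(max_workers=5) as pool:
        results = list(pool.map(square, range(10)))

    values = [value for value, _ in results]
    pids = {pid for _, pid in results}
    print(values)
    print("distinct worker processes:", len(pids))  # never more than 5


if __name__ == "__main__":
    main()
```

map() returns the results in the order of the inputs, regardless of which worker finished first.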
A simple crawler:

import requests
from bs4 import BeautifulSoup
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def task(url):
    print(url)
    r1 = requests.get(
        url=url,
        headers={
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36'
        },
    )
    # Inspect the downloaded text
    soup = BeautifulSoup(r1.text, 'html.parser')
    print(soup.text)
    # content_list = soup.find('div', attrs={'id': 'content_list'})
    # for item in content_list.find_all('div', attrs={'class': 'item'}):
    #     title = item.find('a').text.strip()
    #     target_url = item.find('a').get('href')
    #     print(title, target_url)


def run():
    # Downloading is I/O-bound, so a thread pool is used here;
    # ProcessPoolExecutor offers the same submit() interface.
    pool = ThreadPoolExecutor(5)
    for i in range(1, 50):
        pool.submit(task, 'https://dig.chouti.com/all/hot/recent/%s' % i)


if __name__ == "__main__":
    run()
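In the crawler, the objects returned by pool.submit() are discarded, so an exception raised inside task would go unnoticed. The sketch below shows one way to collect results and failures through the returned futures. To stay runnable offline it uses a stand-in function fake_fetch and placeholder URLs, both invented for this example, instead of real requests.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_fetch(url):
    # Stand-in for a real download so the sketch runs offline.
    if url.endswith("/3"):
        raise RuntimeError("simulated failure for " + url)
    return "page body of " + url


def crawl(urls, max_workers=5):
    ok, failed = {}, {}
    with ThreadPoolExecutor(max_workers) as pool:
        futures = {pool.submit(fake_fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                ok[url] = future.result()  # re-raises any exception from the task
            except Exception as exc:
                failed[url] = exc
    return ok, failed


if __name__ == "__main__":
    pages = ["https://example.invalid/page/%s" % i for i in range(1, 6)]
    ok, failed = crawl(pages)
    print(len(ok), "succeeded;", len(failed), "failed")
```

In a real crawler, fake_fetch would be replaced by a function that performs the HTTP request and parses the page.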