线程池
什么是线程池?
诸如web服务器、数据库服务器、文件服务器和邮件服务器等许多服务器应用都面向处理来自某些远程来源的大量短小的任务。
构建服务器应用程序的一个过于简单的模型是:每当一个请求到达就创建一个新的服务对象,然后在新的服务对象中为请求服务。
但当有大量请求并发访问时,服务器不断的创建和销毁对象的开销很大。
所以提高服务器效率的一个手段就是尽可能减少创建和销毁对象的次数,特别是一些很耗资源的对象创建和销毁,这样就引入了“池”的概念,
“池”的概念使得人们可以定制一定量的资源,然后对这些资源进行复用,而不是频繁的创建和销毁。
线程池是预先创建线程的一种技术。
这些线程都是处于睡眠状态,即均为启动,不消耗CPU,而只是占用较小的内存空间。
当请求到来之后,缓冲池给这次请求分配一个空闲线程,把请求传入此线程中运行,进行处理。
当预先创建的线程都处于运行状态,即预制线程不够,线程池可以自由创建一定数量的新线程,用于处理更多的请求。
当系统比较闲的时候,也可以通过移除一部分一直处于停用状态的线程。
线程池的注意事项
虽然线程池是构建多线程应用程序的强大机制,但使用它并不是没有风险的。在使用线程池时需注意线程池大小与性能的关系,注意并发风险、死锁、资源不足和线程泄漏等问题。
1、线程池大小。多线程应用并非线程越多越好,需要根据系统运行的软硬件环境以及应用本身的特点决定线程池的大小。
一般来说,如果代码结构合理的话,线程数目与CPU 数量相适合即可。
如果线程运行时可能出现阻塞现象,可相应增加池的大小;如有必要可采用自适应算法来动态调整线程池的大小,以提高CPU 的有效利用率和系统的整体性能。
2、并发错误。多线程应用要特别注意并发错误,要从逻辑上保证程序的正确性,注意避免死锁现象的发生。
3、线程泄漏。这是线程池应用中一个严重的问题,当任务执行完毕而线程没能返回池中就会发生线程泄漏现象。
线程池要点
1、通过判断等待的任务数量和线程池中的最大值,取最小值来判断开启多少线程来工作
比如:
任务数是3,进程池最大20 ,那么咱们只需要开启3个线程就行了。
任务数是500,进程池是20,那么咱们只开20个线程就可以了。
取最小值
2、实现线程池正在运行,有一个查看的功能,查看一下现在线程里面活跃的线程是多少等待的是多少?
线程总共是多少,等待中多少,正在运行中多少
作用:
方便查看当前线程池状态
能获取到这个之后就可以当线程一直处于空闲状态
查看状态用:上下文管理来做,非常nice的一点
3、关闭线程
实现简单的线程池之前,讲一下用到的一些语法
-
继承线程
class WorkerThread(threading.Thread)
-
request = self._requestQueue.get(True,self._poll_timeout) #就是等_poll_timeout的时间,等不到就返回空;如果不是True就不等,直接看有没有
https://blog.csdn.net/hehe123456zxc/article/details/52275821
队列的源码class Queue: """Create a queue object with a given maximum size. If maxsize is <= 0, the queue size is infinite. """ def __init__(self, maxsize=0): self.maxsize = maxsize self._init(maxsize) # mutex must be held whenever the queue is mutating. All methods # that acquire mutex must release it before returning. mutex # is shared between the three conditions, so acquiring and # releasing the conditions also acquires and releases mutex. self.mutex = _threading.Lock() # Notify not_empty whenever an item is added to the queue; a # thread waiting to get is notified then. self.not_empty = _threading.Condition(self.mutex) # Notify not_full whenever an item is removed from the queue; # a thread waiting to put is notified then. self.not_full = _threading.Condition(self.mutex) # Notify all_tasks_done whenever the number of unfinished tasks # drops to zero; thread waiting to join() is notified to resume self.all_tasks_done = _threading.Condition(self.mutex) self.unfinished_tasks = 0 def task_done(self): """Indicate that a formerly enqueued task is complete. Used by Queue consumer threads. For each get() used to fetch a task, a subsequent call to task_done() tells the queue that the processing on the task is complete. If a join() is currently blocking, it will resume when all items have been processed (meaning that a task_done() call was received for every item that had been put() into the queue). Raises a ValueError if called more times than there were items placed in the queue. """ self.all_tasks_done.acquire() try: unfinished = self.unfinished_tasks - 1 if unfinished <= 0: if unfinished < 0: raise ValueError('task_done() called too many times') self.all_tasks_done.notify_all() self.unfinished_tasks = unfinished finally: self.all_tasks_done.release() def join(self): """Blocks until all items in the Queue have been gotten and processed. The count of unfinished tasks goes up whenever an item is added to the queue. The count goes down whenever a consumer thread calls task_done() to indicate the item was retrieved and all work on it is complete. When the count of unfinished tasks drops to zero, join() unblocks. """ self.all_tasks_done.acquire() try: while self.unfinished_tasks: self.all_tasks_done.wait() finally: self.all_tasks_done.release() def qsize(self): """Return the approximate size of the queue (not reliable!).""" self.mutex.acquire() n = self._qsize() self.mutex.release() return n def empty(self): """Return True if the queue is empty, False otherwise (not reliable!).""" self.mutex.acquire() n = not self._qsize() self.mutex.release() return n def full(self): """Return True if the queue is full, False otherwise (not reliable!).""" self.mutex.acquire() n = 0 < self.maxsize == self._qsize() self.mutex.release() return n def put(self, item, block=True, timeout=None): """Put an item into the queue. If optional args 'block' is true and 'timeout' is None (the default), block if necessary until a free slot is available. If 'timeout' is a non-negative number, it blocks at most 'timeout' seconds and raises the Full exception if no free slot was available within that time. Otherwise ('block' is false), put an item on the queue if a free slot is immediately available, else raise the Full exception ('timeout' is ignored in that case). """ self.not_full.acquire() try: if self.maxsize > 0: if not block: if self._qsize() == self.maxsize: raise Full elif timeout is None: while self._qsize() == self.maxsize: self.not_full.wait() elif timeout < 0: raise ValueError("'timeout' must be a non-negative number") else: endtime = _time() + timeout while self._qsize() == self.maxsize: remaining = endtime - _time() if remaining <= 0.0: raise Full self.not_full.wait(remaining) self._put(item) self.unfinished_tasks += 1 self.not_empty.notify() finally: self.not_full.release() def put_nowait(self, item): """Put an item into the queue without blocking. Only enqueue the item if a free slot is immediately available. Otherwise raise the Full exception. """ return self.put(item, False) def get(self, block=True, timeout=None): """Remove and return an item from the queue. If optional args 'block' is true and 'timeout' is None (the default), block if necessary until an item is available. If 'timeout' is a non-negative number, it blocks at most 'timeout' seconds and raises the Empty exception if no item was available within that time. Otherwise ('block' is false), return an item if one is immediately available, else raise the Empty exception ('timeout' is ignored in that case). """ self.not_empty.acquire() try: if not block: if not self._qsize(): raise Empty elif timeout is None: while not self._qsize(): self.not_empty.wait() elif timeout < 0: raise ValueError("'timeout' must be a non-negative number") else: endtime = _time() + timeout while not self._qsize(): remaining = endtime - _time() if remaining <= 0.0: raise Empty self.not_empty.wait(remaining) item = self._get() self.not_full.notify() return item finally: self.not_empty.release() def get_nowait(self): """Remove and return an item from the queue without blocking. Only get an item if one is immediately available. Otherwise raise the Empty exception. """ return self.get(False) # Override these methods to implement other queue organizations # (e.g. stack or priority queue). # These will only be called with appropriate locks held # Initialize the queue representation def _init(self, maxsize): self.queue = deque() def _qsize(self, len=len): return len(self.queue) # Put a new item in the queue def _put(self, item): self.queue.append(item) # Get an item from the queue def _get(self): return self.queue.popleft()
-
if else 之间也有命令,第一次见到这种格式
if 0: print(0) for i in range(1,5): print(i) else: print('else') # output: """ 1 2 3 4 else """
-
self.start()
会去执行run函数
def run()
-
threading.Event()
threading.Event().is_set()
threading.Event().set()
event.wait(time) 等待 time 时间后,执行下一步。或者在调用 event.set() 后立即执行下一步。
event.clear() 清除信号
event.set() 设置信号
event.isSet() 判断是否设置信号
-
`self.setDaemon(True)
* join ()方法:主线程A中,创建了子线程B,并且在主线程A中调用了B.join(),那么,主线程A会在调用的地方等待,直到子线程B完成操作后,才可以接着往下执行,那么在调用这个线程时可以使用被调用线程的join方法。
* setDaemon()方法。主线程A中,创建了子线程B,并且在主线程A中调用了B.setDaemon(),这个的意思是,把主线程A设置为守护线程,这时候,要是主线程A执行结束了,就不管子线程B是否完成,一并和主线程A退出.这就是setDaemon方法的含义,这基本和join是相反的。此外,还有个要特别注意的:必须在start() 方法调用之前设置,如果不设置为守护线程,程序会被无限挂起。 -
Queue.Queue(resq_size) 最大长度限制的队列
-
_threading.Condition(mutex) 互斥锁
-
这里用到的callable,就是每个worker需要的函数
实现简单的线程池
#-*-encoding:utf-8-*-
'''
Created on 2012-3-9
@summary: 线程池
@contact: mailto:zhanglixinseu@gmail.com
@author: zhanglixin
'''
import sys
import threading
import Queue
import traceback
# 定义一些Exception,用于自定义异常处理
class NoResultsPending(Exception):
"""All works requests have been processed"""
pass
class NoWorkersAvailable(Exception):
"""No worket threads available to process remaining requests."""
pass
# 单下划线,弱内部,from M import 不会引入
def _handle_thread_exception(request, exc_info):
"""默认的异常处理函数,只是简单的打印"""
traceback.print_exception(*exc_info)
#classes
class WorkerThread(threading.Thread):
"""后台线程,真正的工作线程,从请求队列(requestQueue)中获取work,
并将执行后的结果添加到结果队列(resultQueue)"""
def __init__(self, requestQueue, resultQueue, poll_timeout=5, **kwds):
threading.Thread.__init__(self,**kwds)
'''设置为守护进行'''
self.setDaemon(True)
self._requestQueue = requestQueue
self._resultQueue = resultQueue
self._poll_timeout = poll_timeout
'''设置一个flag信号,用来表
示该线程是否还被dismiss,默认为false'''
self._dismissed = threading.Event()
self.start()
def run(self):
'''每个线程尽可能多的执行work,所以采用loop,
只要线程可用,并且requestQueue有work未完成,则一直loop'''
while True:
if self._dismissed.is_set():
break
try:
'''
Queue.Queue队列设置了线程同步策略,并且可以设置timeout。
一直block,直到requestQueue有值,或者超时
'''
request = self._requestQueue.get(True,self._poll_timeout)
except Queue.Empty:
continue
else:
'''之所以在这里再次判断dimissed,是因为之前的timeout时间里,很有可能,该线程被dismiss掉了'''
if self._dismissed.is_set():
self._requestQueue.put(request)
break
try:
'''执行callable,讲请求和结果以tuple的方式放入requestQueue'''
result = request.callable(*request.args,**request.kwds)
print self.getName()
self._resultQueue.put((request,result))
except:
'''异常处理'''
request.exception = True
self._resultQueue.put((request,sys.exc_info()))
def dismiss(self):
'''设置一个标志,表示完成当前work之后,退出'''
self._dismissed.set()
class WorkRequest:
'''
@param callable_:,可定制的,执行work的函数
@param args: 列表参数
@param kwds: 字典参数
@param requestID: id
@param callback: 可定制的,处理resultQueue队列元素的函数
@param exc_callback:可定制的,处理异常的函数
'''
def __init__(self,callable_,args=None,kwds=None,requestID=None,
callback=None,exc_callback=_handle_thread_exception):
if requestID == None:
self.requestID = id(self)
else:
try:
self.requestID = hash(requestID)
except TypeError:
raise TypeError("requestId must be hashable")
self.exception = False
self.callback = callback
self.exc_callback = exc_callback
self.callable = callable_
self.args = args or []
self.kwds = kwds or {}
def __str__(self):
return "WorkRequest id=%s args=%r kwargs=%r exception=%s" %
(self.requestID,self.args,self.kwds,self.exception)
class ThreadPool:
'''
@param num_workers:初始化的线程数量
@param q_size,resq_size: requestQueue和result队列的初始大小
@param poll_timeout: 设置工作线程WorkerThread的timeout,也就是等待requestQueue的timeout
'''
def __init__(self,num_workers,q_size=0,resq_size=0,poll_timeout=5):
self._requestQueue = Queue.Queue(q_size)
self._resultQueue = Queue.Queue(resq_size)
self.workers = []
self.dismissedWorkers = []
self.workRequests = {} #设置个字典,方便使用
self.createWorkers(num_workers,poll_timeout)
def createWorkers(self,num_workers,poll_timeout=5):
'''创建num_workers个WorkThread,默认timeout为5'''
for i in range(num_workers):
self.workers.append(WorkerThread(self._requestQueue,self._resultQueue,poll_timeout=poll_timeout))
def dismissWorkers(self,num_workers,do_join=False):
'''停用num_workers数量的线程,并加入dismiss_list'''
dismiss_list = []
for i in range(min(num_workers,len(self.workers))):
worker = self.workers.pop()
worker.dismiss()
dismiss_list.append(worker)
if do_join :
for worker in dismiss_list:
worker.join()
else:
self.dismissedWorkers.extend(dismiss_list)
def joinAllDismissedWorkers(self):
'''join 所有停用的thread'''
#print len(self.dismissedWorkers)
for worker in self.dismissedWorkers:
worker.join()
self.dismissedWorkers = []
def putRequest(self,request ,block=True,timeout=None):
assert isinstance(request,WorkRequest)
assert not getattr(request,'exception',None)
'''当queue满了,也就是容量达到了前面设定的q_size,它将一直阻塞,直到有空余位置,或是timeout'''
self._requestQueue.put(request, block, timeout)
self.workRequests[request.requestID] = request
def poll(self,block = False):
while True:
if not self.workRequests:
raise NoResultsPending
elif block and not self.workers:
raise NoWorkersAvailable
try:
'''默认只要resultQueue有值,则取出,否则一直block'''
request , result = self._resultQueue.get(block=block)
if request.exception and request.exc_callback:
request.exc_callback(request,result)
if request.callback and not (request.exception and request.exc_callback):
request.callback(request,result)
del self.workRequests[request.requestID]
except Queue.Empty:
break
def wait(self):
while True:
try:
self.poll(True)
except NoResultsPending:
break
def workersize(self):
return len(self.workers)
def stop(self):
'''join 所有的thread,确保所有的线程都执行完毕'''
self.dismissWorkers(self.workersize(),True)
self.joinAllDismissedWorkers()
#Test a demo
if __name__=='__main__':
import random
import time
import datetime
def do_work(data):
time.sleep(random.randint(1,3))
res = str(datetime.datetime.now()) + "" +str(data)
return res
def print_result(request,result):
print "---Result from request %s : %r" % (request.requestID,result)
main = ThreadPool(3)
for i in range(40):
req = WorkRequest(do_work,args=[i],kwds={},callback=print_result)
main.putRequest(req)
print "work request #%s added." % req.requestID
print '-'*20, main.workersize(),'-'*20
counter = 0
while True:
try:
time.sleep(0.5)
main.poll()
if(counter==5):
print "Add 3 more workers threads"
main.createWorkers(3)
print '-'*20, main.workersize(),'-'*20
if(counter==10):
print "dismiss 2 workers threads"
main.dismissWorkers(2)
print '-'*20, main.workersize(),'-'*20
counter+=1
except NoResultsPending:
print "no pending results"
break
main.stop()
print "Stop"
实现线程池转自 https://www.cnblogs.com/0x2D-0x22/p/4014645.html
使用轮子
# pip install threadpool // 安装库
pool = ThreadPool(poolsize)
requests = makeRequests(some_callable, list_of_args, callback)
[pool.putRequest(req) for req in requests]
pool.wait()