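At work I ran into a problem: a few code paths have to do a full table scan on the database, yet their requirements on the result are fairly relaxed. I kept feeling there was room to optimize, for example by temporarily caching the computed result.
The first thing that came to mind was functools.lru_cache, but unfortunately that decorator does not exist in Python 2.7.
So I went looking and found one on Stack Overflow:
(Source: https://stackoverflow.com/questions/11815873/memoization-library-for-python-2-7)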
```python
import time
import functools
import collections

def lru_cache(maxsize = 255, timeout = None):
    """lru_cache(maxsize = 255, timeout = None) --> returns a decorator which returns an instance (a descriptor).

        Purpose         - This decorator factory will wrap a function / instance method and will supply a caching mechanism to the function.
                            For every given input params it will store the result in a queue of maxsize size, and will return a cached ret_val
                            if the same parameters are passed.

        Params          - maxsize - int, the cache size limit, anything added above that will delete the first values entered (FIFO).
                            This size is per instance, thus 1000 instances with maxsize of 255, will contain at max 255K elements.
                        - timeout - int / float / None, every n seconds the cache is deleted, regardless of usage. If None - cache will never be refreshed.

        Notes           - If an instance method is wrapped, each instance will have its own cache and its own timeout.
                        - The wrapped function will have a cache_clear variable inserted into it and may be called to clear its specific cache.
                        - The wrapped function will maintain the original function's docstring and name (wraps)
                        - The type of the wrapped function will no longer be that of a function but either an instance of _LRU_Cache_class or a functools.partial type.

        On Error        - No error handling is done, in case an exception is raised - it will permeate up.
    """

    class _LRU_Cache_class(object):
        def __init__(self, input_func, max_size, timeout):
            self._input_func        = input_func
            self._max_size          = max_size
            self._timeout           = timeout

            # This will store the cache for this function, format - {caller1 : [OrderedDict1, last_refresh_time1], caller2 : [OrderedDict2, last_refresh_time2]}.
            #   In case of an instance method - the caller is the instance, in case called from a regular function - the caller is None.
            self._caches_dict        = {}

        def cache_clear(self, caller = None):
            # Remove the cache for the caller, only if exists:
            if caller in self._caches_dict:
                del self._caches_dict[caller]
                self._caches_dict[caller] = [collections.OrderedDict(), time.time()]

        def __get__(self, obj, objtype):
            """ Called for instance methods """
            return_func = functools.partial(self._cache_wrapper, obj)
            return_func.cache_clear = functools.partial(self.cache_clear, obj)
            # Return the wrapped function and wraps it to maintain the docstring and the name of the original function:
            return functools.wraps(self._input_func)(return_func)

        def __call__(self, *args, **kwargs):
            """ Called for regular functions """
            return self._cache_wrapper(None, *args, **kwargs)
        # Set the cache_clear function in the __call__ operator:
        __call__.cache_clear = cache_clear


        def _cache_wrapper(self, caller, *args, **kwargs):
            # Create a unique key including the types (in order to differentiate between 1 and '1'):
            kwargs_key = "".join(map(lambda x : str(x) + str(type(kwargs[x])) + str(kwargs[x]), sorted(kwargs)))
            key = "".join(map(lambda x : str(type(x)) + str(x) , args)) + kwargs_key

            # Check if caller exists, if not create one:
            if caller not in self._caches_dict:
                self._caches_dict[caller] = [collections.OrderedDict(), time.time()]
            else:
                # Validate in case the refresh time has passed:
                if self._timeout is not None:
                    if time.time() - self._caches_dict[caller][1] > self._timeout:
                        self.cache_clear(caller)

            # Check if the key exists, if so - return it:
            cur_caller_cache_dict = self._caches_dict[caller][0]
            if key in cur_caller_cache_dict:
                return cur_caller_cache_dict[key]

            # Validate we didn't exceed the max_size:
            if len(cur_caller_cache_dict) >= self._max_size:
                # Delete the first item in the dict:
                cur_caller_cache_dict.popitem(False)

            # Call the function and store the data in the cache (call it with the caller in case it's an instance function - Ternary condition):
            cur_caller_cache_dict[key] = self._input_func(caller, *args, **kwargs) if caller is not None else self._input_func(*args, **kwargs)
            return cur_caller_cache_dict[key]


    # Return the decorator wrapping the class (also wraps the instance to maintain the docstring and the name of the original function):
    return (lambda input_func : functools.wraps(input_func)(_LRU_Cache_class(input_func, maxsize, timeout)))
```
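One detail worth noticing: as its docstring says, the snippet above evicts entries in insertion order (FIFO), because a cache hit does not change an entry's position. If real least-recently-used eviction is wanted, a minimal sketch could look like the following. `simple_lru` is a hypothetical helper written for this post, not part of the Stack Overflow answer, and it only handles positional, hashable arguments.

```python
import collections
import functools


def simple_lru(maxsize=255):
    """Hypothetical helper: memoize with least-recently-used eviction."""
    def decorator(func):
        cache = collections.OrderedDict()

        @functools.wraps(func)
        def wrapper(*args):
            if args in cache:
                # Re-insert on a hit so the entry becomes the newest one.
                value = cache.pop(args)
                cache[args] = value
                return value
            value = func(*args)
            cache[args] = value
            if len(cache) > maxsize:
                # Drop the oldest (least recently used) entry.
                cache.popitem(last=False)
            return value

        wrapper.cache = cache  # exposed only so the example can be inspected
        return wrapper
    return decorator


calls = []


@simple_lru(maxsize=2)
def square(n):
    calls.append(n)  # record every real computation
    return n * n


square(1)
square(2)
square(1)  # hit: 1 becomes the most recently used entry
square(3)  # evicts 2, not 1
square(1)  # still cached
square(2)  # has to be recomputed
```

With FIFO eviction the call `square(3)` would have thrown away the entry for 1 instead, even though it had just been used.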
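But this raises a problem. Once deployed, there will be several servers behind nginx, while these cached results live in the memory of a single server, so different requests may return inconsistent results. What to do?
Put it in Redis?
Then I remembered flask-cache, but unfortunately, when I used it to cache the result of a plain function, it raised an error.
In the end I had to write one myself: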
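```python
import json
from functools import wraps

import redis

# Shared connection pool. The host/port/db values are assumptions;
# adjust them for your deployment.
cache_redis = redis.ConnectionPool(host='localhost', port=6379, db=0)


def cache_func_redis(timeout=100):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Build the key from the function name, the positional arguments
            # and the keyword arguments sorted by name. Names and separators
            # are included so that e.g. (a=1, b=23) and (a=12, b=3) differ.
            kw = sorted(kwargs.items())
            k = ':'.join([func.__name__, repr(args), repr(kw)])
            r = redis.Redis(connection_pool=cache_redis)
            d = r.get(k)
            if d is not None:
                # Cache hit: unwrap the stored JSON document.
                return json.loads(d)['res']
            res = func(*args, **kwargs)
            d = json.dumps({'res': res})
            # Store the value together with its expiry in a single command.
            r.set(k, d, ex=timeout)
            return res
        return wrapper
    return decorator
```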
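To try the idea without a running Redis server, the store can be passed in as a parameter. The sketch below is only an illustration: `DictStore` is a hypothetical in-memory stand-in that exposes just `get` and `set(name, value, ex=...)`, and `cache_func` and `count_rows` are made-up names. It builds its key from the function name and arguments and keeps the JSON wrapping, so a real client with the same two methods could presumably be swapped in.

```python
import functools
import json
import time


class DictStore(object):
    """Hypothetical in-memory stand-in for a Redis client (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, name):
        value, expires_at = self._data.get(name, (None, None))
        if expires_at is not None and time.time() >= expires_at:
            # The entry has expired: drop it and report a miss.
            del self._data[name]
            return None
        return value

    def set(self, name, value, ex=None):
        expires_at = time.time() + ex if ex is not None else None
        self._data[name] = (value, expires_at)


def cache_func(store, timeout=100):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = ':'.join([func.__name__, repr(args),
                            repr(sorted(kwargs.items()))])
            cached = store.get(key)
            if cached is not None:
                return json.loads(cached)['res']
            res = func(*args, **kwargs)
            store.set(key, json.dumps({'res': res}), ex=timeout)
            return res
        return wrapper
    return decorator


store = DictStore()
scan_calls = []


@cache_func(store, timeout=60)
def count_rows(table, active=True):
    scan_calls.append(table)  # stands in for the expensive full table scan
    return {'table': table, 'rows': 42}


first = count_rows('users', active=True)
second = count_rows('users', active=True)  # served from the store
```

After the two calls, `scan_calls` holds a single entry, which shows that the second call never reached the function body.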
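The function name and the arguments passed in are combined into a signature that serves as the key in Redis. The computed result is stored under that key and expires after timeout seconds. A few things to keep in mind:
* If an argument is a dict, the cache may miss, because the string form of a dict depends on its key order, so two equal dicts can yield different keys.
* Only cache results in places where the requirements on accuracy and freshness are loose.
* The cached result should consist of basic Python data structures (anything json.dumps can serialize), otherwise an error may be raised.
* I have not run a load test yet. I will post the results once I have.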
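For the dict-argument and data-structure caveats above, one possible approach is to serialize the call signature with `json.dumps(..., sort_keys=True)` and hash it, so that the key order of a dict no longer matters. This is a sketch under assumptions: `make_cache_key` and the `cache:` prefix are hypothetical, and `default=repr` is only a fallback for non-JSON objects (their repr may contain memory addresses, which would defeat caching).

```python
import hashlib
import json


def make_cache_key(func_name, args, kwargs):
    """Hypothetical helper: build a stable cache key from a call signature."""
    payload = json.dumps([func_name, args, kwargs],
                         sort_keys=True, default=repr)
    return 'cache:' + hashlib.sha1(payload.encode('utf-8')).hexdigest()


# Two equal dicts written in different key order map to the same key.
key_a = make_cache_key('report', ({'x': 1, 'y': 2},), {})
key_b = make_cache_key('report', ({'y': 2, 'x': 1},), {})

# JSON round trip: a tuple result comes back from the cache as a list.
restored = json.loads(json.dumps({'res': (1, 2)}))['res']
```

The last line also illustrates why the cached value should stick to basic types: what comes out of the cache is whatever JSON can represent, not necessarily the exact object that went in.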
References:
https://github.com/python/cpython/blob/3.4/Lib/functools.py
https://stackoverflow.com/questions/11815873/memoization-library-for-python-2-7