On Linux, when running without root privileges, you may occasionally hit the following error:
    Building prefix dict from the default dictionary ...
    Loading model from cache /tmp/jieba.cache
    Dumping model to file cache /tmp/jieba.cache
    Dump cache file failed.
    Traceback (most recent call last):
      File "/home/work/anaconda3/envs/py27/lib/python2.7/site-packages/jieba/__init__.py", line 153, in initialize
        _replace_file(fpath, cache_file)
    OSError: [Errno 1] Operation not permitted
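This happens because jieba stores its cache file under /tmp by default, and a non-root user may not have permission to replace the file there (typically when an existing /tmp/jieba.cache was created by another user). The fix is to change the default cache location so that the cache file lives in a directory the user owns.

The jieba documentation mentions that tmp_dir and cache_file can be changed, so we took a look at the source.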
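To check whether this is what is happening on a given machine, the sketch below inspects the default cache location using only the standard library. It assumes jieba resolves its default directory through `tempfile.gettempdir()` and names the file `jieba.cache`, as the log above suggests; the helper name `describe_default_cache` is made up for illustration.

```python
import os
import stat
import tempfile


def describe_default_cache(filename="jieba.cache"):
    """Report where the default cache would live and who may replace it.

    Assumption: the cache sits in tempfile.gettempdir() under `filename`.
    """
    tmp = tempfile.gettempdir()
    path = os.path.join(tmp, filename)
    info = {
        "path": path,
        # Can the current user create files in the directory at all?
        "dir_writable": os.access(tmp, os.W_OK),
        # With the sticky bit set, only a file's owner (or root) may
        # rename or delete it, even if the directory is world-writable.
        "dir_sticky": bool(os.stat(tmp).st_mode & stat.S_ISVTX),
        "exists": os.path.exists(path),
        "owned_by_me": None,
    }
    if info["exists"]:
        info["owned_by_me"] = os.stat(path).st_uid == os.getuid()
    return info


print(describe_default_cache())
```

If `dir_sticky` is true and `owned_by_me` is false, replacing the existing cache file would be refused for a non-root user, which would match the `Operation not permitted` error above.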
In /home/work/anaconda3/envs/py27/lib/python2.7/site-packages/jieba/__init__.py, lines 52-66 read as follows:
    class Tokenizer(object):

        def __init__(self, dictionary=DEFAULT_DICT):
            self.lock = threading.RLock()
            if dictionary == DEFAULT_DICT:
                self.dictionary = dictionary
            else:
                self.dictionary = _get_abs_path(dictionary)
            self.FREQ = {}
            self.total = 0
            self.user_word_tag_tab = {}
            self.initialized = False
            self.tmp_dir = None
            # self.tmp_dir = '/'
            self.cache_file = None
One fix is to edit the source: on line 64, self.tmp_dir can be set to a custom cache path.
Another way is to set it from your own code. Below is a demo that wraps jieba in a singleton:
    from __future__ import print_function

    import os

    import jieba

    # Placeholder values (not in the original post); adjust to your setup.
    dirpath = os.path.expanduser('~')   # an existing, writable cache directory
    user_dict_path = 'user_dict.txt'    # path to your user dictionary


    class Singleton(object):
        """
        Jieba Utils Class
        """
        _instance = None

        def __new__(cls, *args, **kwargs):
            if not cls._instance:
                # object.__new__ takes no extra arguments
                cls._instance = super(Singleton, cls).__new__(cls)
            return cls._instance


    class JiebaUtil(Singleton):
        """
        jieba utility class
        """
        _jieba_instance = None

        def get_instance(self):
            """
            get the global jieba instance
            """
            if self._jieba_instance:
                return self._jieba_instance
            print('initialize...')
            obj = jieba.Tokenizer()
            # Set before initialization so the cache is written here.
            obj.tmp_dir = dirpath
            obj.load_userdict(user_dict_path)
            obj.initialize()
            self._jieba_instance = obj
            return obj


    if __name__ == '__main__':
        one = JiebaUtil()
        two = JiebaUtil()
        print(one == two)

        tkn = one.get_instance()
        tkn2 = one.get_instance()
        print(tkn == tkn2)

        print(id(one), id(two))
        print(id(tkn), id(tkn2))

The statement obj.tmp_dir = dirpath is where the custom tmp_dir cache path is set.
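For code that calls the module-level functions such as `jieba.cut` rather than building its own `Tokenizer`, the same attributes can be set on jieba's default tokenizer, `jieba.dt`. The helper below is a minimal sketch of that idea; the name `use_private_cache` and the directory used in the usage note are illustrative assumptions, not part of jieba.

```python
import os


def use_private_cache(tokenizer, cache_dir, cache_file=None):
    """Point a tokenizer at a cache directory owned by the current user.

    `tokenizer` is expected to expose `tmp_dir` and `cache_file`
    attributes, as jieba.Tokenizer does in the source shown earlier.
    """
    # The directory has to exist before the cache can be written into it.
    if not os.path.isdir(cache_dir):
        os.makedirs(cache_dir)
    tokenizer.tmp_dir = cache_dir
    if cache_file is not None:
        # Optional: override the cache file name as well.
        tokenizer.cache_file = cache_file
    return tokenizer
```

With jieba installed, this could be called as `use_private_cache(jieba.dt, os.path.expanduser('~/.jieba_cache'))` before the first call to `jieba.cut`, so that the cache is built in the user's home directory instead of /tmp. Unlike editing the file under site-packages, this survives reinstalling or upgrading the package.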