  • jieba tokenizer singleton pattern, and customizing tmp_dir when Linux permissions are insufficient

    On Linux, when you do not have root privileges, you may sometimes run into the following problem:

    Building prefix dict from the default dictionary ...
    Loading model from cache /tmp/jieba.cache
    Dumping model to file cache /tmp/jieba.cache
    Dump cache file failed.
    Traceback (most recent call last):
      File "/home/work/anaconda3/envs/py27/lib/python2.7/site-packages/jieba/__init__.py", line 153, in initialize
        _replace_file(fpath, cache_file)
    OSError: [Errno 1] Operation not permitted

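    This happens because jieba stores its cache file under /tmp by default, and a non-root user may lack the permission to replace it there (for example, when /tmp/jieba.cache was created by a different user). The fix is to change the default cache directory so that the cache file lives under the user's own directory. The jieba documentation mentions that tmp_dir and cache_file can be changed, so we took a look at the source code.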

    In /home/work/anaconda3/envs/py27/lib/python2.7/site-packages/jieba/__init__.py, lines 52-66 read as follows:
    class Tokenizer(object):
    
        def __init__(self, dictionary=DEFAULT_DICT):
            self.lock = threading.RLock()
            if dictionary == DEFAULT_DICT:
                self.dictionary = dictionary
            else:
                self.dictionary = _get_abs_path(dictionary)
            self.FREQ = {}
            self.total = 0
            self.user_word_tag_tab = {}
            self.initialized = False
            self.tmp_dir = None
            # self.tmp_dir = '/'
            self.cache_file = None

    One option is to edit the source: a custom cache path can be set at self.tmp_dir on line 64.
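
    Whichever way tmp_dir ends up being set, the directory it points to should exist and be writable by the current user, otherwise the cache dump is likely to fail again. Below is a minimal sketch of preparing such a directory and assigning it to a tokenizer. The helper name set_user_cache_dir and the default location ~/.cache/jieba are assumptions made for illustration; they are not part of jieba.

```python
import os


def set_user_cache_dir(tokenizer, directory="~/.cache/jieba"):
    """Create `directory` if needed and use it as the tokenizer's cache folder.

    Hypothetical helper: its name and the default location are assumptions
    for this sketch. `tokenizer` is expected to be a jieba Tokenizer, for
    example the module-level default `jieba.dt`.
    """
    path = os.path.abspath(os.path.expanduser(directory))
    if not os.path.isdir(path):
        os.makedirs(path)  # make sure the cache directory exists
    if not os.access(path, os.W_OK):
        raise OSError("cache directory is not writable: %s" % path)
    # Assign before the tokenizer initializes, so the cache is written here.
    tokenizer.tmp_dir = path
    return path
```

    With jieba installed, this would be called once at startup, for instance as set_user_cache_dir(jieba.dt), before the first call that triggers dictionary loading.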

    The other option is to set it from your own code. Below is a demo that wraps jieba in a singleton:

    from __future__ import print_function  # keeps print() working on Python 2.7 too

    import os

    import jieba

    # Placeholder paths (not given in the original post); adjust them to your setup.
    dirpath = os.path.expanduser('~/.cache/jieba')  # cache directory, should exist and be writable
    user_dict_path = 'user_dict.txt'                # custom dictionary file


    class Singleton(object):
        """
        Jieba Utils Class
        """
        _instance = None

        def __new__(cls, *args, **kwargs):
            if not cls._instance:
                # object.__new__ accepts no extra arguments, so pass only cls
                cls._instance = super(Singleton, cls).__new__(cls)
            return cls._instance


    class JiebaUtil(Singleton):
        """
        jiebautil toolkit
        """
        _jieba_instance = None

        def get_instance(self):
            """
            get the global jieba instance
            """
            if self._jieba_instance:
                return self._jieba_instance
            print('initialize...')
            obj = jieba.Tokenizer()
            # Custom cache directory; set it before anything triggers dictionary loading.
            obj.tmp_dir = dirpath
            obj.load_userdict(user_dict_path)
            obj.initialize()
            self._jieba_instance = obj
            return obj


    if __name__ == '__main__':

        one = JiebaUtil()
        two = JiebaUtil()

        print(one == two)

        tkn = one.get_instance()
        tkn2 = one.get_instance()
        print(tkn == tkn2)

        print(id(one), id(two))

        print(id(tkn), id(tkn2))

    In get_instance, the line obj.tmp_dir = dirpath is where the custom tmp_dir cache path is set.
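
    The demo's get_instance checks for an existing tokenizer and creates one without holding a lock, so two threads calling it at the same moment could each build their own tokenizer. If that matters, a lock around the first-time creation is one way to handle it. The sketch below is a generic, stdlib-only illustration; get_shared and factory are hypothetical names, not jieba API.

```python
import threading

_lock = threading.Lock()
_instance = None


def get_shared(factory):
    """Return one shared object, building it with `factory` on first use."""
    global _instance
    if _instance is None:            # fast path: no locking once it is built
        with _lock:
            if _instance is None:    # re-check: another thread may have built it
                _instance = factory()
    return _instance
```

    For jieba, factory would be a small function that creates a jieba.Tokenizer, sets its tmp_dir, calls initialize() and returns it. Every caller of get_shared then receives that same tokenizer.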

    References:

    http://funhacks.net/2017/01/17/singleton/

    https://blog.csdn.net/sijiaqi11/article/details/78601258

  • Original article: https://www.cnblogs.com/shizhh/p/10599931.html