  • jieba tokenizer as a singleton, and customizing tmp_dir on Linux when permissions are insufficient

    On Linux, when you do not have root privileges, you may sometimes run into the following problem:

    Building prefix dict from the default dictionary ...
    Loading model from cache /tmp/jieba.cache
    Dumping model to file cache /tmp/jieba.cache
    Dump cache file failed.
    Traceback (most recent call last):
      File "/home/work/anaconda3/envs/py27/lib/python2.7/site-packages/jieba/__init__.py", line 153, in initialize
        _replace_file(fpath, cache_file)
    OSError: [Errno 1] Operation not permitted

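    This happens because jieba stores its cache file under /tmp by default, and a non-root user may not have sufficient permissions there. One likely cause on a shared machine is that /tmp/jieba.cache already exists and belongs to another user, so it cannot be replaced. The fix is to change the default cache directory so that the cache file lives under the user's own directory. The jieba documentation mentions that tmp_dir and cache_file can be changed, so we took a look at the source code.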

    In /home/work/anaconda3/envs/py27/lib/python2.7/site-packages/jieba/__init__.py, lines 52 to 66 read as follows:
    class Tokenizer(object):
    
        def __init__(self, dictionary=DEFAULT_DICT):
            self.lock = threading.RLock()
            if dictionary == DEFAULT_DICT:
                self.dictionary = dictionary
            else:
                self.dictionary = _get_abs_path(dictionary)
            self.FREQ = {}
            self.total = 0
            self.user_word_tag_tab = {}
            self.initialized = False
            self.tmp_dir = None
            # self.tmp_dir = '/'
            self.cache_file = None

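    One option is to edit the source: on line 64, self.tmp_dir can be set to a custom cache path. Keep in mind that such a change is lost when the package is reinstalled or upgraded.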
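
    To make it clearer why this attribute matters, here is a simplified model of how the cache path seems to be chosen. This is a sketch based on a reading of jieba's initialize(), not jieba's actual code: the function name resolve_cache_path is hypothetical, and the fallback rules are assumptions that should be checked against your installed version.

```python
import os
import tempfile


def resolve_cache_path(tmp_dir=None, cache_file=None):
    """Simplified, hypothetical model of how the cache path is chosen.

    Assumptions: an unset tmp_dir falls back to the system temp directory,
    and an unset cache_file falls back to "jieba.cache" (the name that
    appears in the log above for the default dictionary).
    """
    name = cache_file or "jieba.cache"
    directory = tmp_dir or tempfile.gettempdir()
    return os.path.join(directory, name)


if __name__ == "__main__":
    # Default case: the cache lands in the system temp directory.
    print(resolve_cache_path())
    # With tmp_dir set, the cache moves to a directory the user owns.
    print(resolve_cache_path(tmp_dir=os.path.expanduser("~/jieba_cache")))
```

    If this model matches your version, exporting the TMPDIR environment variable before starting Python should also move the cache, because tempfile.gettempdir() honors it.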

    Another way is to set it from your own code. Below is a demo that wraps jieba in a singleton:

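    from __future__ import print_function  # keeps print() working on Python 2.7 too

    import os

    import jieba

    # Placeholder values (not defined in the original post): adjust to your setup.
    dirpath = os.path.join(os.path.expanduser('~'), 'jieba_cache')  # user-writable cache directory
    user_dict_path = 'user_dict.txt'  # path to your custom dictionary


    class Singleton(object):
        """
        Jieba Utils Class
        """
        _instance = None

        def __new__(cls, *args, **kwargs):
            if cls._instance is None:
                # object.__new__ takes no extra arguments, so none are forwarded
                cls._instance = super(Singleton, cls).__new__(cls)
            return cls._instance


    class JiebaUtil(Singleton):
        """
        jieba utility class
        """
        _jieba_instance = None

        def get_instance(self):
            """
            get the global jieba instance
            """
            if self._jieba_instance is not None:
                return self._jieba_instance
            print('initialize...')
            if not os.path.isdir(dirpath):
                os.makedirs(dirpath)  # create the cache directory if it is missing
            obj = jieba.Tokenizer()
            obj.tmp_dir = dirpath  # custom cache directory
            obj.load_userdict(user_dict_path)
            obj.initialize()
            self._jieba_instance = obj
            return obj


    if __name__ == '__main__':

        one = JiebaUtil()
        two = JiebaUtil()

        print(one == two)

        tkn = one.get_instance()
        tkn2 = one.get_instance()
        print(tkn == tkn2)

        print(id(one), id(two))

        print(id(tkn), id(tkn2))

    The custom cache path is set by the obj.tmp_dir = dirpath statement in get_instance().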
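
    Whatever is assigned to tmp_dir has to be a directory the current user can actually write to. The following is a minimal sketch of one way to pick such a directory. The helper names ensure_writable_dir and choose_cache_dir, and the ~/.cache/jieba location, are hypothetical and not part of jieba.

```python
import os
import tempfile


def ensure_writable_dir(path):
    """Create `path` if needed and check that a file can be created in it.

    Returns the absolute path on success, or None if the directory
    cannot be created or written to.
    """
    path = os.path.abspath(os.path.expanduser(path))
    try:
        if not os.path.isdir(path):
            os.makedirs(path)
        # Actually creating a (self-deleting) file is a direct writability test.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except (OSError, IOError):
        return None
    return path


def choose_cache_dir(candidates):
    """Return the first usable directory, else a fresh private temp directory."""
    for candidate in candidates:
        usable = ensure_writable_dir(candidate)
        if usable is not None:
            return usable
    return tempfile.mkdtemp(prefix="jieba_cache_")


if __name__ == "__main__":
    # "~/.cache/jieba" is a hypothetical location, not something jieba defines.
    dirpath = choose_cache_dir(["~/.cache/jieba"])
    print(dirpath)
```

    The returned path can then be used as the dirpath value in the demo above. Falling back to tempfile.mkdtemp gives each run its own private directory, which avoids clashes with other users' files but means the cache is not reused across runs.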

    References:

    http://funhacks.net/2017/01/17/singleton/

    https://blog.csdn.net/sijiaqi11/article/details/78601258

  • Original article: https://www.cnblogs.com/shizhh/p/10599931.html