Running a script that uses jieba for word segmentation produced an error.
The output was:
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Dumping model to file cache /tmp/jieba.cache
Dump cache file failed.
Traceback (most recent call last):
  File "/home1/yanghan/anaconda3/envs/py/lib/python3.7/site-packages/jieba/__init__.py", line 154, in initialize
    _replace_file(fpath, cache_file)
PermissionError: [Errno 1] Operation not permitted: '/tmp/tmpg255ml7f' -> '/tmp/jieba.cache'
Loading model cost 0.900 seconds.
Prefix dict has been built successfully.
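Cause
jieba stores its model in a cache file, /tmp/jieba.cache, in the system temporary directory, and the current user does not have permission to replace that file. As the end of the log shows, jieba still finishes building the prefix dict; only the cache write fails.
This usually happens on a shared server, where the code runs as an ordinary user rather than root.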
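On Linux, /tmp normally has the sticky bit set, so a file there can be renamed over or deleted only by its owner (or root). A likely explanation, which the log alone does not confirm, is that /tmp/jieba.cache was created by another user. The sketch below is one way to check this; the helper name describe_cache is made up for illustration, and it relies on os.getuid, so it is Unix-only.

```python
import os
import stat
import tempfile


def describe_cache(path):
    """Return ownership facts about a cache file, or None if it does not exist.

    Hypothetical helper for diagnosis only; it is not part of jieba.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    parent = os.stat(os.path.dirname(path) or ".")
    return {
        "owner_uid": st.st_uid,                  # who owns the cache file
        "my_uid": os.getuid(),                   # who is running this script
        "owned_by_me": st.st_uid == os.getuid(),
        # With the sticky bit set on the parent directory, only the owner
        # of a file (or root) may rename over it or delete it.
        "parent_sticky": bool(parent.st_mode & stat.S_ISVTX),
    }


if __name__ == "__main__":
    # The default cache location seen in the log: <system temp dir>/jieba.cache
    print(describe_cache(os.path.join(tempfile.gettempdir(), "jieba.cache")))
```

If owned_by_me is False and parent_sticky is True, the PermissionError on the rename is expected, and moving the cache to a directory you own is the straightforward fix.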
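Solution
Change the default cache directory so that the cache file is written under the user's own directory.
At line 64 of the source, set self.tmp_dir to any directory under the user's home, for example "/home1/yanghan". self.cache_file does not need to be changed.
The traceback above shows where the source file is: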
File "/home1/yanghan/anaconda3/envs/py/lib/python3.7/site-packages/jieba/__init__.py", line 154, in initialize
_replace_file(fpath, cache_file)
Open that file and edit the jieba source:
vi /home1/yanghan/anaconda3/envs/py/lib/python3.7/site-packages/jieba/__init__.py
Change
class Tokenizer(object):

    def __init__(self, dictionary=DEFAULT_DICT):
        self.lock = threading.RLock()
        if dictionary == DEFAULT_DICT:
            self.dictionary = dictionary
        else:
            self.dictionary = _get_abs_path(dictionary)
        self.FREQ = {}
        self.total = 0
        self.user_word_tag_tab = {}
        self.initialized = False
        self.tmp_dir = None
        self.cache_file = None
to:
class Tokenizer(object):

    def __init__(self, dictionary=DEFAULT_DICT):
        self.lock = threading.RLock()
        if dictionary == DEFAULT_DICT:
            self.dictionary = dictionary
        else:
            self.dictionary = _get_abs_path(dictionary)
        self.FREQ = {}
        self.total = 0
        self.user_word_tag_tab = {}
        self.initialized = False
        self.tmp_dir = "/home1/yanghan/"
        self.cache_file = None
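An edit to an installed package is lost whenever the package is reinstalled or upgraded. The same tmp_dir attribute can also be set from your own script: the jieba README describes tmp_dir and cache_file as settable attributes of the default tokenizer jieba.dt, intended for restricted file systems. The sketch below assumes that behaviour; the helper name point_cache_at and the ~/.cache/jieba directory are illustrative choices, not anything jieba requires.

```python
import os


def point_cache_at(tokenizer, cache_dir):
    """Make `tokenizer` write its cache under `cache_dir`.

    Hypothetical helper: it only creates the directory and sets the
    tokenizer's `tmp_dir` attribute. It must run before the tokenizer is
    first used, because jieba builds or loads its cache lazily on first use.
    """
    cache_dir = os.path.abspath(os.path.expanduser(cache_dir))
    os.makedirs(cache_dir, exist_ok=True)  # the directory must already exist
    tokenizer.tmp_dir = cache_dir
    return cache_dir


# Usage (requires jieba to be installed):
#   import jieba
#   point_cache_at(jieba.dt, "~/.cache/jieba")
#   print(list(jieba.cut("我来到北京清华大学")))
```

With this approach the cache file keeps its default name, jieba.cache, and simply lands in a directory the current user owns.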
Reference:
https://blog.csdn.net/u013421629/article/details/91393781