Testing jieba's parallel processing. Note: parallel segmentation only supports the default tokenizers jieba.dt
and jieba.posseg.dt.
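Parallel mode can also be tuned and switched off at runtime. A minimal sketch (the process count of 4 and the sample sentence are arbitrary choices for illustration; parallel mode only works on POSIX systems, not Windows):

import jieba

# Enable parallel segmentation; the optional argument is the number of
# worker processes (defaults to the machine's CPU count if omitted).
jieba.enable_parallel(4)

print("/ ".join(jieba.cut("我爱北京天安门")))

# Switch back to normal single-process segmentation.
jieba.disable_parallel()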
import sys
import time
import jieba

jieba.enable_parallel()

#url = sys.argv[1]
content = open("/ssd/ailab-dataset/THUCNewsSubset/cnews.train.txt", "rb").read()

t1 = time.time()
words = "/ ".join(jieba.cut(content))
t2 = time.time()
tm_cost = t2 - t1

log_f = open("1.log", "wb")
log_f.write(words.encode('utf-8'))

print('speed %s bytes/second' % (len(content)/tm_cost))
Test results:
# With jieba.enable_parallel() commented out
[root@n6 jieba-parallel-test]# python test.py
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.289 seconds.
Prefix dict has been built succesfully.
speed 259919.622884 bytes/second

# With jieba.enable_parallel() enabled
[root@n6 jieba-parallel-test]# vi test.py
[root@n6 jieba-parallel-test]# vi test.py
[root@n6 jieba-parallel-test]# python test.py
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.263 seconds.
Prefix dict has been built succesfully.
speed 2215307.40079 bytes/second
With parallel mode enabled, it is much faster: roughly 8.5x, from about 260 KB/s to about 2.2 MB/s!
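Since the note at the top says jieba.posseg.dt also supports parallel mode, the same timing approach could be applied to POS tagging. A minimal sketch, reusing the file path from the script above (the "word/flag" output format here is my own choice, not part of the original test, and its throughput would differ from the plain-segmentation numbers):

import time
import jieba
import jieba.posseg as pseg

jieba.enable_parallel()

content = open("/ssd/ailab-dataset/THUCNewsSubset/cnews.train.txt", "rb").read()

t1 = time.time()
# pseg.cut yields pairs carrying .word and .flag; jieba decodes the raw
# bytes internally, as in the plain-segmentation test above.
words = " ".join("%s/%s" % (w.word, w.flag) for w in pseg.cut(content))
t2 = time.time()

print('posseg speed %s bytes/second' % (len(content) / (t2 - t1)))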