  • Python: removing stopwords

    Try caching the stopwords object, as shown below. Constructing this each time you call the function seems to be the bottleneck.

        from nltk.corpus import stopwords
    
        cachedStopWords = stopwords.words("english")
    
        def testFuncOld():
            text = 'hello bye the the hi'
            text = ' '.join([word for word in text.split() if word not in stopwords.words("english")])
    
        def testFuncNew():
            text = 'hello bye the the hi'
            text = ' '.join([word for word in text.split() if word not in cachedStopWords])
    
        if __name__ == "__main__":
            for i in range(10000):  # the original Python 2 code used xrange
                testFuncOld()
                testFuncNew()

    I ran this through the profiler: python -m cProfile -s cumulative test.py. The relevant lines are posted below.

        ncalls  cumtime  filename:lineno(function)
        10000   7.723    words.py:7(testFuncOld)
        10000   0.140    words.py:11(testFuncNew)

    So, caching the stopwords instance gives a ~55x speedup (7.723 s vs. 0.140 s cumulative).
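    A further refinement worth trying: `stopwords.words("english")` returns a list, so each `in` test is a linear scan. Wrapping the cached list in a `set` makes membership checks O(1). The sketch below uses a small hard-coded stand-in for the NLTK stopword list so it runs without NLTK installed; in real code you would build the set once with `set(stopwords.words("english"))`.

        # Sketch: cache the stopwords as a set for O(1) membership tests.
        # STOPWORDS is a hypothetical stand-in for set(stopwords.words("english")).
        STOPWORDS = {"the", "a", "an", "and", "is", "in", "of"}

        def remove_stopwords(text):
            """Drop any whitespace-separated token found in STOPWORDS."""
            return " ".join(w for w in text.split() if w not in STOPWORDS)

        print(remove_stopwords("hello bye the the hi"))

    For short texts the list-vs-set difference is small, but it grows with both the number of tokens and the size of the stopword list.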

  • Original post: https://www.cnblogs.com/Donal/p/6902048.html