  • Python: removing stopwords

    Try caching the stopwords object, as shown below. Constructing this each time you call the function seems to be the bottleneck.

        from nltk.corpus import stopwords
    
        cachedStopWords = stopwords.words("english")
    
        def testFuncOld():
            text = 'hello bye the the hi'
            text = ' '.join([word for word in text.split() if word not in stopwords.words("english")])
    
        def testFuncNew():
            text = 'hello bye the the hi'
            text = ' '.join([word for word in text.split() if word not in cachedStopWords])
    
        if __name__ == "__main__":
            for i in range(10000):  # the original Python 2 code used xrange
                testFuncOld()
                testFuncNew()

    I ran this through the profiler: python -m cProfile -s cumulative test.py. The relevant lines are posted below.

        ncalls  cumtime  filename:lineno(function)
        10000   7.723    words.py:7(testFuncOld)
        10000   0.140    words.py:11(testFuncNew)

    So, caching the stopwords instance gives a ~55x speedup (7.723 s vs. 0.140 s cumulative).
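    A further refinement worth trying: `stopwords.words("english")` returns a list, so each `in` test is a linear scan. Wrapping the cached list in a `set` makes membership checks O(1). The sketch below uses a small hard-coded stand-in for the NLTK stopword list so it runs without NLTK installed; in real code you would build the set once with `set(stopwords.words("english"))`.

        # Sketch: cache the stopwords as a set for O(1) membership tests.
        # STOPWORDS is a hypothetical stand-in for set(stopwords.words("english")).
        STOPWORDS = {"the", "a", "an", "and", "is", "in", "of"}

        def remove_stopwords(text):
            """Drop any whitespace-separated token found in STOPWORDS."""
            return " ".join(w for w in text.split() if w not in STOPWORDS)

        print(remove_stopwords("hello bye the the hi"))

    For short texts the list-vs-set difference is small, but it grows with both the number of tokens and the size of the stopword list.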

  • Original post: https://www.cnblogs.com/Donal/p/6902048.html