zoukankan      html  css  js  c++  java
  • <Think Python>中统计文献单词的处理代码

    def process_line(line, hist):
        """Adds the words in the line to the histogram.

        Modifies hist.

        line: string
        hist: histogram (map from word to frequency)
        """
        # replace hyphens with spaces before splitting
        line = line.replace('-', ' ')
        
        for word in line.split():
            # remove punctuation and convert to lowercase
            word = word.strip(string.punctuation + string.whitespace)    # 单词的分割要找到其本质特点,其首尾必然是字母(中部可能出现标点,如isn't)
            word = word.lower()

            # update the histogram
            hist[word] = hist.get(word, 0) + 1

  • 相关阅读:
    INTZ DX format
    多线程渲染
    BindVertexbuffer
    Lock flag DX
    Triple buffering and vsync
    可迭代对象和迭代器
    装饰器
    闭包函数
    名称空间和作用域
    函数对象
  • 原文地址:https://www.cnblogs.com/instona/p/3350153.html
Copyright © 2011-2022 走看看