
Python word frequency counting

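def word_frequency():
    word_dict = {}
    # Raw strings keep backslashes literal; otherwise '\t' and '\313' in these paths are read as escapes.
    # Files are opened with the system default encoding; pass encoding=... if yours differ.
    with open(r'E:\PythonFile\tingyongci.txt') as ti:
        # Load the stop-word list (a merged list that includes the HIT stop-word table).
        # Split into whole words; list(ti.read()) would yield single characters.
        ti_set = set(ti.read().split())
    with open(r'E:\PythonFile\jd\phone\3133927.txt') as wf:
        comments = wf.read().split()
    for comment in comments:
        if comment in ti_set:
            continue  # skip stop words
        word_dict[comment] = word_dict.get(comment, 0) + 1
    # Sort the dictionary items by value; sorted() returns a new list, so keep the result.
    sorted_items = sorted(word_dict.items(), key=lambda item: item[1])
    # Save the results to a local TXT file (append mode, as in the original).
    with open(r'E:\PythonFile\jd\phone\test.txt', mode='a') as file:
        for key, count in sorted_items:
            print(key, count)
            file.write(key + ' ' + str(count) + '\n')    # write to the file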
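
As an alternative to the hand-written dictionary counting above, the same idea can be sketched with `collections.Counter` from the standard library. This is only a minimal illustration, not the author's method: the helper name `count_words` and the sample tokens and stop words are made up for the example, and the input is assumed to be already segmented into whitespace-separated words.

```python
from collections import Counter


def count_words(tokens, stop_words):
    """Count tokens that are not stop words (hypothetical helper, not from the post)."""
    stop_set = set(stop_words)  # a set makes each membership test cheap
    return Counter(token for token in tokens if token not in stop_set)


# Made-up sample data standing in for the comment file and the stop-word file.
tokens = "手机 很 好 屏幕 很 清晰 手机 不错".split()
stop_words = ["很", "的"]

counts = count_words(tokens, stop_words)
# most_common() yields (word, count) pairs sorted from highest count to lowest.
for word, count in counts.most_common():
    print(word, count)
```

`Counter.most_common()` returns the pairs already sorted by count, so no separate `sorted()` call is needed; the pairs can be written to a file in the same way as in the function above.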


Segmenting strings with jieba and saving the segmentation results to a TXT file
Removing stop words



Original article: https://www.cnblogs.com/muty/p/8523456.html