  • Python jieba (结巴) word segmentation (2): keyword extraction

    The text used for keyword extraction is the first ten chapters of the novel 完美世界 (Perfect World).

    Beforehand, I merged those ten chapters into a single file.
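The post does not show the merging step. A minimal sketch follows. It assumes each chapter is a separate UTF-8 text file. The helper `merge_chapters` and the chapter file names are hypothetical:

```python
from pathlib import Path


def merge_chapters(chapter_paths, out_path, encoding="utf-8"):
    """Concatenate chapter files, in the given order, into one output file.

    Hypothetical helper: assumes every chapter is a text file in `encoding`.
    Returns the number of characters in the merged text.
    """
    parts = [Path(p).read_text(encoding=encoding) for p in chapter_paths]
    # Join with a newline so the last line of one chapter does not run
    # into the first line of the next one.
    merged = "\n".join(parts)
    Path(out_path).write_text(merged, encoding=encoding)
    return len(merged)
```

For example, with hypothetical files `chapter_01.txt` through `chapter_10.txt`, you could call `merge_chapters([f"chapter_{i:02d}.txt" for i in range(1, 11)], "he.txt")`. The list order determines the chapter order in the output.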

    Then I called the keyword-extraction function directly:

    import os

    import jieba
    import jieba.analyse  # keyword-extraction module

    # Raw string so the backslashes are not read as escape sequences;
    # a raw string must not end in a single backslash, so the file name is joined below.
    data_path = r"C:\Users\wangyuguang\Desktop\work_data\profect_world"
    topK = 10            # number of keywords to return
    withWeight = False   # True returns (word, weight) pairs instead of plain words

    # he.txt holds the first ten chapters merged into one file.
    # Assumes UTF-8; pass encoding="gbk" instead if the file is GBK-encoded.
    file_path = os.path.join(data_path, "he.txt")
    with open(file_path, encoding="utf-8") as f:
        content = f.read()

    # Call the keyword extractor directly
    tags = jieba.analyse.extract_tags(content, topK=topK, withWeight=withWeight)

    if withWeight:
        for tag, weight in tags:
            print("tag: %s\t\t weight: %f" % (tag, weight))
    else:
        print(",".join(tags))
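`extract_tags` ranks words by TF-IDF. As I understand jieba's default implementation, it works as follows:

- It counts each segmented word that has at least two characters and is not a stop word.
- It divides each count by the total count.
- It multiplies the result by the word's IDF from a bundled table, using the median IDF for words not in the table.
- It returns the top `topK` words.

The stdlib-only sketch below approximates that scoring on an already-segmented token list. It is not jieba's code or API. `tfidf_keywords` is a hypothetical helper, and the tokens and IDF values are made up for illustration:

```python
from statistics import median


def tfidf_keywords(tokens, idf, top_k=10, stop_words=frozenset()):
    """Rank pre-segmented tokens by TF * IDF and return (word, weight) pairs.

    Simplified sketch of TF-IDF keyword scoring; `idf` maps word -> IDF value.
    """
    freq = {}
    for w in tokens:
        # Skip tokens shorter than two characters and stop words
        if len(w.strip()) < 2 or w in stop_words:
            continue
        freq[w] = freq.get(w, 0) + 1
    total = sum(freq.values())
    if total == 0:
        return []
    # Words missing from the IDF table get the median IDF (1.0 if the table is empty)
    default_idf = median(idf.values()) if idf else 1.0
    scores = {w: c / total * idf.get(w, default_idf) for w, c in freq.items()}
    # Highest weight first, then keep the top_k entries
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Hypothetical tokens and IDF values, for illustration only
tokens = ["石村", "孩子", "石村", "一群", "的", "小不点", "小不点", "小不点"]
idf = {"石村": 8.0, "孩子": 4.0, "一群": 3.0}
for word, weight in tfidf_keywords(tokens, idf, top_k=3):
    print("tag: %s\t\t weight: %f" % (word, weight))
```

In this demo, "的" is dropped because it is a single character. "小不点" is missing from the IDF table, so it gets the median IDF of 4.0. The resulting order is 石村, 小不点, 孩子: a word that is rarer across documents (higher IDF) can outrank a more frequent one.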

    Output (jieba's loading messages, then the top 10 keywords):

    Building prefix dict from the default dictionary ...
    Loading model from cache c:\users\wangyuguang\appdata\local\temp\jieba.cache
    Loading model cost 0.386 seconds.
    Prefix dict has been built succesfully.
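    小不点,孩子,族长,石云峰,石村,凶禽,青鳞鹰,凶兽,一群,石昊
    (Roughly: Xiao Budian (a nickname, "little one"), child, clan patriarch, Shi Yunfeng, Stone Village, ferocious birds, green-scaled eagle, ferocious beasts, a group, Shi Hao.)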
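The list still contains generic words such as 孩子 and 一群. jieba can load a custom stop-word file with `jieba.analyse.set_stop_words(path)` before extraction. Another option is to request more candidates than you need, drop unwanted words afterwards, and keep the original ranking. The sketch below does the latter with a hypothetical helper, `filter_keywords`:

```python
def filter_keywords(tags, stop_words, top_k):
    """Drop unwanted words from a ranked keyword list, keep order, cut to top_k.

    Hypothetical helper; `tags` is a list of words such as the one returned by
    extract_tags(..., withWeight=False).
    """
    kept = [t for t in tags if t not in stop_words]
    return kept[:top_k]


# The ten keywords from the run above, with two generic words filtered out
tags = ["小不点", "孩子", "族长", "石云峰", "石村", "凶禽", "青鳞鹰", "凶兽", "一群", "石昊"]
print(",".join(filter_keywords(tags, {"孩子", "一群"}, top_k=8)))
```

Filtering after extraction means fewer than `topK` words can survive. In practice, you would call `extract_tags` with a somewhat larger `topK` than the number of keywords you want to keep.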
  • Original post: https://www.cnblogs.com/lovychen/p/5681019.html