  • Python jieba word segmentation (2): keyword extraction

    The text used for keyword extraction is the first ten chapters of the novel 完美世界 (Perfect World).

    Beforehand, I merged the first ten chapters into a single file.
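The merge step itself is not shown in the post. A minimal sketch of one way to do it, assuming each chapter is a separate UTF-8 text file; the helper `merge_chapters` and the `chapter_N.txt` names are hypothetical, while `he.txt` matches the file name the script reads:

```python
import tempfile
from pathlib import Path


def merge_chapters(chapter_paths, merged_path, encoding="utf-8"):
    """Concatenate chapter files, in the given order, into one text file."""
    parts = [Path(p).read_text(encoding=encoding) for p in chapter_paths]
    merged = "\n".join(parts)
    Path(merged_path).write_text(merged, encoding=encoding)
    return merged


# Demo with throwaway files; the chapter_N.txt names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(1, 4):
        p = Path(tmp) / f"chapter_{i}.txt"
        p.write_text(f"text of chapter {i}", encoding="utf-8")
        paths.append(p)
    merged = merge_chapters(paths, Path(tmp) / "he.txt")
    print(merged)
```

Joining with a newline keeps the last word of one chapter from running into the first word of the next, which would otherwise affect segmentation at the boundary.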

    Then I call the keyword-extraction function directly:

    import os
    import sys
    sys.path.append('../')  # makes a jieba checkout in the parent directory importable

    import jieba.analyse  # keyword-extraction module of jieba

    # Raw string, so the backslashes in the Windows path are not read as escapes.
    data_path = r"C:\Users\wangyuguang\Desktop\work_data\profect_world"
    topK = 10           # number of keywords to return
    withWeight = False  # set to True to also get each keyword's weight

    # Read the merged file he.txt; this assumes the file is UTF-8 encoded.
    with open(os.path.join(data_path, "he.txt"), encoding="utf-8") as f:
        content = f.read()

    # Call the keyword extractor directly.
    tags = jieba.analyse.extract_tags(content, topK=topK, withWeight=withWeight)

    if withWeight:
        for tag, weight in tags:
            print("tag: %s\t\t weight: %f" % (tag, weight))
    else:
        print(",".join(tags))

    Keyword results:

    Building prefix dict from the default dictionary ...
    Loading model from cache c:\users\wangyuguang\appdata\local\temp\jieba.cache
    Loading model cost 0.386 seconds.
    Prefix dict has been built succesfully.
    小不点,孩子,族长,石云峰,石村,凶禽,青鳞鹰,凶兽,一群,石昊
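For intuition about the ranking: `jieba.analyse.extract_tags` scores words by TF-IDF, and `withWeight=True` returns those scores alongside the words. The sketch below is a simplified illustration of that scoring idea, not jieba's actual code; the function `top_keywords`, the token list, and the IDF values are all made up for the example.

```python
from collections import Counter


def top_keywords(tokens, idf, default_idf, top_k=10):
    """Rank tokens by term frequency times inverse document frequency."""
    # Skip single-character tokens, which are rarely useful keywords.
    counts = Counter(t for t in tokens if len(t) >= 2)
    total = sum(counts.values())
    scores = {w: (c / total) * idf.get(w, default_idf) for w, c in counts.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical pre-segmented tokens and IDF values, for illustration only.
tokens = ["石村", "族长", "石村", "的", "孩子", "石村", "族长"]
idf = {"石村": 10.0, "族长": 8.0, "孩子": 4.0}

ranked = top_keywords(tokens, idf, default_idf=6.0, top_k=2)
for word, weight in ranked:
    print("tag: %s\t\t weight: %f" % (word, weight))
```

A word scores highly when it is frequent in this text but has a high IDF, meaning it is rare in general text. That fits the result above, where character and place names from the novel dominate the list.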
  • Original article: https://www.cnblogs.com/lovychen/p/5681019.html