  • Python word segmentation

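    #encoding=utf-8
    import jieba
    
    # Full mode: list every word the dictionary can find in the sentence
    seg_list = jieba.cut("明天不上班啊", cut_all=True)
    print("Full Mode:", "/ ".join(seg_list))
    
    # Default (accurate) mode
    seg_list = jieba.cut("明天不上班啊", cut_all=False)
    print("Default Mode:", "/ ".join(seg_list))
    
    # cut_all defaults to False, so this is the default mode again
    seg_list = jieba.cut("明天不上班啊")
    print(", ".join(seg_list))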

    Output:

    F:\python-study\fenci>python test.py
    Building prefix dict from C:\Python33\lib\site-packages\jieba\dict.txt ...
    Loading model from cache c:\users\zhaoji~1\appdata\local\temp\jieba.cache
    Loading model cost 0.840 seconds.
    Prefix dict has been built succesfully.
    Full Mode: 明天/ 不/ 上班/ 啊
    Default Mode: 明天/ 不/ 上班/ 啊
    明天, 不, 上班, 啊
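
One detail worth noting about the script above: `jieba.cut` returns a generator rather than a list, so its result can be consumed only once. That is why the script calls `cut` again before each `print`. The sketch below illustrates the behaviour with a stand-in generator; `toy_cut` and `join_twice` are hypothetical helpers that only yield a fixed token list and are not part of jieba.

```python
def toy_cut(tokens):
    """Hypothetical stand-in for jieba.cut: yields tokens lazily."""
    for token in tokens:
        yield token


def join_twice(tokens):
    """Join the same generator twice to show it is exhausted after one pass."""
    seg_list = toy_cut(tokens)
    first = "/ ".join(seg_list)   # consumes the generator
    second = "/ ".join(seg_list)  # nothing is left to iterate over
    return first, second


first, second = join_twice(["明天", "不", "上班", "啊"])
print(repr(first))   # '明天/ 不/ 上班/ 啊'
print(repr(second))  # ''
```

If the tokens are needed more than once, they can be materialized with `list(seg_list)`; recent jieba versions also provide `jieba.lcut`, which returns a list directly.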

    Python word segmentation tool: jieba

    1. Error when running the script:

    F:\python-study\fenci>python test.py
    File "test.py", line 3
    SyntaxError: Non-UTF-8 code starting with '\xce' in file test.py on line 3, but
    no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

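    Some searching showed this was a file-encoding problem in the editor: opening test.py in notepad showed ANSI in the status bar at the bottom, and converting the file to UTF-8 fixed it.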
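
The same conversion can also be done from Python instead of the editor. This is a minimal sketch, not the method used in the original post: `convert_to_utf8` is a hypothetical helper, and the default `source_encoding="gbk"` is an assumption based on "ANSI" usually meaning code page 936 (GBK) on a Simplified Chinese Windows system.

```python
import os
import tempfile


def convert_to_utf8(path, source_encoding="gbk"):
    """Re-encode a text file in place from source_encoding to UTF-8.

    source_encoding="gbk" is an assumption; pass the file's real encoding
    if it differs. newline="" keeps the original line endings untouched.
    """
    with open(path, "r", encoding=source_encoding, newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return text


# Demo on a throwaway file so that no real script is modified.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.py")
    with open(demo_path, "w", encoding="gbk", newline="") as f:
        f.write('s = "明天不上班啊"\n')
    convert_to_utf8(demo_path)
    with open(demo_path, "rb") as f:
        print(f.read().decode("utf-8"))
```

An alternative described in PEP 263 is to leave the file as it is and declare its actual encoding in a comment on the first or second line, for example `# -*- coding: gbk -*-`.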

    2. In Python 3, print is a function, so its arguments must be wrapped in parentheses:

    print()
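
As a small illustration of the Python 3 form: the arguments go inside the parentheses and are separated by a space by default, while the standard `sep` keyword changes that separator. The token list is the segmentation result shown earlier; `render` is a hypothetical helper that exists only to capture what `print` writes.

```python
import io


def render(*args, **kwargs):
    """Hypothetical helper: return the text print() would write."""
    buf = io.StringIO()
    print(*args, file=buf, **kwargs)
    return buf.getvalue()


tokens = ["明天", "不", "上班", "啊"]

# The Python 2 statement form (print "Full Mode:", ...) is a SyntaxError in Python 3.
print("Full Mode:", "/ ".join(tokens))  # arguments are joined by a single space
print(*tokens, sep=", ")                # sep replaces the default separator
```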

    Test:

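    #coding=utf-8
    import jieba
    import jieba.posseg as pseg
    
    # Read the input text (assumes in.txt is saved as UTF-8)
    with open("in.txt", "r", encoding="utf-8") as f:
        string = f.read()
    
    words = pseg.cut(string)  # segment the text and tag parts of speech
    result = ""
    for w in words:
        result += str(w.word) + "/" + str(w.flag)  # append word/part-of-speech tag
    
    # Write the tagged text; the with block closes the file automatically
    with open("out.txt", "w", encoding="utf-8") as f:
        f.write(result)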
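
The loop above concatenates each `word/flag` pair with no separator, so the pairs run together in out.txt. Below is a minimal sketch of one way to keep them apart, assuming a space separator is acceptable. `format_tagged` and `parse_tagged` are hypothetical helpers, and the sample pairs are illustrative values rather than actual jieba output.

```python
def format_tagged(pairs, sep=" "):
    """Join (word, flag) pairs as word/flag items separated by sep."""
    return sep.join("{}/{}".format(word, flag) for word, flag in pairs)


def parse_tagged(text, sep=" "):
    """Split text produced by format_tagged back into (word, flag) pairs."""
    return [tuple(item.rsplit("/", 1)) for item in text.split(sep) if item]


# Illustrative pairs only; real tags depend on jieba's dictionary and version.
sample = [("明天", "t"), ("不", "d"), ("上班", "v"), ("啊", "y")]

line = format_tagged(sample)
print(line)                          # 明天/t 不/d 上班/v 啊/y
print(parse_tagged(line) == sample)  # True
```

With jieba, the pairs would come from `(w.word, w.flag) for w in pseg.cut(string)`. This sketch does not handle tokens that themselves contain the separator, such as whitespace in the input text.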
  • Original post: https://www.cnblogs.com/huanhuanang/p/4750343.html