  • Chinese text classification

    import os
    path = r'C:\Users\Administrator\Desktop\369data'
    
    def readfile(path):
        # Walk the data directory and pass every file path to genInfo
        for root, dirs, files in os.walk(path):
            # print(root)
            # print(dirs)
            for f in files:
                fn = os.path.join(root, f)
                # size = os.path.getsize(fn)
                # print(fn, size)
                genInfo(fn)
    
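    import numpy as np
    def genInfo(fn):
        # The category is the name of the folder that contains the file
        classfity = os.path.basename(os.path.dirname(fn))
        with open(fn, 'r', encoding='utf-8') as f:
            content = f.read()  # the document text
        return classfity, content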
            
    # import jieba
    # import jieba.posseg as psg
    # file_path = r'C:\Users\Administrator\Desktop\stopsCN.txt'
    # fo = open(file_path, 'r', encoding='utf-8').read()
    
    # stops = np.loadtxt(file_path, dtype=str, delimiter='\t', encoding='utf-8')
    # stops.shape
    # tokens = [token for token in tokens if token not in stops]
    # tokens
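
The listing above reads each file and derives its category from the folder name, and the commented-out lines point toward stop-word filtering. A minimal sketch of how those pieces might fit together follows. The function names `load_corpus`, `load_stopwords`, and `clean_tokens` are hypothetical and not from the original post, and the stop-word file is assumed to hold one word per line. The default tokenizer is a whitespace split used only as a stand-in, since the original has not yet wired in a Chinese word segmenter.

```python
import os
from typing import Callable, List, Set, Tuple


def load_corpus(root_dir: str) -> Tuple[List[str], List[str]]:
    """Walk root_dir and return (labels, texts).

    Assumption: each file's label is the name of the folder that contains it.
    """
    labels: List[str] = []
    texts: List[str] = []
    for root, _dirs, files in os.walk(root_dir):
        for name in sorted(files):
            fn = os.path.join(root, name)
            with open(fn, 'r', encoding='utf-8') as f:
                texts.append(f.read())
            labels.append(os.path.basename(root))
    return labels, texts


def load_stopwords(file_path: str) -> Set[str]:
    """Read stop words into a set (assumes one word per line, blank lines skipped)."""
    with open(file_path, 'r', encoding='utf-8') as f:
        return {line.strip() for line in f if line.strip()}


def clean_tokens(text: str, stops: Set[str],
                 tokenize: Callable[[str], List[str]] = str.split) -> List[str]:
    """Tokenize text, then drop stop words and whitespace-only tokens."""
    return [t for t in tokenize(text) if t.strip() and t not in stops]
```

If jieba is installed, passing `tokenize=jieba.lcut` to `clean_tokens` should segment Chinese text into words before filtering. Using a set for the stop words keeps each membership test cheap compared with scanning a NumPy array.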
  • Original article: https://www.cnblogs.com/hodafu/p/10113126.html