  • Machine Learning Series 001

    Dictionary preprocessing

    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.preprocessing import MinMaxScaler, StandardScaler
    from sklearn.feature_selection import VarianceThreshold
    from sklearn.decomposition import PCA
    from scipy.stats import pearsonr
    import jieba
    import pandas as pd
    
    
    
    
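    def dict_vec():
        # Instantiate the vectorizer. sparse=False returns a dense NumPy array
        # instead of a SciPy sparse matrix. The name `vectorizer` avoids
        # shadowing the built-in `dict`.
        # vectorizer = DictVectorizer()
        vectorizer = DictVectorizer(sparse=False)
        # Call fit_transform on the list of dicts
        data = vectorizer.fit_transform([
            {'city': '北京', 'temperature': 100},
            {'city': '上海', 'temperature': 60},
            {'city': '深圳', 'temperature': 30},
        ])
    
        # Print the name of each column
        # (get_feature_names() was removed in scikit-learn 1.2)
        print(vectorizer.get_feature_names_out())
        print(data)
    
        return None
    
    if __name__ == '__main__':
        dict_vec()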
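To make the output of `dict_vec()` easier to read, here is a minimal pure-Python sketch of the same idea: string-valued fields become one-hot columns named `key=value`, and numeric fields pass through as a single column. The helper `one_hot_dicts` is a hypothetical name for illustration only. It is not how scikit-learn implements `DictVectorizer`, and it assumes every value is either a string or a number.

```python
def one_hot_dicts(records):
    """Encode a list of dicts: strings become one-hot columns, numbers pass through."""
    # Collect column names: "key=value" for string values, plain "key" for numbers.
    names = set()
    for rec in records:
        for key, value in rec.items():
            names.add(f"{key}={value}" if isinstance(value, str) else key)
    names = sorted(names)  # sorted column order, as DictVectorizer does by default
    index = {name: i for i, name in enumerate(names)}

    # Fill one row per record.
    rows = []
    for rec in records:
        row = [0.0] * len(names)
        for key, value in rec.items():
            if isinstance(value, str):
                row[index[f"{key}={value}"]] = 1.0
            else:
                row[index[key]] = float(value)
        rows.append(row)
    return names, rows


records = [
    {'city': '北京', 'temperature': 100},
    {'city': '上海', 'temperature': 60},
    {'city': '深圳', 'temperature': 30},
]
names, rows = one_hot_dicts(records)
print(names)
for row in rows:
    print(row)
```

Each city gets its own 0/1 column, so no artificial ordering is imposed on the categories, while `temperature` keeps its numeric value.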

    Text preprocessing

    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.preprocessing import MinMaxScaler, StandardScaler
    from sklearn.feature_selection import VarianceThreshold
    from sklearn.decomposition import PCA
    from scipy.stats import pearsonr
    import jieba
    import pandas as pd
    
    
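    def dict_vec():
        # Instantiate the vectorizer. sparse=False returns a dense NumPy array
        # instead of a SciPy sparse matrix.
        # vectorizer = DictVectorizer()
        vectorizer = DictVectorizer(sparse=False)
        # Call fit_transform on the list of dicts
        data = vectorizer.fit_transform([
            {'city': '北京', 'temperature': 100},
            {'city': '上海', 'temperature': 60},
            {'city': '深圳', 'temperature': 30},
        ])
    
        # Print the name of each column
        # (get_feature_names() was removed in scikit-learn 1.2)
        print(vectorizer.get_feature_names_out())
        print(data)
    
        return None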
    
    
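    def countvec():
        # Instantiate the CountVectorizer
        count = CountVectorizer()
        # Extract features from two documents. The Chinese text is already
        # split into words with spaces; the default tokenizer drops
        # single-character tokens such as 我.
        data = count.fit_transform(["人生 人生 苦短,我 喜欢 Python", "生活 太 长久,我 不 喜欢 Python"])
        # Vocabulary (get_feature_names() was removed in scikit-learn 1.2)
        print(count.get_feature_names_out())
        # Dense count matrix: one row per document, one column per word
        print(data.toarray())
        # print(data)
    
        return None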
    
    if __name__ == '__main__':
        countvec()
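The counting step in `countvec()` can be illustrated without scikit-learn. The sketch below lowercases each document, keeps only tokens of two or more word characters (this matches `CountVectorizer`'s default `token_pattern`, `(?u)\b\w\w+\b`), and counts them per document. The helper `count_tokens` and the sample sentences are assumptions for illustration. The Chinese text is assumed to be pre-segmented with spaces, for example by `jieba`.

```python
import re
from collections import Counter

# Tokens are runs of two or more word characters, so single characters are dropped.
TOKEN = re.compile(r"\b\w\w+\b")


def count_tokens(docs):
    """Return (vocabulary, count matrix) for a list of documents."""
    tokenized = [TOKEN.findall(doc.lower()) for doc in docs]
    # Vocabulary: every distinct token, in sorted order.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    # One row per document, one column per vocabulary term.
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix


docs = ["人生 人生 苦短,我 喜欢 Python", "生活 太 长久,我 不 喜欢 Python"]
vocab, matrix = count_tokens(docs)
print(vocab)
for row in matrix:
    print(row)
```

Single-character words such as 我, 太 and 不 never reach the vocabulary, which is why unsegmented or badly segmented Chinese text produces a poor feature matrix.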
  • Original article: https://www.cnblogs.com/cerofang/p/10161069.html