  • Clustering text documents using k-means

    The source code is available at http://scikit-learn.org/stable/auto_examples/text/document_clustering.html

    Loading 20 newsgroups dataset for categories:
    ['alt.atheism', 'talk.religion.misc', 'comp.graphics', 'sci.space']
    3387 documents
    4 categories
    
    Extracting features from the training dataset using a sparse vectorizer
    done in 2.980000s
    n_samples: 3387, n_features: 10000
    
    Clustering sparse data with MiniBatchKMeans(batch_size=1000, compute_labels=True, init='k-means++',
            init_size=1000, max_iter=100, max_no_improvement=10, n_clusters=4,
            n_init=1, random_state=None, reassignment_ratio=0.01, tol=0.0,
            verbose=False)
    done in 0.514s
    
    Homogeneity: 0.506
    Completeness: 0.576
    V-measure: 0.539
    Adjusted Rand-Index: 0.477
    Silhouette Coefficient: 0.006
    
    Top terms per cluster:
    Cluster 0: hst nasa mission jpl ___ gov baalke access orbit __
    Cluster 1: space henry nasa access toronto com alaska digex pat sky
    Cluster 2: god com people sandvik keith don jesus article say think
    Cluster 3: graphics com university thanks posting image host nntp computer ac

    1. Feature extraction

    TfidfVectorizer

    HashingVectorizer
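A minimal sketch of how the two vectorizers differ, assuming a small made-up corpus and an illustrative `n_features=2**10` (the log above reports 10000 features). `TfidfVectorizer` learns an in-memory vocabulary, while `HashingVectorizer` maps tokens to a fixed number of columns without storing one, so IDF weighting has to be added separately if it is wanted.

```python
from sklearn.feature_extraction.text import (
    HashingVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus, for illustration only.
docs = [
    "nasa launched a new space mission",
    "the orbit of the space telescope",
    "graphics cards render images quickly",
    "image processing and computer graphics",
]

# TfidfVectorizer: builds a vocabulary, then applies TF-IDF weighting.
tfidf = TfidfVectorizer(stop_words="english")
X_tfidf = tfidf.fit_transform(docs)

# HashingVectorizer: stateless hashing into a fixed-width matrix.
# alternate_sign=False keeps the counts non-negative so that
# TfidfTransformer can reweight them afterwards.
hasher = HashingVectorizer(
    n_features=2**10, stop_words="english", alternate_sign=False, norm=None
)
hashed_tfidf = make_pipeline(hasher, TfidfTransformer())
X_hashed = hashed_tfidf.fit_transform(docs)

print("TfidfVectorizer:", X_tfidf.shape)     # width = vocabulary size
print("HashingVectorizer:", X_hashed.shape)  # width = n_features
```

The hashed matrix cannot be mapped back to terms, so a "Top terms per cluster" listing like the one in the log is only possible with the vocabulary-based vectorizer.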

    2. Clustering

    Two algorithms are demoed: ordinary k-means and its more scalable cousin, minibatch k-means.
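The two estimators share the same interface, so they can be swapped freely. The sketch below assumes a made-up eight-document corpus, `n_clusters=2`, and a `batch_size` scaled down from the 1000 shown in the log to fit such a tiny dataset; it is not the configuration of the original example.

```python
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, for illustration only.
docs = [
    "nasa launched a new space mission",
    "the shuttle reached orbit around the moon",
    "the space telescope mission is in orbit",
    "astronauts returned from the space station",
    "computer graphics and image rendering",
    "the graphics card renders each image",
    "image formats used in computer graphics",
    "rendering a 3d image with graphics software",
]
X = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Ordinary k-means: every iteration uses the full dataset.
km = KMeans(
    n_clusters=2, init="k-means++", max_iter=100, n_init=1, random_state=0
)
km.fit(X)

# Minibatch k-means: each update uses a random batch of samples,
# trading a little cluster quality for much lower cost on large data.
mbk = MiniBatchKMeans(
    n_clusters=2, init="k-means++", n_init=1, batch_size=4, random_state=0
)
mbk.fit(X)

print("KMeans labels:         ", km.labels_)
print("MiniBatchKMeans labels:", mbk.labels_)
```

Cluster indices are arbitrary, so the two label vectors may describe the same grouping with the numbers swapped. Comparing them with a permutation-invariant score such as `adjusted_rand_score` is safer than comparing the arrays directly.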

    (To be continued)

  • Original article: https://www.cnblogs.com/gui0901/p/4456935.html