  • Clustering text documents using k-means

    Source code: http://scikit-learn.org/stable/auto_examples/text/document_clustering.html

    Loading 20 newsgroups dataset for categories:
    ['alt.atheism', 'talk.religion.misc', 'comp.graphics', 'sci.space']
    3387 documents
    4 categories
    
    Extracting features from the training dataset using a sparse vectorizer
    done in 2.980000s
    n_samples: 3387, n_features: 10000
    
    Clustering sparse data with MiniBatchKMeans(batch_size=1000, compute_labels=True, init='k-means++',
            init_size=1000, max_iter=100, max_no_improvement=10, n_clusters=4,
            n_init=1, random_state=None, reassignment_ratio=0.01, tol=0.0,
            verbose=False)
    done in 0.514s
    
    Homogeneity: 0.506
    Completeness: 0.576
    V-measure: 0.539
    Adjusted Rand-Index: 0.477
    Silhouette Coefficient: 0.006
    
    Top terms per cluster:
    Cluster 0: hst nasa mission jpl ___ gov baalke access orbit __
    Cluster 1: space henry nasa access toronto com alaska digex pat sky
    Cluster 2: god com people sandvik keith don jesus article say think
    Cluster 3: graphics com university thanks posting image host nntp computer ac

    1.

    TfidfVectorizer

    HashingVectorizer
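    As a rough illustration of how these two vectorizers differ, here is a minimal sketch on a made-up corpus. `TfidfVectorizer` learns a vocabulary from the data, while `HashingVectorizer` hashes tokens into a fixed number of columns and keeps no vocabulary. Chaining a `TfidfTransformer` after the hasher is one way to get IDF weighting on hashed features. The corpus and `n_features=2**10` are assumptions; `max_features=10000` mirrors the `n_features: 10000` line in the output above.

```python
from sklearn.feature_extraction.text import (
    HashingVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for the 20 newsgroups posts.
docs = [
    "nasa launched a new space mission into orbit",
    "the shuttle reached orbit after the nasa launch",
    "graphics cards render each image on the computer",
    "the image viewer needs a faster graphics driver",
]

# Option 1: learn a vocabulary, then weight terms by TF-IDF.
tfidf = TfidfVectorizer(max_features=10000, stop_words="english")
X_tfidf = tfidf.fit_transform(docs)

# Option 2: hash tokens into a fixed-width matrix (stateless, no vocabulary),
# then apply IDF weighting on top of the hashed counts.
hasher = HashingVectorizer(
    n_features=2**10,
    stop_words="english",
    alternate_sign=False,
    norm=None,
)
hashing = make_pipeline(hasher, TfidfTransformer())
X_hashed = hashing.fit_transform(docs)

print("TF-IDF matrix shape:", X_tfidf.shape)
print("Hashed matrix shape:", X_hashed.shape)
```

    The hashed matrix always has `n_features` columns regardless of the corpus, which keeps memory bounded on large collections, but the column indices can no longer be mapped back to terms.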

    2.

    Two algorithms are demonstrated: ordinary k-means (KMeans) and its more scalable cousin, mini-batch k-means (MiniBatchKMeans).
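    A minimal sketch of fitting both estimators on the same TF-IDF matrix follows. The corpus, `n_clusters=2`, and `random_state=0` are assumptions for illustration; `batch_size=1000` and `init='k-means++'` follow the MiniBatchKMeans parameters printed in the output above.

```python
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the 20 newsgroups posts.
docs = [
    "nasa launched a new space mission into orbit",
    "the shuttle reached orbit after the nasa launch",
    "graphics cards render each image on the computer",
    "the image viewer needs a faster graphics driver",
]
X = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Ordinary k-means: every iteration reassigns all samples.
km = KMeans(n_clusters=2, init="k-means++", max_iter=100, n_init=10,
            random_state=0)

# Mini-batch k-means: each iteration updates centroids from a random batch,
# trading a little accuracy for speed on large datasets.
mbk = MiniBatchKMeans(n_clusters=2, init="k-means++", n_init=10,
                      batch_size=1000, random_state=0)

km_labels = km.fit_predict(X)
mbk_labels = mbk.fit_predict(X)

print("KMeans labels:         ", km_labels)
print("MiniBatchKMeans labels:", mbk_labels)
```

    Both estimators expose the same `fit` / `predict` / `cluster_centers_` interface, so switching between them is a one-line change.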

    (To be continued)

  • Original post: https://www.cnblogs.com/gui0901/p/4456935.html