
Clustering text documents using k-means

The source code is available at http://scikit-learn.org/stable/auto_examples/text/document_clustering.html

Loading 20 newsgroups dataset for categories:
['alt.atheism', 'talk.religion.misc', 'comp.graphics', 'sci.space']
3387 documents
4 categories

Extracting features from the training dataset using a sparse vectorizer
done in 2.980000s
n_samples: 3387, n_features: 10000

Clustering sparse data with MiniBatchKMeans(batch_size=1000, compute_labels=True, init='k-means++',
        init_size=1000, max_iter=100, max_no_improvement=10, n_clusters=4,
        n_init=1, random_state=None, reassignment_ratio=0.01, tol=0.0,
        verbose=False)
done in 0.514s

Homogeneity: 0.506
Completeness: 0.576
V-measure: 0.539
Adjusted Rand-Index: 0.477
Silhouette Coefficient: 0.006

Top terms per cluster:
Cluster 0: hst nasa mission jpl ___ gov baalke access orbit __
Cluster 1: space henry nasa access toronto com alaska digex pat sky
Cluster 2: god com people sandvik keith don jesus article say think
Cluster 3: graphics com university thanks posting image host nntp computer ac
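
The five scores in the log above correspond to functions in sklearn.metrics. The following is a minimal sketch of how such scores can be computed, not the example's own code: the label vectors and the 2-D points are made-up toy values chosen so that the result is easy to check by hand.

```python
from sklearn import metrics

# Hypothetical toy data: four samples in two true classes.
labels_true = [0, 0, 1, 1]
# Same grouping as labels_true, but with the cluster ids swapped.
labels_pred = [1, 1, 0, 0]

# These four scores compare predicted clusters with the true classes
# and do not depend on how the cluster ids are numbered.
homogeneity = metrics.homogeneity_score(labels_true, labels_pred)
completeness = metrics.completeness_score(labels_true, labels_pred)
v_measure = metrics.v_measure_score(labels_true, labels_pred)
ari = metrics.adjusted_rand_score(labels_true, labels_pred)

# The silhouette coefficient needs the feature vectors, not the true labels.
# Two tight, well-separated groups of points (toy values).
X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
silhouette = metrics.silhouette_score(X, labels_pred)

print("Homogeneity: %0.3f" % homogeneity)
print("Completeness: %0.3f" % completeness)
print("V-measure: %0.3f" % v_measure)
print("Adjusted Rand-Index: %0.3f" % ari)
print("Silhouette Coefficient: %0.3f" % silhouette)
```

Because the predicted grouping matches the true one up to renaming, the four label-based scores are all 1.0 here. The silhouette coefficient is computed from distances between samples only, so it can be low even when the label-based scores are reasonable.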

1. Feature extraction

TfidfVectorizer

HashingVectorizer
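
As a rough illustration of how the two vectorizers differ, here is a small sketch on a made-up four-document corpus. The documents and parameter values are assumptions chosen for the illustration, not the settings used to produce the log above.

```python
from sklearn.feature_extraction.text import HashingVectorizer, TfidfVectorizer

# Hypothetical toy corpus, for illustration only.
docs = [
    "nasa launched a new space mission",
    "the orbit of the space shuttle",
    "computer graphics and image rendering",
    "rendering an image with graphics hardware",
]

# TfidfVectorizer learns a vocabulary from the corpus during fit,
# then weights each term count by inverse document frequency.
tfidf = TfidfVectorizer(stop_words="english")
X_tfidf = tfidf.fit_transform(docs)

# HashingVectorizer is stateless: it hashes each token to a column index,
# so there is no vocabulary to fit and the output width is fixed up front.
hasher = HashingVectorizer(n_features=2 ** 10, stop_words="english",
                           alternate_sign=False)
X_hash = hasher.transform(docs)

print("tf-idf matrix:", X_tfidf.shape)
print("hashed matrix:", X_hash.shape)
print("tf-idf vocabulary size:", len(tfidf.vocabulary_))
```

The trade-off is that the hashing approach uses constant memory and needs no pass over the data to build a vocabulary, but column indices cannot be mapped back to terms, and different tokens may collide in the same column.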

2. Clustering

Two algorithms are demonstrated: ordinary k-means and its more scalable cousin, mini-batch k-means.

(To be continued)


Original article: https://www.cnblogs.com/gui0901/p/4456935.html