  • Weaknesses/drawbacks of k-means clustering

    Like other algorithms, K-means clustering has several weaknesses:

    1 When the data set is small, the initial grouping largely determines the resulting clusters.
    2 The number of clusters, K, must be chosen beforehand.
    3 We never know the true clusters. With a small data set, feeding the same data in a different order may produce different clusters.
    4 It is sensitive to initial conditions. Different initial conditions may produce different clusters, and the algorithm may get trapped in a local optimum.
    5 We never know which attribute contributes more to the grouping process, since every attribute is assumed to have the same weight.
    6 The arithmetic mean is not robust to outliers. Data points very far from a centroid may pull it away from its true position.
    7 The resulting clusters are circular in shape because the grouping is based on distance.
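
    Points 1 and 4 can be illustrated with a small sketch. The code below is a minimal pure-Python version of the usual assign-and-update iteration, not any particular library's implementation; the function names (`sq_dist`, `kmeans`) and the four-point data set are assumptions made up for this example. Two different sets of starting centroids on the same data settle into two different stable results, one with a much larger sum of squared errors.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_centroids, max_iter=100):
    """Plain assign/update iterations from explicitly given starting centroids."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its old centroid).
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else old
            for members, old in zip(clusters, centroids)
        ]
        if updated == centroids:  # nothing moved: converged
            break
        centroids = updated
    # Sum of squared distances from each point to its nearest centroid.
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


# Hypothetical data: the four corners of a 4 x 1 rectangle, K = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start A splits the rectangle left/right, start B splits it bottom/top.
centroids_a, sse_a = kmeans(points, [(0, 0.5), (4, 0.5)])
centroids_b, sse_b = kmeans(points, [(2, 0), (2, 1)])

print(sse_a, sse_b)
```

    Both runs stop after the first iteration because no centroid moves, yet the second start is stuck in a local optimum with a far worse error. A common remedy is to run the algorithm from several starting points and keep the result with the lowest error.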

    One way to mitigate these weaknesses is to use K-means clustering only when plenty of data is available. To reduce the effect of outliers, we can use the median instead of the mean.
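
    The effect of swapping the mean for the median can be seen on a single cluster. This is only a sketch: the helper names (`mean_center`, `median_center`) and the sample points are invented for illustration, and the median is taken per coordinate, which is one common choice rather than something the original text specifies.

```python
from statistics import mean, median


def mean_center(points):
    """Centroid as the coordinate-wise arithmetic mean."""
    return tuple(mean(axis) for axis in zip(*points))


def median_center(points):
    """Centre as the coordinate-wise median (an assumed variant)."""
    return tuple(median(axis) for axis in zip(*points))


# Hypothetical cluster: four nearby points plus one far-away outlier.
cluster = [(1, 1), (2, 1), (1, 2), (2, 2), (50, 50)]

print(mean_center(cluster))    # dragged toward the outlier
print(median_center(cluster))  # stays among the four nearby points
```

    The mean-based centre lands far from every real point, while the median-based centre stays inside the group of four.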

    Some people have claimed that K-means clustering cannot be used for data other than quantitative data. This is not true! The original tutorial shows how you can use multivariate data up to n dimensions (even mixed data types). The key to using other types of dissimilarity is the distance matrix.
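
    As a rough illustration of what such a distance matrix might look like for mixed data, the sketch below uses a Gower-style dissimilarity: numeric attributes contribute a range-normalised absolute difference and categorical attributes contribute 0 on a match and 1 otherwise. This particular measure is an assumption chosen for the example, not necessarily the one the tutorial uses, and the function names and records are hypothetical.

```python
def column_ranges(records):
    """Range (max - min) of each numeric column; None marks a categorical one."""
    ranges = []
    for column in zip(*records):
        if all(isinstance(v, (int, float)) for v in column):
            ranges.append(max(column) - min(column))
        else:
            ranges.append(None)
    return ranges


def mixed_dissimilarity(a, b, ranges):
    """Average per-attribute dissimilarity, each term lying in [0, 1]."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Categorical attribute: simple mismatch.
            total += 0.0 if x == y else 1.0
        elif r > 0:
            # Numeric attribute: difference scaled by the column range.
            total += abs(x - y) / r
    return total / len(ranges)


def distance_matrix(records):
    """Pairwise dissimilarities between all records."""
    ranges = column_ranges(records)
    return [[mixed_dissimilarity(a, b, ranges) for b in records]
            for a in records]


# Hypothetical records: (age, favourite colour).
records = [(20, "red"), (30, "red"), (40, "blue")]
D = distance_matrix(records)

for row in D:
    print(row)
```

    Once every pair of records has a dissimilarity, the grouping step only needs to look values up in the matrix, which is why the attribute types stop mattering.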

    http://people.revoledu.com/kardi/tutorial/kMean/Weakness.htm

  • Original article: https://www.cnblogs.com/emanlee/p/2381617.html