zoukankan      html  css  js  c++  java
  • kmeans k均值聚类的弱点/缺点

    Similar to other algorithm, K-mean clustering has many weaknesses:

    1 When the numbers of data are not so many, initial grouping will determine the cluster significantly.  当数据数量不是足够大时,初始化分组很大程度上决定了聚类,影响聚类结果。
    2 The number of cluster, K, must be determined before hand.  要事先指定K的值。
    3 We never know the real cluster, using the same data, if it is inputted in a different order may produce different cluster if the number of data is a few. 数据数量不多时,输入的数据的顺序不同会导致结果不同。
    4 Sensitive to initial condition. Different initial condition may produce different result of cluster. The algorithm may be trapped in the local optimum. 对初始化条件敏感。
    5 We never know which attribute contributes more to the grouping process since we assume that each attribute has the same weight. 无法确定哪个属性对聚类的贡献更大。
    6 weakness of arithmetic mean is not robust to outliers. Very far data from the centroid may pull the centroid away from the real one. 使用算术平均值对outlier不鲁棒。
    7 The result is circular cluster shape because based on distance.  因为基于距离,故结果是圆形的聚类形状。

    One way to overcome those weaknesses is to use K-mean clustering only if there are available many data. To overcome outliers problem, we can use median instead of mean.  克服缺点的方法: 使用尽量多的数据;使用中位数代替均值来克服outlier的问题。

    Some people pointed out that K means clustering cannot be used for other type of data rather than quantitative data. This is not true! See how you can use multivariate data up to n dimensions (even mixed data type) here. The key to use other type of dissimilarity is in the distance matrix.

    http://people.revoledu.com/kardi/tutorial/kMean/Weakness.htm

  • 相关阅读:
    Linux Core Dump
    ODP.NET Managed正式推出
    获取EditText的光标位置
    (Java实现) 洛谷 P1603 斯诺登的密码
    (Java实现) 洛谷 P1603 斯诺登的密码
    (Java实现) 洛谷 P1036 选数
    (Java实现) 洛谷 P1036 选数
    (Java实现) 洛谷 P1012 拼数
    (Java实现) 洛谷 P1012 拼数
    (Java实现) 洛谷 P1028 数的计算
  • 原文地址:https://www.cnblogs.com/emanlee/p/2381617.html
Copyright © 2011-2022 走看看