zoukankan      html  css  js  c++  java
  • K-means algorithm----PRML读书笔记

        The K-means algorithm is based on the use of squared Euclidean distance as the measure of  dissimilarity between a data point and a prototype vector. Our goal is to partition the data set into some number K of clusters, where we shall suppose for the moment that the value of K is given. We can then define an objective function, sometimes called a distortion measure, given by J=ΣnΣkrnk||xnk||2,where n=1,...N, k=1,...,K, N is observations of a random D-dimensional Euclidean variable x, K is number of clusters. J represents the sum of the squares of the distances of each data point to its assigned vector μk. We can think of the μk as representing the centres of the clusters. Our goal is to find values for the {rnk} and the {μk} so as to minimize J. First we choose some initial values for the μk. Then in the first phase we minimize J with respect to the rnk, keeping the μk fixed. In the second phase we minimize J with respect to μk, keeping rnk fixed. This two-stage optimization is then repeated until convergence. We simply assign the nth data point to the closest cluster centre, this can be expressed as rnk=1,if k=argminj||xnj||2, otherwise rnk=0. The objective function J is a quadratic function of μk, and it can be minimized by setting its derivative with respect to μk to zero giving 2Σnrnk(xnk)=0. μk=(Σnrnkxn)/(Σnrnk), this result has a simple  interpretation, namely set μk equal to the mean of all of the data points xn assigned to cluster k. For this reason, the procedure is known as the K-means algorithm.

  • 相关阅读:
    composer npm bower 版本依赖符号说明
    FastAdmin 速极后台框架从 v1.0 到 v1.2 的数据库升级
    FastAdmin 也可以出书了
    FastAdmin 开发时用到的 Git 命令 (2020-09-26)
    FastAdmin用什么弹窗组件
    笔记:Linux 文件权限
    笔记:使用源代码在 Centos 7 安装 Git 2
    php gd 生成表格 图片
    easyui datagrid 清空
    mysql 去重
  • 原文地址:https://www.cnblogs.com/donggongdechen/p/9789561.html
Copyright © 2011-2022 走看看