  • The Glowing Python: K-means clustering with SciPy


    K-means clustering is a method for finding clusters and cluster centers in a set of unlabeled data. Intuitively, we might think of a cluster as a group of data points whose inter-point distances are small compared with the distances to points outside the cluster. Given an initial set of K centers, the K-means algorithm alternates between two steps:
    • for each center, we identify the subset of training points (its cluster) that are closer to it than to any other center;
    • the means of each feature for the data points in each cluster are computed, and this mean vector becomes the new center for that cluster.
    These two steps are iterated until the centers no longer move or the assignments no longer change. Then, a new point x can be assigned to the cluster of the closest prototype.
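The two alternating steps can be sketched in plain NumPy. This is only an illustrative sketch, not SciPy's implementation: the names `closest_center` and `kmeans_sketch`, the choice of initial centers (K distinct data points picked at random), and the iteration cap are all assumptions made for the example.

```python
import numpy as np

def closest_center(points, centers):
    """Step 1: index of the nearest center for every point."""
    # pairwise Euclidean distances, shape (n_points, n_centers)
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans_sketch(points, k, n_iter=100, seed=0):
    """Alternate the assignment and mean-update steps until the centers stop moving."""
    rng = np.random.default_rng(seed)
    # assumed initialization: k distinct data points chosen at random
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = closest_center(points, centers)
        # Step 2: each center becomes the per-feature mean of its cluster
        # (a center with an empty cluster is left where it is)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, closest_center(points, centers)

# two well-separated groups of 150 points each (synthetic demo data)
demo_rng = np.random.default_rng(1)
demo = np.vstack((demo_rng.random((150, 2)) + 5.0, demo_rng.random((150, 2))))
centers, labels = kmeans_sketch(demo, 2)
```

Assigning a new point x to a cluster is then just `closest_center(x[None, :], centers)`.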
    The SciPy library provides a good implementation of the K-means algorithm in its scipy.cluster.vq module. Let's see how to use it:
    from matplotlib.pyplot import plot,show
    from numpy import vstack,array
    from numpy.random import rand
    from scipy.cluster.vq import kmeans,vq
    
    # data generation
    data = vstack((rand(150,2) + array([.5,.5]),rand(150,2)))
    
    # computing K-Means with K = 2 (2 clusters)
    centroids,_ = kmeans(data,2)
    # assign each sample to a cluster
    idx,_ = vq(data,centroids)
    
    # some plotting using numpy's logical indexing
    plot(data[idx==0,0],data[idx==0,1],'ob',
         data[idx==1,0],data[idx==1,1],'or')
    plot(centroids[:,0],centroids[:,1],'sg',markersize=8)
    show()
    In the resulting plot, the data is split into 2 clusters: the blue points have been assigned to the first cluster and the red ones to the second. The green squares are the centers of the clusters.
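As noted earlier, a new point can be assigned to the cluster of its closest center. A minimal sketch with `vq` follows; the `centroids` and `new_points` values here are made-up placeholders standing in for the output of `kmeans(data,2)` and for fresh observations.

```python
import numpy as np
from scipy.cluster.vq import vq

# hypothetical centers, standing in for the result of kmeans(data, 2)
centroids = np.array([[0.5, 0.5], [1.0, 1.0]])
# hypothetical new observations, one per row
new_points = np.array([[0.1, 0.2], [1.3, 1.1]])

# vq returns, for each row, the index of the closest centroid
# and the Euclidean distance to that centroid
new_idx, new_dist = vq(new_points, centroids)
```

In this example the first point falls in cluster 0 and the second in cluster 1.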
    Let's try to split the data into 3 clusters:
    # now with K = 3 (3 clusters)
    centroids,_ = kmeans(data,3)
    idx,_ = vq(data,centroids)
    
    plot(data[idx==0,0],data[idx==0,1],'ob',
         data[idx==1,0],data[idx==1,1],'or',
         data[idx==2,0],data[idx==2,1],'og') # third cluster points
    plot(centroids[:,0],centroids[:,1],'sm',markersize=8)
    show()
    This time the data is split into three groups: the third cluster is drawn in green and the centers are shown as magenta squares.
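Besides the plots, the two choices of K can also be compared numerically. The second value returned by `kmeans`, discarded as `_` in the code above, is the distortion: the mean distance from each point to its nearest centroid. The sketch below regenerates data with the same layout using NumPy's `default_rng`, which is an assumption made to keep the example self-contained.

```python
import numpy as np
from scipy.cluster.vq import kmeans

rng = np.random.default_rng(0)
# same layout as the data above: two overlapping uniform squares
data = np.vstack((rng.random((150, 2)) + np.array([.5, .5]),
                  rng.random((150, 2))))

distortions = {}
for k in (2, 3):
    # keep the distortion this time instead of discarding it
    _, distortions[k] = kmeans(data, k)
    print(k, distortions[k])
```

A lower distortion for K = 3 is expected, since adding centers can only bring points closer to their nearest one; on its own it does not show that 3 is the better number of clusters.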

  • Original article: https://www.cnblogs.com/lexus/p/2808657.html