  • unsupervised learning -- K MEANS

    Although its name sounds a lot like the KNN algorithm, the two are different: KNN is a supervised classification algorithm, while K MEANS is an unsupervised learning algorithm.

    As a clustering method, K MEANS partitions an unlabeled dataset into k clusters, where the number of clusters k is chosen by the user.

    The procedure of the K MEANS algorithm is:

    1. initialize the k centroids with random points from the dataset; each centroid represents one cluster
    2. assign every point the label of its nearest centroid (minimum distance)
    3. recalculate each centroid as the mean of the points labeled in step 2
    4. repeat steps 2 and 3 until the centroids stop changing or the maximum number of iterations is reached
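
    The four steps above can be traced on a tiny dataset. The sketch below is a minimal illustration, not a full implementation: the four points, the choice k = 2, and the fixed (non-random) starting centroids are assumptions made so the run is reproducible.

```python
import numpy as np

# Toy data (assumed for illustration): two obvious groups of two points each.
X = np.array([[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0]])
k = 2

# Step 1: start from k points of the dataset. The first k rows are used here
# instead of a random draw so that the result is reproducible.
centroids = X[:k].copy()

for step in range(10):
    # Step 2: distance of every point to every centroid, shape (n_points, k),
    # then label each point with the index of its nearest centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)

    # Step 3: each new centroid is the mean of the points assigned to it.
    new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

    # Step 4: stop once the centroids no longer move.
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

print(labels)
print(centroids)
```

    Even from this deliberately poor start (both initial centroids sit in the same group), the loop settles after a few passes on the labels [0 0 1 1] with centroids (1, 1.5) and (8.5, 8).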

    [Figure: the procedure of the naive K MEANS algorithm]

    K MEANS is available directly from sklearn:

    from sklearn.cluster import KMeans
    Kmean = KMeans(n_clusters=2)
    Kmean.fit(X)
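
    The snippet above assumes that X already exists. As a runnable sketch, the toy array below (an assumption for illustration) stands in for X; n_init and random_state are set only to make the run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (assumed for illustration): two well-separated groups.
X = np.array([[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0]])

Kmean = KMeans(n_clusters=2, n_init=10, random_state=0)
Kmean.fit(X)

print(Kmean.labels_)           # cluster index assigned to each row of X
print(Kmean.cluster_centers_)  # one centroid per cluster
print(Kmean.inertia_)          # sum of squared distances to the nearest centroid

# Assign previously unseen points to the nearest learned centroid.
print(Kmean.predict(np.array([[0.0, 0.0], [10.0, 10.0]])))
```

    The cluster indices themselves are arbitrary: which group is called 0 and which is called 1 can differ between runs or seeds, so only the grouping is meaningful.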

    And here is a more explicit implementation:

    import numpy as np
    from numpy.linalg import norm
    
    
    class Kmeans:
        '''Implementing Kmeans algorithm.'''
    
        def __init__(self, n_clusters, max_iter=100, random_state=123):
            self.n_clusters = n_clusters
            self.max_iter = max_iter
            self.random_state = random_state
    
        def initializ_centroids(self, X):
            # Seed a local generator so the initial centroids are reproducible.
            rng = np.random.RandomState(self.random_state)
            random_idx = rng.permutation(X.shape[0])
            centroids = X[random_idx[:self.n_clusters]]
            return centroids
    
        def compute_centroids(self, X, labels):
            centroids = np.zeros((self.n_clusters, X.shape[1]))
            for k in range(self.n_clusters):
                centroids[k, :] = np.mean(X[labels == k, :], axis=0)
            return centroids
    
        def compute_distance(self, X, centroids):
            distance = np.zeros((X.shape[0], self.n_clusters))
            for k in range(self.n_clusters):
                row_norm = norm(X - centroids[k, :], axis=1)
                distance[:, k] = np.square(row_norm)
            return distance
    
        def find_closest_cluster(self, distance):
            return np.argmin(distance, axis=1)
    
        def compute_sse(self, X, labels, centroids):
            distance = np.zeros(X.shape[0])
            for k in range(self.n_clusters):
                distance[labels == k] = norm(X[labels == k] - centroids[k], axis=1)
            return np.sum(np.square(distance))
        
        def fit(self, X):
            self.centroids = self.initializ_centroids(X)
            for i in range(self.max_iter):
                old_centroids = self.centroids
                distance = self.compute_distance(X, old_centroids)
                self.labels = self.find_closest_cluster(distance)
                self.centroids = self.compute_centroids(X, self.labels)
                if np.all(old_centroids == self.centroids):
                    break
            self.error = self.compute_sse(X, self.labels, self.centroids)
        
        def predict(self, X):
            # Use the centroids learned by fit(); old_centroids only exists inside fit().
            distance = self.compute_distance(X, self.centroids)
            return self.find_closest_cluster(distance)
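
    One edge case the class above does not guard against: if a cluster ends up with no points, np.mean over the empty selection yields NaN in compute_centroids. The sketch below shows one possible guard, which is to keep the previous centroid for an empty cluster. The helper name update_centroids and the sample arrays are hypothetical and are not part of the original code.

```python
import numpy as np


def update_centroids(X, labels, old_centroids):
    """Hypothetical helper: per-cluster mean, keeping the old centroid
    whenever a cluster received no points."""
    centroids = old_centroids.copy()
    for k in range(old_centroids.shape[0]):
        members = X[labels == k]
        if len(members) > 0:
            centroids[k] = members.mean(axis=0)
    return centroids


# Sample data (assumed): every point was assigned to cluster 0,
# so cluster 1 is empty.
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
labels = np.array([0, 0, 0])
old = np.array([[1.0, 1.0], [50.0, 50.0]])

new = update_centroids(X, labels, old)
print(new)  # row 0 is the mean of X, row 1 is left unchanged
```

    Keeping the old centroid is only one option; re-seeding the empty cluster from a random data point is another common choice.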

    Reference: https://towardsdatascience.com/k-means-clustering-algorithm-applications-evaluation-methods-and-drawbacks-aa03e644b48a

  • Original article: https://www.cnblogs.com/yuelien/p/13883113.html