zoukankan      html  css  js  c++  java
  • Kmeans clustering and vector quantization

    K-means clustering and vector quantization (scipy.cluster.vq) — SciPy v0.11 Reference Guide (DRAFT)

    K-means clustering and vector quantization (scipy.cluster.vq)

    Provides routines for k-means clustering, generating code books from k-means models, and quantizing vectors by comparing them with centroids in a code book.

    whiten(obs)Normalize a group of observations on a per feature basis.
    vq(obs, code_book)Assign codes from a code book to observations.
    kmeans(obs, k_or_guess[, iter, thresh])Performs k-means on a set of observation vectors forming k clusters.
    kmeans2(data, k[, iter, thresh, minit, missing])Classify a set of observations into k clusters using the k-means algorithm.

    Background information

    The k-means algorithm takes as input the number of clusters to generate, k, and a set of observation vectors to cluster. It returns a set of centroids, one for each of the k clusters. An observation vector is classified with the cluster number or centroid index of the centroid closest to it.

    A vector v belongs to cluster i if it is closer to centroid i than any other centroids. If v belongs to i, we say centroid i is the dominating centroid of v. The k-means algorithm tries to minimize distortion, which is defined as the sum of the squared distances between each observation vector and its dominating centroid. Each step of the k-means algorithm refines the choices of centroids to reduce distortion. The change in distortion is used as a stopping criterion: when the change is lower than a threshold, the k-means algorithm is not making sufficient progress and terminates. One can also define a maximum number of iterations.

    Since vector quantization is a natural application for k-means, information theory terminology is often used. The centroid index or cluster index is also referred to as a “code” and the table mapping codes to centroids and vice versa is often referred as a “code book”. The result of k-means, a set of centroids, can be used to quantize vectors. Quantization aims to find an encoding of vectors that reduces the expected distortion.

    All routines expect obs to be a M by N array where the rows are the observation vectors. The codebook is a k by N array where the i’th row is the centroid of code word i. The observation vectors and centroids have the same feature dimension.

    As an example, suppose we wish to compress a 24-bit color image (each pixel is represented by one byte for red, one for blue, and one for green) before sending it over the web. By using a smaller 8-bit encoding, we can reduce the amount of data by two thirds. Ideally, the colors for each of the 256 possible 8-bit encoding values should be chosen to minimize distortion of the color. Running k-means with k=256 generates a code book of 256 codes, which fills up all possible 8-bit sequences. Instead of sending a 3-byte value for each pixel, the 8-bit centroid index (or code word) of the dominating centroid is transmitted. The code book is also sent over the wire so each 8-bit code can be translated back to a 24-bit pixel value representation. If the image of interest was of an ocean, we would expect many 24-bit blues to be represented by 8-bit codes. If it was an image of a human face, more flesh tone colors would be represented in the code book.

  • 相关阅读:
    bug-- java.lang.RuntimeException: Type “Klass*"
    ThreadPoolExecutor源码分析二
    ThreadPoolExecutor源码分析一
    java动态代理框架
    liunx 中一个命令可以检测 ps -C java --no-heading| wc -l 一般用于shell脚步编写用
    log4j.properties 使用说明
    图文详解MyEclipse中新建Maven webapp项目的步骤(很详细)
    MySQL高可用性之Keepalived+Mysql(双主热备)
    使用cglib动态创建类,添加方法
    2017年5月5日 星红桉liunx动手实践mysql 主主双机热备
  • 原文地址:https://www.cnblogs.com/lexus/p/2808325.html
Copyright © 2011-2022 走看看