  • Clustering algorithms

    k-means

    The goal is to partition n observations into k clusters such that the observations belonging to each cluster have the smallest possible mean squared deviation.

    Basic steps of the k-means algorithm

    (1) Arbitrarily choose k of the n data objects as the initial cluster centers;
    (2) For each object, compute its distance to every cluster mean (center object) and reassign it to the cluster with the nearest center;
    (3) Recompute the mean (center object) of every cluster that changed;
    (4) Evaluate the criterion function; if a stopping condition is satisfied, e.g. the function has converged, the algorithm terminates, otherwise return to step (2). A minimal code sketch of these steps follows the list.
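    A minimal Python/NumPy sketch of the loop above, assuming Euclidean distance and random initialization; the names (`kmeans`, `X`, `k`, `max_iter`, `tol`) are illustrative, not from the original post.

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    """Plain k-means on an (n, d) array X with k clusters."""
    rng = np.random.default_rng(seed)
    # (1) pick k objects at random as the initial cluster centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # (2) assign every object to its nearest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # (3) recompute the mean of each (non-empty) cluster
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # (4) stop when the centers have (nearly) stopped moving
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, labels
```

    For example, `kmeans(np.random.rand(200, 2), k=3)` returns the three centers and a label per point.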

    Time complexity

    An upper bound on the algorithm's time complexity is O(nkt), where n is the number of objects, k the number of clusters, and t the number of iterations.

    Characteristics of k-means

    1. It produces k clusters, each as compact as possible internally and as well separated from the other clusters as possible.
    2. The mean squared deviation is generally used as the criterion function; the objective is written out below.
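    Written out explicitly (standard textbook form, not quoted from the original post), the criterion is the within-cluster sum of squared distances to each cluster mean:

```latex
J = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2 ,
\qquad
\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x
```

    Here C_i is the i-th cluster and \mu_i its mean; each assignment and update step of the algorithm above can only decrease J, which is why the iteration converges to a local minimum.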

    SLC--Single-linkage clustering

    1. It is based on grouping clusters in bottom-up fashion (agglomerative clustering), at each step combining two clusters that contain the closest pair of elements not yet belonging to the same cluster as each other.
    2. In single-linkage clustering, the distance between two clusters is determined by a single element pair, namely those two elements (one in each cluster) that are closest to each other. The shortest of these links that remains at any step causes the fusion of the two clusters whose elements are involved. The method is also known as nearest neighbour clustering.
    3. In the beginning of the agglomerative clustering process, each element is in a cluster of its own. The clusters are then sequentially combined into larger clusters, until all elements end up being in the same cluster. At each step, the two clusters separated by the shortest distance are combined. The definition of 'shortest distance' is what differentiates between the different agglomerative clustering methods.
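    A naive Python/NumPy sketch of this bottom-up merging; it stops once a target number of clusters remains instead of merging all the way down to one cluster, and the names (`single_linkage`, `num_clusters`) are illustrative. In practice `scipy.cluster.hierarchy.linkage(X, method='single')` does the same job far more efficiently.

```python
import numpy as np

def single_linkage(X, num_clusters):
    """Naive agglomerative single-linkage clustering on an (n, d) array X."""
    # start with every element in a cluster of its own
    clusters = [[i] for i in range(len(X))]
    # pairwise Euclidean distances between elements
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    while len(clusters) > num_clusters:
        best = (None, None, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single-linkage distance: closest pair across the two clusters
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        # fuse the two clusters separated by the shortest single-link distance
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

    This brute-force version recomputes cluster-to-cluster distances each round and is O(n^3); it is only meant to make the merging rule concrete.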

    EM--Expectation–maximization algorithm

    1. In statistics, an expectation–maximization (EM) algorithm is an iterative method to find maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step. These parameter-estimates are then used to determine the distribution of the latent variables in the next E step.
    2. The EM algorithm is used to find (local) maximum likelihood parameters of a statistical model in cases where the equations cannot be solved directly. Typically these models involve latent variables in addition to unknown parameters and known data observations. That is, either missing values exist among the data, or the model can be formulated more simply by assuming the existence of further unobserved data points. For example, a mixture model can be described more simply by assuming that each observed data point has a corresponding unobserved data point, or latent variable, specifying the mixture component to which each data point belongs.
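    A compact Python/NumPy sketch of EM for a two-component 1-D Gaussian mixture, matching the mixture-model example above; the function name `em_gmm_1d` and the fixed iteration count are assumptions, and a real implementation would monitor the log-likelihood for convergence and guard against degenerate components.

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, n_components=2, n_iter=50, seed=0):
    """EM for a 1-D Gaussian mixture: alternate E and M steps."""
    rng = np.random.default_rng(seed)
    # initial guesses for mixture weights, component means, and std deviations
    weights = np.full(n_components, 1.0 / n_components)
    means = rng.choice(x, size=n_components, replace=False)
    stds = np.full(n_components, x.std())

    for _ in range(n_iter):
        # E step: responsibility of each component for each point, i.e. the
        # current estimate of the distribution of the latent component labels
        resp = weights * norm.pdf(x[:, None], means, stds)
        resp /= resp.sum(axis=1, keepdims=True)

        # M step: re-estimate parameters to maximize the expected log-likelihood
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        stds = np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk)
    return weights, means, stds
```

    For example, running it on `np.concatenate([np.random.normal(-2, 1, 500), np.random.normal(3, 1, 500)])` should recover means near -2 and 3.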

    Properties of clusterings

    The impossibility theorem
