  • PP: Deep clustering based on a mixture of autoencoders

    Problem: clustering

    A clustering network transforms the data into another space and then selects one of the clusters. Next, the autoencoder associated with this cluster is used to reconstruct the data-point.
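
    The two-step flow described above (pick a cluster, then reconstruct with that cluster's autoencoder) can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the layers are linear, the weights are random and untrained, and the sizes and names (`W_cluster`, `autoencoders`, `assign_and_reconstruct`) are made up for the example. The paper's actual architecture and training loss are not recorded in these notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; they are not taken from the paper.
N_CLUSTERS, INPUT_DIM, LATENT_DIM = 3, 8, 2


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# Clustering network: assumed here to be one linear layer followed by a softmax.
W_cluster = rng.normal(size=(INPUT_DIM, N_CLUSTERS))

# One tiny linear autoencoder (encoder matrix, decoder matrix) per cluster.
autoencoders = [
    (rng.normal(size=(INPUT_DIM, LATENT_DIM)), rng.normal(size=(LATENT_DIM, INPUT_DIM)))
    for _ in range(N_CLUSTERS)
]


def assign_and_reconstruct(x):
    """Return (cluster index, soft assignment, reconstruction) for one data point."""
    probs = softmax(x @ W_cluster)      # soft assignment over clusters
    k = int(np.argmax(probs))           # select one of the clusters
    encoder, decoder = autoencoders[k]  # autoencoder associated with that cluster
    x_hat = (x @ encoder) @ decoder     # reconstruct the data point
    return k, probs, x_hat


x = rng.normal(size=INPUT_DIM)
k, probs, x_hat = assign_and_reconstruct(x)
print("selected cluster:", k)
print("reconstruction error:", float(np.sum((x - x_hat) ** 2)))
```

    With untrained weights the reconstruction error is meaningless; the point is only the routing: each data point is sent to exactly one autoencoder.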

    Introduction:

    Traditional method: data -> extract a feature vector from each object -> aggregate groups of vectors in a feature space.

    Each cluster is represented by an autoencoder network. (Open question: how?)

    Common method: k-means. It is less useful on high-dimensional datasets, because inter-point distances become less informative in high-dimensional spaces.
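
    A small experiment can make the distance claim concrete. The sketch below is an illustration, not something from the paper: it draws uniform random points and measures how much farther the farthest point is than the nearest one from a random query. The helper name `relative_contrast` and the choice of uniform data are assumptions.

```python
import numpy as np


def relative_contrast(dim, n_points=200, seed=0):
    """(max distance - min distance) / min distance from a random query point."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(size=(n_points, dim))
    query = rng.uniform(size=dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()


for dim in (2, 10, 100, 1000):
    print(dim, round(float(relative_contrast(dim)), 3))
```

    As the dimension grows, the contrast shrinks toward zero, so "nearest centroid" carries less and less information, which is the weakness of plain k-means noted above.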

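    For finding patterns in a sequence, would the time dimension play the role of the high-dimensional case, with each pattern being one cluster, while some subsequences cannot be assigned to any cluster?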

    Representation learning has been used to map the input data into a low-dimensional feature space.

    Attempts: apply unsupervised deep learning approaches to clustering. (Open question: how?)

    However, most focus on clustering over a low-dimensional feature space. 

    Transform the data into more clustering-friendly representations:

    A deep version of k-means is based on learning a data representation and applying k-means in the embedded space.
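
    The "embed first, then run k-means in the embedded space" idea can be sketched as below. Note the substitution: a PCA projection stands in for the learned deep representation, so this is not deep k-means itself, only the same two-stage shape. The function names, the toy data, and the farthest-point initialisation are all assumptions made for the example.

```python
import numpy as np


def pca_embed(x, n_components):
    """Project data onto its top principal components (stand-in for a learned encoder)."""
    x_centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(x_centered, full_matrices=False)
    return x_centered @ vt[:n_components].T


def kmeans(z, k, n_iter=50):
    """Plain Lloyd's algorithm with a deterministic farthest-point initialisation."""
    centroids = [z[0]]
    for _ in range(k - 1):
        # Distance of every point to its nearest already-chosen centroid.
        d = np.min([np.linalg.norm(z - c, axis=1) for c in centroids], axis=0)
        centroids.append(z[np.argmax(d)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dists = np.linalg.norm(z[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Toy data: two well-separated Gaussian blobs in 50 dimensions.
rng = np.random.default_rng(0)
blob_a = rng.normal(size=(100, 50))
blob_b = rng.normal(size=(100, 50)) + 5.0
data = np.vstack([blob_a, blob_b])

embedded = pca_embed(data, n_components=2)  # low-dimensional representation
labels, centroids = kmeans(embedded, k=2)   # k-means in the embedded space
print(np.bincount(labels))
```

    In a deep version, `pca_embed` would be replaced by a trained encoder network, and the representation and the centroids would typically be learned jointly rather than in two separate steps.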

    How to represent a cluster: 

    A vector vs. an autoencoder network.

    Data collapsing problem: for each dataset, the program has to be tuned all over again.

    for multivariate time series, how to find patterns. 

    1. Find patterns: SAX; TICC; sliding windows; derivatives.

    2. VG, statistical features.

    3.   
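
    Two of the options listed in item 1, sliding windows and SAX, can be combined in a short sketch. Assumptions: the series is univariate (a multivariate series would be handled one channel at a time here), the alphabet has 4 symbols, and the window length is divisible by the number of segments. The function names are made up for the example; the breakpoints are the standard-normal quartiles.

```python
import numpy as np

# Breakpoints that split a standard normal into 4 equiprobable regions.
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])
ALPHABET = "abcd"


def sliding_windows(series, width, step=1):
    """Return all windows of the given width, taken every `step` samples."""
    series = np.asarray(series, dtype=float)
    starts = range(0, len(series) - width + 1, step)
    return np.array([series[i:i + width] for i in starts])


def sax_word(window, n_segments):
    """Convert one window to a SAX word (assumes len(window) % n_segments == 0)."""
    window = np.asarray(window, dtype=float)
    std = window.std()
    # z-normalise; a constant window maps to all zeros.
    z = (window - window.mean()) / std if std > 0 else np.zeros_like(window)
    # Piecewise aggregate approximation: mean of each equal-length segment.
    paa = z.reshape(n_segments, -1).mean(axis=1)
    # Map each segment mean to the index of the region it falls in.
    symbols = np.searchsorted(BREAKPOINTS, paa)
    return "".join(ALPHABET[s] for s in symbols)


series = np.sin(np.linspace(0, 4 * np.pi, 64))
words = [sax_word(w, n_segments=4) for w in sliding_windows(series, width=16, step=8)]
print(words)
```

    Windows that map to the same word are candidates for the same pattern. The derivative option from the list could be tried by passing `np.diff(series)` instead of `series`.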

    Supplementary knowledge: 

    1. Pattern recognition and clustering

    Pattern recognition is a mature field in computer science with well-established techniques for the assignment of unknown patterns to categories, or classes. A pattern is defined as a vector of some number of measurements, called features. Usually, a pattern recognition system uses training samples from known categories to form a decision rule for unknown patterns. The unknown pattern is assigned to one of the categories according to the decision rule. Since we are interested in the classes of documents that have been assigned by the user, we can use pattern recognition techniques to try to classify previously unseen documents into the user's categories. While pattern recognition techniques require that the number and labels of categories are known, clustering techniques are unsupervised, requiring no external knowledge of categories. Clustering methods simply try to group similar patterns into clusters whose members are more similar to each other (according to some distance measure) than to members of other clusters. There is no a priori knowledge of patterns that belong to certain groups, or even how many groups are appropriate. Refer to basic pattern recognition and clustering texts such as [567] for further information. 

    We first employ pattern recognition techniques on documents to attempt to find features for classification, then focus on clustering the raw features of the documents.

  • Original article: https://www.cnblogs.com/dulun/p/12309740.html