Whitening

    The goal of whitening is to make the input less redundant; more formally, our desiderata are that our learning algorithm sees a training input where (i) the features are less correlated with each other, and (ii) the features all have the same variance.


    How can we make our input features uncorrelated with each other? We had already done this when computing \textstyle x_{\rm rot}^{(i)} = U^T x^{(i)}. Repeating our previous figure, our plot for \textstyle x_{\rm rot} was:

    [Figure: PCA-rotated]

    The covariance matrix of this data is given by:

    \begin{align}
    \begin{bmatrix}
    7.29 & 0 \\
    0 & 0.69
    \end{bmatrix}.
    \end{align}

    It is no accident that the diagonal values are \textstyle \lambda_1 and \textstyle \lambda_2. Further, the off-diagonal entries are zero; thus, \textstyle x_{\rm rot,1} and \textstyle x_{\rm rot,2} are uncorrelated, satisfying one of our desiderata for whitened data (that the features be less correlated).
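    The rotation step above can be sketched in NumPy. This is a toy illustration, not code from the original notes: the data and variable names are made up, and `np.linalg.eigh` is used to get the eigenvectors \textstyle U of the covariance matrix.

    ```python
    import numpy as np

    # Toy 2-D data, zero-mean as assumed in the earlier PCA step.
    rng = np.random.default_rng(0)
    x = np.array([[2.0, 0.8], [0.8, 1.0]]) @ rng.standard_normal((2, 1000))
    x -= x.mean(axis=1, keepdims=True)

    sigma = x @ x.T / x.shape[1]        # empirical covariance matrix
    eigvals, U = np.linalg.eigh(sigma)  # columns of U are eigenvectors
    x_rot = U.T @ x                     # rotate into the eigenbasis

    cov_rot = x_rot @ x_rot.T / x_rot.shape[1]
    # cov_rot is diagonal: the off-diagonal entries are (numerically) zero,
    # and the diagonal holds the eigenvalues lambda_1, lambda_2.
    ```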

    To make each of our input features have unit variance, we can simply rescale each feature \textstyle x_{\rm rot,i} by \textstyle 1/\sqrt{\lambda_i}. Concretely, we define our whitened data \textstyle x_{\rm PCAwhite} \in \Re^n as follows:

    \begin{align}
    x_{\rm PCAwhite,i} = \frac{x_{\rm rot,i}}{\sqrt{\lambda_i}}.
    \end{align}

    Plotting \textstyle x_{\rm PCAwhite}, we get:

    [Figure: PCA-whitened]

    This data now has covariance equal to the identity matrix \textstyle I. We say that \textstyle x_{\rm PCAwhite} is our PCA-whitened version of the data: the different components of \textstyle x_{\rm PCAwhite} are uncorrelated and have unit variance.
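    As a sketch, PCA whitening is one extra line on top of the rotation: divide each rotated component by the square root of its eigenvalue. Again, the toy data below is illustrative only.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    x = np.array([[2.0, 0.8], [0.8, 1.0]]) @ rng.standard_normal((2, 2000))
    x -= x.mean(axis=1, keepdims=True)

    sigma = x @ x.T / x.shape[1]
    eigvals, U = np.linalg.eigh(sigma)
    x_rot = U.T @ x
    # Rescale each component by 1/sqrt(lambda_i) to get unit variance.
    x_pcawhite = x_rot / np.sqrt(eigvals)[:, None]

    cov_white = x_pcawhite @ x_pcawhite.T / x.shape[1]
    # cov_white is (numerically) the identity matrix.
    ```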

     

    ZCA Whitening

    Finally, it turns out that this way of getting the data to have covariance identity \textstyle I isn't unique. Concretely, if \textstyle R is any orthogonal matrix, so that it satisfies \textstyle RR^T = R^TR = I (less formally, if \textstyle R is a rotation/reflection matrix), then \textstyle R \, x_{\rm PCAwhite} will also have identity covariance. In ZCA whitening, we choose \textstyle R = U. We define

    \begin{align}
    x_{\rm ZCAwhite} = U x_{\rm PCAwhite}
    \end{align}

    Plotting \textstyle x_{\rm ZCAwhite}, we get:

    [Figure: ZCA-whitened]

    It can be shown that out of all possible choices for \textstyle R, this choice of rotation causes \textstyle x_{\rm ZCAwhite} to be as close as possible to the original input data \textstyle x.

    When using ZCA whitening (unlike PCA whitening), we usually keep all 	extstyle n dimensions of the data, and do not try to reduce its dimension.
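    A sketch of ZCA whitening, assuming the same illustrative toy setup as before: rotate, rescale, then rotate back with \textstyle R = U. The code also checks the two claims above empirically — the covariance is still the identity, and the ZCA-whitened data is at least as close to the original as the PCA-whitened data.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    x = np.array([[2.0, 0.8], [0.8, 1.0]]) @ rng.standard_normal((2, 2000))
    x -= x.mean(axis=1, keepdims=True)

    sigma = x @ x.T / x.shape[1]
    eigvals, U = np.linalg.eigh(sigma)
    x_pcawhite = (U.T @ x) / np.sqrt(eigvals)[:, None]
    x_zcawhite = U @ x_pcawhite          # rotate back: choose R = U

    cov_zca = x_zcawhite @ x_zcawhite.T / x.shape[1]
    # cov_zca is still the identity, but x_zcawhite stays close to x.
    ```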

     

    Regularization

    When implementing PCA whitening or ZCA whitening in practice, sometimes some of the eigenvalues \textstyle \lambda_i will be numerically close to 0, and thus the scaling step where we divide by \sqrt{\lambda_i} would involve dividing by a value close to zero; this may cause the data to blow up (take on large values) or otherwise be numerically unstable. In practice, we therefore implement this scaling step using a small amount of regularization, and add a small constant \textstyle \epsilon to the eigenvalues before taking their square root and inverse:

    \begin{align}
    x_{\rm PCAwhite,i} = \frac{x_{\rm rot,i}}{\sqrt{\lambda_i + \epsilon}}.
    \end{align}

    When \textstyle x takes values around \textstyle [-1,1], a value of \textstyle \epsilon \approx 10^{-5} might be typical.
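    The effect of the regularizer can be sketched on deliberately near-degenerate toy data (the data and the mixing matrix below are made up for illustration): without \epsilon, dividing by \sqrt{\lambda_i} for a near-zero eigenvalue would blow the data up; with \epsilon, the whitened values stay bounded.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    # Nearly degenerate data: the second direction has almost zero variance.
    x = np.array([[1.0, 0.0], [0.0, 1e-6]]) @ rng.standard_normal((2, 2000))
    x -= x.mean(axis=1, keepdims=True)

    sigma = x @ x.T / x.shape[1]
    eigvals, U = np.linalg.eigh(sigma)   # one eigenvalue is ~1e-12

    epsilon = 1e-5                       # typical when x is roughly in [-1, 1]
    x_rot = U.T @ x
    # Divide by sqrt(lambda_i + epsilon) instead of sqrt(lambda_i):
    # the tiny eigenvalue no longer produces a near-zero denominator.
    x_pcawhite = x_rot / np.sqrt(eigvals + epsilon)[:, None]
    ```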

    For the case of images, adding 	extstyle epsilon here also has the effect of slightly smoothing (or low-pass filtering) the input image. This also has a desirable effect of removing aliasing artifacts caused by the way pixels are laid out in an image, and can improve the features learned (details are beyond the scope of these notes).

    ZCA whitening is a form of pre-processing of the data that maps it from \textstyle x to \textstyle x_{\rm ZCAwhite}. It turns out that this is also a rough model of how the biological eye (the retina) processes images. Specifically, as your eye perceives images, most adjacent "pixels" in your eye will perceive very similar values, since adjacent parts of an image tend to be highly correlated in intensity. It is thus wasteful for your eye to have to transmit every pixel separately (via your optic nerve) to your brain. Instead, your retina performs a decorrelation operation (this is done via retinal neurons that compute a function called "on center, off surround/off center, on surround") which is similar to that performed by ZCA. This results in a less redundant representation of the input image, which is then transmitted to your brain.
