Whitening

    The goal of whitening is to make the input less redundant; more formally, our desiderata are that our learning algorithm sees a training input where (i) the features are less correlated with each other, and (ii) the features all have the same variance.


    How can we make our input features uncorrelated with each other? We had already done this when computing \textstyle x_{\rm rot}^{(i)} = U^T x^{(i)}. Repeating our previous figure, our plot for \textstyle x_{\rm rot} was:

    [Figure: PCA-rotated]

    The covariance matrix of this data is given by:

    \begin{align}
    \begin{bmatrix}
    7.29 & 0 \\
    0 & 0.69
    \end{bmatrix}.
    \end{align}

    It is no accident that the diagonal values are \textstyle \lambda_1 and \textstyle \lambda_2. Further, the off-diagonal entries are zero; thus, \textstyle x_{\rm rot,1} and \textstyle x_{\rm rot,2} are uncorrelated, satisfying one of our desiderata for whitened data (that the features be less correlated).
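    The rotation step above can be sketched in NumPy. This is a toy illustration, not code from the original notes: the data and variable names are made up, and `np.linalg.eigh` is used to get the eigenvectors \textstyle U of the covariance matrix.

    ```python
    import numpy as np

    # Toy 2-D data, zero-mean as assumed in the earlier PCA step.
    rng = np.random.default_rng(0)
    x = np.array([[2.0, 0.8], [0.8, 1.0]]) @ rng.standard_normal((2, 1000))
    x -= x.mean(axis=1, keepdims=True)

    sigma = x @ x.T / x.shape[1]        # empirical covariance matrix
    eigvals, U = np.linalg.eigh(sigma)  # columns of U are eigenvectors
    x_rot = U.T @ x                     # rotate into the eigenbasis

    cov_rot = x_rot @ x_rot.T / x_rot.shape[1]
    # cov_rot is diagonal: the off-diagonal entries are (numerically) zero,
    # and the diagonal holds the eigenvalues lambda_1, lambda_2.
    ```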

    To make each of our input features have unit variance, we can simply rescale each feature \textstyle x_{\rm rot,i} by \textstyle 1/\sqrt{\lambda_i}. Concretely, we define our whitened data \textstyle x_{\rm PCAwhite} \in \Re^n as follows:

    \begin{align}
    x_{\rm PCAwhite,i} = \frac{x_{\rm rot,i}}{\sqrt{\lambda_i}}.
    \end{align}

    Plotting \textstyle x_{\rm PCAwhite}, we get:

    [Figure: PCA-whitened]

    This data now has covariance equal to the identity matrix \textstyle I. We say that \textstyle x_{\rm PCAwhite} is our PCA-whitened version of the data: the different components of \textstyle x_{\rm PCAwhite} are uncorrelated and have unit variance.
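    As a sketch, PCA whitening is one extra line on top of the rotation: divide each rotated component by the square root of its eigenvalue. Again, the toy data below is illustrative only.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    x = np.array([[2.0, 0.8], [0.8, 1.0]]) @ rng.standard_normal((2, 2000))
    x -= x.mean(axis=1, keepdims=True)

    sigma = x @ x.T / x.shape[1]
    eigvals, U = np.linalg.eigh(sigma)
    x_rot = U.T @ x
    # Rescale each component by 1/sqrt(lambda_i) to get unit variance.
    x_pcawhite = x_rot / np.sqrt(eigvals)[:, None]

    cov_white = x_pcawhite @ x_pcawhite.T / x.shape[1]
    # cov_white is (numerically) the identity matrix.
    ```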

     

    ZCA Whitening

    Finally, it turns out that this way of getting the data to have covariance identity \textstyle I isn't unique. Concretely, if \textstyle R is any orthogonal matrix, so that it satisfies \textstyle RR^T = R^TR = I (less formally, if \textstyle R is a rotation/reflection matrix), then \textstyle R \, x_{\rm PCAwhite} will also have identity covariance. In ZCA whitening, we choose \textstyle R = U. We define

    \begin{align}
    x_{\rm ZCAwhite} = U x_{\rm PCAwhite}
    \end{align}

    Plotting \textstyle x_{\rm ZCAwhite}, we get:

    [Figure: ZCA-whitened]

    It can be shown that out of all possible choices for \textstyle R, this choice of rotation causes \textstyle x_{\rm ZCAwhite} to be as close as possible to the original input data \textstyle x.

    When using ZCA whitening (unlike PCA whitening), we usually keep all 	extstyle n dimensions of the data, and do not try to reduce its dimension.
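    A sketch of ZCA whitening, assuming the same illustrative toy setup as before: rotate, rescale, then rotate back with \textstyle R = U. The code also checks the two claims above empirically — the covariance is still the identity, and the ZCA-whitened data is at least as close to the original as the PCA-whitened data.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    x = np.array([[2.0, 0.8], [0.8, 1.0]]) @ rng.standard_normal((2, 2000))
    x -= x.mean(axis=1, keepdims=True)

    sigma = x @ x.T / x.shape[1]
    eigvals, U = np.linalg.eigh(sigma)
    x_pcawhite = (U.T @ x) / np.sqrt(eigvals)[:, None]
    x_zcawhite = U @ x_pcawhite          # rotate back: choose R = U

    cov_zca = x_zcawhite @ x_zcawhite.T / x.shape[1]
    # cov_zca is still the identity, but x_zcawhite stays close to x.
    ```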

     

    Regularization

    When implementing PCA whitening or ZCA whitening in practice, sometimes some of the eigenvalues \textstyle \lambda_i will be numerically close to 0, and thus the scaling step where we divide by \sqrt{\lambda_i} would involve dividing by a value close to zero; this may cause the data to blow up (take on large values) or otherwise be numerically unstable. In practice, we therefore implement this scaling step using a small amount of regularization, and add a small constant \textstyle \epsilon to the eigenvalues before taking their square root and inverse:

    \begin{align}
    x_{\rm PCAwhite,i} = \frac{x_{\rm rot,i}}{\sqrt{\lambda_i + \epsilon}}.
    \end{align}

    When \textstyle x takes values around \textstyle [-1,1], a value of \textstyle \epsilon \approx 10^{-5} might be typical.
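    The effect of the regularizer can be sketched on deliberately near-degenerate toy data (the data and the mixing matrix below are made up for illustration): without \epsilon, dividing by \sqrt{\lambda_i} for a near-zero eigenvalue would blow the data up; with \epsilon, the whitened values stay bounded.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    # Nearly degenerate data: the second direction has almost zero variance.
    x = np.array([[1.0, 0.0], [0.0, 1e-6]]) @ rng.standard_normal((2, 2000))
    x -= x.mean(axis=1, keepdims=True)

    sigma = x @ x.T / x.shape[1]
    eigvals, U = np.linalg.eigh(sigma)   # one eigenvalue is ~1e-12

    epsilon = 1e-5                       # typical when x is roughly in [-1, 1]
    x_rot = U.T @ x
    # Divide by sqrt(lambda_i + epsilon) instead of sqrt(lambda_i):
    # the tiny eigenvalue no longer produces a near-zero denominator.
    x_pcawhite = x_rot / np.sqrt(eigvals + epsilon)[:, None]
    ```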

    For the case of images, adding 	extstyle epsilon here also has the effect of slightly smoothing (or low-pass filtering) the input image. This also has a desirable effect of removing aliasing artifacts caused by the way pixels are laid out in an image, and can improve the features learned (details are beyond the scope of these notes).

    ZCA whitening is a form of pre-processing of the data that maps it from \textstyle x to \textstyle x_{\rm ZCAwhite}. It turns out that this is also a rough model of how the biological eye (the retina) processes images. Specifically, as your eye perceives images, most adjacent "pixels" in your eye will perceive very similar values, since adjacent parts of an image tend to be highly correlated in intensity. It is thus wasteful for your eye to have to transmit every pixel separately (via your optic nerve) to your brain. Instead, your retina performs a decorrelation operation (this is done via retinal neurons that compute a function called "on center, off surround/off center, on surround") which is similar to that performed by ZCA. This results in a less redundant representation of the input image, which is then transmitted to your brain.
