To summarize, principal component analysis involves evaluating the mean x̄ and the covariance matrix S
of the data set and then finding the M eigenvectors of S corresponding to the M largest eigenvalues. If we
plan to project our data onto the first M principal components, then we only need to find the first M eigenvalues
and eigenvectors.
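
The summary above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names pca_fit and pca_project, the synthetic data, and the 1/N normalization of the covariance matrix are all assumptions made for the example.

```python
import numpy as np

def pca_fit(X, M):
    """Return the sample mean and the top-M eigenvalues/eigenvectors of S.

    X is an (N, D) array of observations; M is the target dimensionality.
    """
    x_bar = X.mean(axis=0)                    # sample mean, shape (D,)
    Xc = X - x_bar                            # centred data
    S = Xc.T @ Xc / X.shape[0]                # covariance, 1/N normalization (assumption)
    eigvals, eigvecs = np.linalg.eigh(S)      # symmetric solver, eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:M]     # indices of the M largest eigenvalues
    return x_bar, eigvals[order], eigvecs[:, order]

def pca_project(X, x_bar, U):
    """Project centred data onto the columns of U, giving an (N, M) array."""
    return (X - x_bar) @ U

# Synthetic correlated data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

x_bar, lam, U = pca_fit(X, M=2)
Z = pca_project(X, x_bar, U)
```

Note that np.linalg.eigh computes the full decomposition and the sketch then keeps M columns. When D is large and only the leading M eigenpairs are wanted, an iterative solver such as scipy.sparse.linalg.eigsh (with k=M) may be a better fit.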
PCA can be defined as the orthogonal projection of the data onto a lower-dimensional linear space, known as
the principal subspace, such that the variance of the projected data is maximized. Equivalently, it can be defined
as the linear projection that minimizes the average projection cost, defined as the mean squared distance between
the data points and their projections.
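
The equivalence of the two definitions can be checked numerically. For any orthonormal set of directions, the variance of the projected data plus the mean squared projection cost equals the total variance, so maximizing one minimizes the other. The sketch below assumes synthetic data and a 1/N covariance; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))   # synthetic data, N=500, D=4
Xc = X - X.mean(axis=0)                                   # centred data
S = Xc.T @ Xc / len(X)                                    # 1/N covariance (assumption)

eigvals, eigvecs = np.linalg.eigh(S)                      # eigenvalues in ascending order
U = eigvecs[:, ::-1][:, :2]                               # top M=2 eigenvectors

Z = Xc @ U                                                # coordinates in the principal subspace
X_proj = Z @ U.T                                          # projections in the original (centred) space

projected_variance = Z.var(axis=0).sum()
projection_cost = np.mean(np.sum((Xc - X_proj) ** 2, axis=1))
total_variance = np.trace(S)

# A random orthonormal pair of directions, for comparison
Q, _ = np.linalg.qr(rng.normal(size=(4, 2)))
random_variance = (Xc @ Q).var(axis=0).sum()
```

Here projected_variance + projection_cost matches total_variance up to rounding, and random_variance does not exceed projected_variance.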
Consider a data set of observations {x_n} where n = 1,...,N, and x_n is a Euclidean variable with dimensionality D.
Our goal is to project the data onto a space having dimensionality M < D while maximizing the variance of the projected
data.
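
As a worked illustration of the maximum-variance goal, consider the simplest case M = 1, with a single unit-length direction u_1. The 1/N normalization of S and the use of a Lagrange multiplier λ_1 to enforce the unit-length constraint are assumptions of this sketch.

```latex
\begin{align*}
\frac{1}{N}\sum_{n=1}^{N}\left(u_1^{\mathsf T}x_n - u_1^{\mathsf T}\bar{x}\right)^2
  &= u_1^{\mathsf T} S u_1,
  \qquad
  S = \frac{1}{N}\sum_{n=1}^{N}(x_n-\bar{x})(x_n-\bar{x})^{\mathsf T} \\
\text{maximize}\quad
  & u_1^{\mathsf T} S u_1 + \lambda_1\left(1 - u_1^{\mathsf T}u_1\right) \\
\Rightarrow\quad
  & S u_1 = \lambda_1 u_1,
  \qquad
  u_1^{\mathsf T} S u_1 = \lambda_1
\end{align*}
```

The projected variance therefore equals the eigenvalue λ_1, and it is largest when u_1 is the eigenvector with the largest eigenvalue.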
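The general solution to the minimization of the projection cost J (the mean squared distance between the data points
and their projections) for arbitrary D and arbitrary M < D is obtained by choosing the {u_i} to be eigenvectors of the
covariance matrix, given by S u_i = λ_i u_i, where i = 1,...,D, and as usual the eigenvectors {u_i} are chosen to
be orthonormal. The minimum of J is attained when the discarded directions are the D − M eigenvectors with the smallest
eigenvalues, so that the principal subspace is spanned by the eigenvectors with the M largest eigenvalues.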
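
The eigenvector solution can be checked numerically. The sketch below verifies the eigenvector equation and orthonormality, then compares J with the sum of the D − M smallest eigenvalues, which is the standard value of the minimum. The data, the dimensions, and the 1/N covariance are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
N, D, M = 300, 6, 3
X = rng.normal(size=(N, D)) @ rng.normal(size=(D, D))   # synthetic correlated data
x_bar = X.mean(axis=0)
Xc = X - x_bar
S = Xc.T @ Xc / N                                       # 1/N covariance (assumption)

lam, U = np.linalg.eigh(S)                              # eigenvalues in ascending order
lam, U = lam[::-1], U[:, ::-1]                          # reorder to descending

# Column i of U * lam is lam[i] * u_i, so this tests S u_i = lam_i u_i for all i
eigen_equation_holds = np.allclose(S @ U, U * lam)
orthonormal = np.allclose(U.T @ U, np.eye(D))

# Reconstruct each point from its projection onto the first M eigenvectors
U_M = U[:, :M]
X_tilde = x_bar + (Xc @ U_M) @ U_M.T

J = np.mean(np.sum((X - X_tilde) ** 2, axis=1))         # mean squared projection cost
discarded = lam[M:].sum()                               # sum of the D - M smallest eigenvalues
```

With this choice of {u_i}, J agrees with discarded up to floating-point error.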