Scalars, Vectors, Matrices and Tensors
- Scalars: A scalar is just a single number. We usually give scalars lower-case variable names.
- Vectors: A vector is an array of numbers. We give vectors lower-case names written in bold typeface, such as \(\boldsymbol{x}\):

\[\boldsymbol{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}\]

To access \(x_1, x_3, x_6\), we define the set \(S = \{1, 3, 6\}\) and write \(\boldsymbol{x}_S\). We use the \(-\) sign to index the complement of a set: \(\boldsymbol{x}_{-1}\) is the vector containing all elements of \(\boldsymbol{x}\) except \(x_1\).
- Matrices: A matrix is a 2-D array of numbers. We usually give matrices upper-case variable names with bold typeface, such as \(\boldsymbol{A}\):

\[\boldsymbol{A} = \begin{bmatrix} A_{1, 1} & A_{1, 2} \\ A_{2, 1} & A_{2, 2} \end{bmatrix}\]

We can identify all of the numbers with vertical coordinate \(i\) by writing a ":" for the horizontal coordinate: \(\boldsymbol{A}_{i, :}\) is known as the \(i\)-th row of \(\boldsymbol{A}\). Sometimes we may need to index matrix-valued expressions that are not just a single letter; in this case, we use subscripts after the expression, such as \(f(\boldsymbol{A})_{i, j}\), which gives element \((i, j)\) of the matrix computed by applying the function \(f\) to \(\boldsymbol{A}\).
- Tensors: In some cases we will need an array with more than two axes. An array of numbers arranged on a regular grid with a variable number of axes is known as a tensor. We denote a tensor named "A" with this typeface: **A**.
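These indexing conventions map directly onto NumPy's array indexing. A minimal sketch (the array values here are arbitrary; note that NumPy is 0-indexed, while the notes use 1-based indices):

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])

# x_S with S = {1, 3, 6}: subtract 1 from each index for 0-based NumPy.
S = [0, 2, 5]
x_S = x[S]                      # elements x_1, x_3, x_6

# x_{-1}: all elements of x except x_1.
x_minus_1 = np.delete(x, 0)

# Matrix indexing: A_{i,:} is a row, A_{i,j} a single element.
A = np.arange(6.0).reshape(2, 3)
row_1 = A[0, :]                 # the 1st row of A
elem = A[1, 2]                  # element A_{2,3}
```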
Multiplying Matrices and Vectors
The matrix product of matrices \(\boldsymbol{A}\) and \(\boldsymbol{B}\) is a third matrix \(\boldsymbol{C}\), written \(\boldsymbol{C} = \boldsymbol{AB}\), and defined by

\[C_{i, j} = \sum_k A_{i, k} B_{k, j}\]
Matrix product operations have many properties:
- distributive: A(B + C) = AB + AC
- associative: A(BC) = (AB)C
- not commutative: sometimes \(\boldsymbol{AB} \neq \boldsymbol{BA}\)
Note that the standard product of two matrices is not just a matrix containing the product of the individual elements. Such an operation is called the element-wise product (or Hadamard product), and is denoted as \(\boldsymbol{A} \odot \boldsymbol{B}\).
The dot product between two vectors \(\boldsymbol{x}\) and \(\boldsymbol{y}\) of the same dimensionality is the matrix product \(\boldsymbol{x}^T\boldsymbol{y}\).
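A quick NumPy sketch of these operations (the matrices are random placeholders, chosen only so the shapes line up):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 2))
C = rng.standard_normal((3, 2))

# Matrix product C_{i,j} = sum_k A_{i,k} B_{k,j} uses the @ operator.
AB = A @ B

# Distributive and associative properties (up to floating-point error).
assert np.allclose(A @ (B + C), A @ B + A @ C)
assert np.allclose(A @ (B @ B.T), (A @ B) @ B.T)

# Element-wise (Hadamard) product uses *, not @.
H = B * C

# Dot product of two vectors as the matrix product x^T y.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
dot = x @ y    # 1*4 + 2*5 + 3*6 = 32
```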
Identity and Inverse Matrices
An identity matrix that preserves n-dimensional vectors is denoted as \(\boldsymbol{I}_n\). Formally, \(\boldsymbol{I}_n \in \mathbb{R}^{n \times n}\), and

\[\forall \boldsymbol{x} \in \mathbb{R}^n, \; \boldsymbol{I}_n \boldsymbol{x} = \boldsymbol{x}\]
The matrix inverse of \(\boldsymbol{A}\) is denoted as \(\boldsymbol{A}^{-1}\), and it is defined as the matrix such that

\[\boldsymbol{A}^{-1}\boldsymbol{A} = \boldsymbol{I}_n\]
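A small NumPy sketch (the 2×2 system here is made up for illustration). In practice `np.linalg.solve` is preferred over forming the inverse explicitly, since it is faster and more numerically stable:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

A_inv = np.linalg.inv(A)
assert np.allclose(A_inv @ A, np.eye(2))   # A^{-1} A = I_n

# Solving Ax = b with the inverse vs. with a direct solver.
x1 = A_inv @ b
x2 = np.linalg.solve(A, b)
assert np.allclose(x1, x2)
```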
Linear Dependence and Span
A linear combination of some set of vectors \(\{\boldsymbol{v}^{(1)}, \dots, \boldsymbol{v}^{(n)}\}\) is given by multiplying each vector \(\boldsymbol{v}^{(i)}\) by a corresponding scalar coefficient and adding the results:

\[\sum_i c_i \boldsymbol{v}^{(i)}\]
The span of a set of vectors is the set of all points obtainable by linear combination of the original vectors.
Determining whether \(\boldsymbol{Ax} = \boldsymbol{b}\) has a solution thus amounts to testing whether \(\boldsymbol{b}\) is in the span of the columns of \(\boldsymbol{A}\). This particular span is known as the column space or the range of \(\boldsymbol{A}\).
A set of vectors is linearly independent if no vector in the set is a linear combination of the other vectors. A square matrix with linearly dependent columns is known as singular.
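Linear dependence is easy to check numerically. In this sketch the third column is constructed as the sum of the first two, so the matrix is singular (the example matrix is invented for illustration):

```python
import numpy as np

# Third column = first column + second column, so the columns
# are linearly dependent and the matrix is singular.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [2.0, 3.0, 5.0]])

rank = np.linalg.matrix_rank(A)   # only 2 independent columns
det = np.linalg.det(A)            # 0 (up to floating-point error)
```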
Norms
Sometimes we need to measure the size of a vector. In machine learning, we usually measure the size of vectors using a function called a norm. Formally, the \(L^p\) norm is given by

\[||\boldsymbol{x}||_p = \left( \sum_i |x_i|^p \right)^{\frac{1}{p}}\]

for \(p \in \mathbb{R}, p \geq 1\).
A norm is any function \(f\) that satisfies the following properties:
- \(f(\boldsymbol{x}) = 0 \Rightarrow \boldsymbol{x} = \boldsymbol{0}\)
- \(f(\boldsymbol{x} + \boldsymbol{y}) \leq f(\boldsymbol{x}) + f(\boldsymbol{y})\) (the triangle inequality)
- \(\forall \alpha \in \mathbb{R}, \; f(\alpha \boldsymbol{x}) = |\alpha| f(\boldsymbol{x})\)
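NumPy exposes these norms through `np.linalg.norm(x, ord=p)`; a minimal sketch with an arbitrary vector:

```python
import numpy as np

x = np.array([3.0, -4.0])

l1 = np.linalg.norm(x, 1)        # |3| + |-4| = 7
l2 = np.linalg.norm(x, 2)        # sqrt(9 + 16) = 5
linf = np.linalg.norm(x, np.inf) # max_i |x_i| = 4 (the L^inf, or max, norm)

# Triangle inequality: f(x + y) <= f(x) + f(y)
y = np.array([1.0, 2.0])
assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y)
```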
Special Kinds of Matrices and Vectors
- Diagonal matrices consist mostly of zeros and have non-zero entries only along the main diagonal. We write \(\text{diag}(\boldsymbol{v})\) to denote a square diagonal matrix whose diagonal entries are given by the entries of the vector \(\boldsymbol{v}\). Then we have

\[\text{diag}(\boldsymbol{v})\boldsymbol{x} = \boldsymbol{v} \odot \boldsymbol{x}\]

- A symmetric matrix is any matrix that is equal to its own transpose:

\[\boldsymbol{A} = \boldsymbol{A}^T\]

- A unit vector is a vector with unit norm:

\[||\boldsymbol{x}||_2 = 1\]

- A vector \(\boldsymbol{x}\) and a vector \(\boldsymbol{y}\) are orthogonal to each other if \(\boldsymbol{x}^T\boldsymbol{y} = 0\). If the vectors are not only orthogonal but also have unit norm, we call them orthonormal.

- An orthogonal matrix is a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal:

\[\boldsymbol{A}^T\boldsymbol{A} = \boldsymbol{A}\boldsymbol{A}^T = \boldsymbol{I}\]
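These definitions can all be verified directly in NumPy. The sketch below uses a 2-D rotation matrix as a concrete example of an orthogonal matrix; all values are arbitrary:

```python
import numpy as np

# diag(v) x = v ⊙ x
v = np.array([1.0, 2.0, 3.0])
x = np.array([4.0, 5.0, 6.0])
assert np.allclose(np.diag(v) @ x, v * x)

# A rotation matrix is a classic orthogonal matrix: A^T A = A A^T = I.
theta = 0.3
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
assert np.allclose(A.T @ A, np.eye(2))
assert np.allclose(A @ A.T, np.eye(2))

# Normalizing a vector produces a unit vector.
u = x / np.linalg.norm(x)
```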
Eigendecomposition
An eigenvector of a square matrix \(\boldsymbol{A}\) is a non-zero vector \(\boldsymbol{v}\) such that multiplication by \(\boldsymbol{A}\) alters only the scale of \(\boldsymbol{v}\):

\[\boldsymbol{Av} = \lambda \boldsymbol{v}\]

The scalar \(\lambda\) is known as the eigenvalue corresponding to this eigenvector.
Suppose that a matrix \(\boldsymbol{A}\) has \(n\) linearly independent eigenvectors \(\{\boldsymbol{v}^{(1)}, \dots, \boldsymbol{v}^{(n)}\}\), with corresponding eigenvalues \(\{\lambda_1, \dots, \lambda_n\}\). We may concatenate all of the eigenvectors to form a matrix \(\boldsymbol{V}\) with one eigenvector per column: \(\boldsymbol{V} = [\boldsymbol{v}^{(1)}, \dots, \boldsymbol{v}^{(n)}]\). Likewise, we can concatenate the eigenvalues to form a vector \(\boldsymbol{\lambda} = [\lambda_1, \dots, \lambda_n]^T\). The eigendecomposition of \(\boldsymbol{A}\) is then given by

\[\boldsymbol{A} = \boldsymbol{V} \text{diag}(\boldsymbol{\lambda}) \boldsymbol{V}^{-1}\]
Every real symmetric matrix can be decomposed into an expression using only real-valued eigenvectors and eigenvalues:

\[\boldsymbol{A} = \boldsymbol{Q} \boldsymbol{\Lambda} \boldsymbol{Q}^T\]

where \(\boldsymbol{Q}\) is an orthogonal matrix composed of eigenvectors of \(\boldsymbol{A}\), and \(\boldsymbol{\Lambda}\) is a diagonal matrix of eigenvalues.
A matrix whose eigenvalues are all positive is called positive definite. A matrix whose eigenvalues are all positive or zero-valued is called positive semidefinite. If all eigenvalues are negative, the matrix is negative definite.
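A short NumPy sketch of the symmetric case (the matrix is an arbitrary example). `np.linalg.eigh` is the routine specialized for symmetric/Hermitian matrices; it returns real eigenvalues in ascending order and orthonormal eigenvectors:

```python
import numpy as np

# A real symmetric matrix, chosen arbitrarily for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, Q = np.linalg.eigh(A)

# Reconstruct A = Q diag(lambda) Q^T.
assert np.allclose(Q @ np.diag(lam) @ Q.T, A)

# All eigenvalues are positive, so this A is positive definite.
assert np.all(lam > 0)
```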
Singular Value Decomposition
In the last section, we saw how to decompose a matrix into eigenvectors and eigenvalues. The singular value decomposition (SVD) provides another way to factorize a matrix, into singular vectors and singular values. However, the SVD is more generally applicable: every real matrix has a singular value decomposition, but the same is not true of the eigenvalue decomposition (for example, a non-square matrix has no eigendecomposition).
In singular value decomposition, we can rewrite \(\boldsymbol{A}\) as

\[\boldsymbol{A} = \boldsymbol{UDV}^T\]
Suppose that \(\boldsymbol{A}\) is an \(m \times n\) matrix. Then \(\boldsymbol{U}\) is defined to be an \(m \times m\) matrix, \(\boldsymbol{D}\) an \(m \times n\) matrix, and \(\boldsymbol{V}\) an \(n \times n\) matrix. Each of these matrices is defined to have a special structure. The matrices \(\boldsymbol{U}\) and \(\boldsymbol{V}\) are both defined to be orthogonal matrices. The matrix \(\boldsymbol{D}\) is defined to be a diagonal matrix. Note that \(\boldsymbol{D}\) is not necessarily square.
The elements along the diagonal of \(\boldsymbol{D}\) are known as the singular values of the matrix \(\boldsymbol{A}\). The columns of \(\boldsymbol{U}\) are known as the left-singular vectors. The columns of \(\boldsymbol{V}\) are known as the right-singular vectors.
We can actually interpret the singular value decomposition of \(\boldsymbol{A}\) in terms of the eigendecomposition of functions of \(\boldsymbol{A}\). The left-singular vectors of \(\boldsymbol{A}\) are the eigenvectors of \(\boldsymbol{AA}^T\). The right-singular vectors of \(\boldsymbol{A}\) are the eigenvectors of \(\boldsymbol{A}^T\boldsymbol{A}\). The non-zero singular values of \(\boldsymbol{A}\) are the square roots of the eigenvalues of \(\boldsymbol{A}^T\boldsymbol{A}\). The same is true for \(\boldsymbol{AA}^T\).
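A NumPy sketch using a random non-square matrix. Note that `np.linalg.svd` returns the singular values as a 1-D array `s` rather than the full \(m \times n\) matrix \(\boldsymbol{D}\), and returns \(\boldsymbol{V}^T\) rather than \(\boldsymbol{V}\):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # non-square, so no eigendecomposition

U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n diagonal matrix D from the singular values.
D = np.zeros_like(A)
np.fill_diagonal(D, s)
assert np.allclose(U @ D @ Vt, A)   # A = U D V^T

# Singular values are the square roots of the eigenvalues of A^T A.
eigvals = np.linalg.eigvalsh(A.T @ A)   # ascending order
assert np.allclose(np.sort(s**2), eigvals)
```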
The Moore-Penrose Pseudoinverse
Matrix inversion is not defined for matrices that are not square, but the Moore-Penrose pseudoinverse allows us to define \(\boldsymbol{A}^+\) as

\[\boldsymbol{A}^+ = \lim_{\alpha \searrow 0} (\boldsymbol{A}^T\boldsymbol{A} + \alpha \boldsymbol{I})^{-1} \boldsymbol{A}^T\]

Practical algorithms compute it from the SVD: \(\boldsymbol{A}^+ = \boldsymbol{V} \boldsymbol{D}^+ \boldsymbol{U}^T\), where \(\boldsymbol{D}^+\) is obtained by taking the reciprocal of the non-zero singular values of \(\boldsymbol{D}\) and transposing the result.
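A sketch with `np.linalg.pinv`, which computes the pseudoinverse via the SVD. The tall random matrix here is arbitrary; for a tall matrix with linearly independent columns, \(\boldsymbol{x} = \boldsymbol{A}^+\boldsymbol{b}\) is the least-squares solution of \(\boldsymbol{Ax} = \boldsymbol{b}\):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))   # tall: more equations than unknowns
b = rng.standard_normal(5)

A_pinv = np.linalg.pinv(A)

# x = A^+ b matches the least-squares solution from np.linalg.lstsq.
x = A_pinv @ b
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_ls)

# When A has linearly independent columns, A^+ A = I.
assert np.allclose(A_pinv @ A, np.eye(3))
```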
The Trace Operator

The trace operator gives the sum of all the diagonal entries of a matrix:

\[\text{Tr}(\boldsymbol{A}) = \sum_i A_{i, i}\]
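A short NumPy sketch (the matrices are arbitrary 2×2 examples), also checking the useful identity that the trace is invariant to reordering a matrix product cyclically, \(\text{Tr}(\boldsymbol{AB}) = \text{Tr}(\boldsymbol{BA})\):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

tr = np.trace(A)   # 1 + 4 = 5, the sum of the diagonal entries

# Cyclic invariance: Tr(AB) = Tr(BA).
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```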
The Determinant
The determinant of a square matrix, denoted \(\det(\boldsymbol{A})\), is a function mapping matrices to real scalars. The determinant is equal to the product of all the eigenvalues of the matrix.
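This relationship is easy to verify numerically; a sketch with an arbitrary 2×2 matrix whose eigenvalues happen to be real:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

det = np.linalg.det(A)            # 4*3 - 1*2 = 10
eigvals = np.linalg.eigvals(A)    # 5 and 2 for this matrix

# det(A) equals the product of the eigenvalues.
assert np.isclose(det, np.prod(eigvals).real)
```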