1. Image formation
1.1 Pinhole camera
1.2 Properties of projection
Many-to-one
Points map to points
Lines map to lines
Planes map to planes
Focal plane: the plane through the first focal point (the front, or object-space, focal point) and perpendicular to the principal optical axis of the system is called the first focal plane.
1.3 Perspective projection
By similar triangles, x/x' = z/f'.
1.4 Perspective projection matrix
Since the 3D scene is projected onto a 2D plane, the z coordinate is not needed.
1.5 Coordinate systems
The world coordinate system is fixed and objective: a standard right-handed frame (thumb along the x-axis, index finger along the y-axis, middle finger along the z-axis).
In contrast, the camera coordinate system can be positioned and oriented arbitrarily.
1.6 Camera rotation and translation
\(X^W\) denotes homogeneous world coordinates; \(\tilde{X}^W\) (with a tilde) denotes inhomogeneous world coordinates.
1.7 Coordinate systems: Extrinsics
1.8 Camera coordinate system
Principal axis: the line from the camera center (point C) perpendicular to the image plane.
Normalized coordinate system: the camera center is the origin and the principal axis is the z-axis.
Principal point: the point (P) where the principal axis meets the image plane.
1.9 Principal point offset
1.10 Recap: pinhole camera model
\(X^I\) are the image coordinates;
\(X^C\) are the camera coordinates;
K is the calibration matrix.
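The pinhole model above can be sketched numerically. This is a minimal illustration, not the slides' exact notation; the values of K, R, and t are made up for the example.

```python
import numpy as np

# Pinhole projection sketch: x ~ K [R | t] X_w.
# K, R, t below are illustrative, not from the course material.
K = np.array([[800.0,   0.0, 320.0],   # fx, skew, cx
              [  0.0, 800.0, 240.0],   # fy, cy
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                          # camera aligned with world axes
t = np.array([0.0, 0.0, 0.0])          # camera at the world origin

def project(X_w):
    """Project a 3D world point to pixel coordinates."""
    X_c = R @ X_w + t                  # world -> camera coordinates
    x_h = K @ X_c                      # camera -> homogeneous image coords
    return x_h[:2] / x_h[2]            # perspective divide

# A point 4 units in front of the camera, offset 1 unit in x:
print(project(np.array([1.0, 0.0, 4.0])))  # [520. 240.]
```

Note how the depth appears only through the perspective divide: doubling z halves the offset from the principal point (cx, cy).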
1.11 Pixel coordinates
1.12 Calibration matrix
1.13 Coordinate systems: Intrinsics
1.14 Summary of coordinate systems
1.15 Orthographic projection
Because z = z' = f'.
1.16 Camera calibration
Given the coordinates Xi and xi of n points, estimate the camera parameters.
Dot product: (x1, y1, z1) · (x2, y2, z2) = x1x2 + y1y2 + z1z2
Cross product: (x1, y1, z1) × (x2, y2, z2) = (y1z2 − z1y2, z1x2 − x1z2, x1y2 − y1x2)
p1^T is a 1×4 matrix, the matrix P is 3×4, and Xi is a 4×1 vector of homogeneous coordinates;
hence p1 is a 4×1 matrix.
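The component formulas above can be written out and checked against NumPy's built-ins; this is just a sanity-check sketch.

```python
import numpy as np

# Dot and cross product exactly as in the component formulas above.
def dot(a, b):
    (x1, y1, z1), (x2, y2, z2) = a, b
    return x1*x2 + y1*y2 + z1*z2

def cross(a, b):
    (x1, y1, z1), (x2, y2, z2) = a, b
    return (y1*z2 - z1*y2, z1*x2 - x1*z2, x1*y2 - y1*x2)

a, b = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)
print(dot(a, b))    # 32.0
print(cross(a, b))  # (-3.0, 6.0, -3.0)
assert np.isclose(dot(a, b), np.dot(a, b))
assert np.allclose(cross(a, b), np.cross(a, b))
```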
1.17 Homogeneous least squares
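The homogeneous least-squares problem named in this heading, minimize ||Ap|| subject to ||p|| = 1, is solved by the right singular vector of A with the smallest singular value. A minimal sketch with a random illustrative A:

```python
import numpy as np

# Homogeneous least squares: minimize ||A p|| subject to ||p|| = 1.
# The minimizer is the right singular vector of A belonging to the
# smallest singular value. A is a random matrix for illustration.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 4))

_, _, Vt = np.linalg.svd(A)
p = Vt[-1]                       # last row of V^T = last column of V

assert np.isclose(np.linalg.norm(p), 1.0)
# Spot check: no other unit vector beats p on the residual.
q = rng.standard_normal(4)
q /= np.linalg.norm(q)
assert np.linalg.norm(A @ p) <= np.linalg.norm(A @ q)
print(np.linalg.norm(A @ p))
```

The residual ||Ap|| equals the smallest singular value of A; this is the same trick used later for calibration, homography, and fundamental-matrix estimation.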
2. Cameras & filtering
2.1 Shrinking the aperture
Less light gets through; diffraction effects appear.
2.2 Adding a lens
Rays from a point at one particular distance converge in focus; points at other distances project to a circle of confusion.
Thin lens formula:
Derived by applying similar triangles twice.
Combining the two equations from the previous figure yields the formula.
2.3 Depth of field
Changing the aperture size changes the depth of field.
2.4 Field of view
The field of view depends on the focal length and on the size of the camera retina (the sensor).
Chromatic aberration:
The lens has a different refractive index for different wavelengths, which produces color fringes.
Spherical aberration:
Rays farther from the optical axis are focused closer to the lens.
This causes blur away from the image center.
Vignetting / light loss
Radial Distortion:
Idealized spatial sampling:
2.5 Differences between CCD and CMOS:
A CCD moves the photo-generated charge from pixel to pixel and converts it to a voltage at a single output node;
a CMOS imager converts the charge to a voltage inside each pixel.
(Open question: is the first row on slide 36 the CCD?)
Color perception:
2.6 Cause of color moiré
Moiré patterns appear when the spatial frequency of the sensor's (CCD/CMOS) pixel grid is close to the spatial frequency of stripes in the image.
2.7 Digital image processing
2.8 Images as functions
[a,b]×[c,d] denotes the Cartesian product of the two intervals.
The range is finally mapped into [0,m]^3 (one interval per color channel).
2.9 Filter
Linear operations:
Convolution:
Rules for computing a convolution:
Additive noise model:
Average Filter:
Gaussian Averaging (Gaussian smoothing):
Implementation:
Write the kernels as [f(0,0), f(0,1), f(0,2)] and [h(0,0), h(1,0), h(2,0)]^T and compute the convolution. Note that in this case the definition reduces to g(i,j) = f(0,j)h(i,0).
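The note above describes separable filtering: a 2D kernel of the form h(i)f(j) (an outer product) can be applied as two 1D passes. A minimal sketch, using a hand-written 'valid' window filter (technically cross-correlation, which coincides with convolution for these symmetric kernels):

```python
import numpy as np

# Separable filtering: row pass then column pass equals the full 2D pass
# when the 2D kernel is the outer product of the two 1D kernels.
def filter2d_valid(img, k):
    kh, kw = k.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

f = np.array([1.0, 2.0, 1.0])            # row kernel (binomial blur)
h = np.array([1.0, 2.0, 1.0])            # column kernel
K = np.outer(h, f)                       # full 2D kernel g(i,j) = h(i) f(j)

rng = np.random.default_rng(1)
img = rng.standard_normal((6, 7))

full2d = filter2d_valid(img, K)
rows   = filter2d_valid(img, f[None, :])  # 1D pass along rows
both   = filter2d_valid(rows, h[:, None]) # then along columns
assert np.allclose(full2d, both)
print("separable == 2D:", np.allclose(full2d, both))
```

Separability is what makes Gaussian smoothing cheap: two length-k passes instead of one k×k pass.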
3. Edge detection
3.1 Binary Images
3.2 Morphological filters — Side note
3.3 Gaussian pyramids
3.4 Subsampling
3.5 Aliasing
3.6 What are “edges” (1D)
3.7 Edges
Find the places where the derivative is large.
3.8 Edges & Derivatives
Find where the first derivative attains an extremum, i.e., where the second derivative is zero. Extrema of the first derivative (zeros of the second derivative) are where the pixel values change most rapidly, and the places of most rapid change are precisely the edges.
3.9 Compute Derivatives
The kernels [1, -1] and [1, 0, -1] give the coefficients of f(x+h), f(x), f(x-h) in the forward and central difference approximations of the derivative.
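The two difference kernels above can be sketched on a 1D signal; the central difference is exact for quadratics, which makes it easy to check:

```python
import numpy as np

# Finite-difference derivative kernels from the note above:
#   forward difference [1, -1]    ~ f(x+h) - f(x)
#   central difference [1, 0, -1] ~ f(x+h) - f(x-h)   (scaled by 1/2h)
x = np.arange(0.0, 1.0, 0.1)
f = x**2                          # true derivative is 2x

fwd = f[1:] - f[:-1]              # forward difference, h = 0.1
ctr = (f[2:] - f[:-2]) / 2.0      # central difference

print(fwd / 0.1)                  # approximates 2x (biased by h)
print(ctr / 0.1)                  # 2x, exact for quadratics
assert np.allclose(ctr / 0.1, 2 * x[1:-1])
```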
3.10 Edge Detection
based on 1st derivative:
Simplification:
3.11 Image scanline
3.12 Implementing 1D edge detection
3.13 Extension to 2D Edge Detection: Partial Derivatives
Sobel filters (Gaussian smoothing in opposite direction)
3.14 Again: Derivatives and Smoothing
3.15 What is the gradient?
The gradient direction is perpendicular to the edge, and the gradient magnitude describes the edge strength.
3.16 2D Edge Detection
The dimensionality of the filtering affects the derivative estimates and how well the edge semantics are recovered.
3.17 ‘Optimal’ Edge Detection: Canny
3.18 The Canny edge detector
3.19 Non-maximum suppression
Find the local maxima (along the gradient direction); these are the "true" edges.
3.20 Benefits of thresholding
3.21 Hysteresis
Mark the following pixels as edges:
*1 the gradient magnitude exceeds the second, lower threshold, and
*2 the pixel is connected to pixels above the higher threshold.
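The two-threshold rule above can be sketched as a region growing from the strong pixels into connected weak pixels. The thresholds and the tiny magnitude image are made up for illustration:

```python
import numpy as np
from collections import deque

# Hysteresis thresholding sketch: seed from pixels >= high threshold,
# then grow into 8-connected neighbours >= low threshold.
def hysteresis(mag, low, high):
    strong = mag >= high
    weak = mag >= low
    keep = strong.copy()
    q = deque(zip(*np.nonzero(strong)))
    H, W = mag.shape
    while q:
        i, j = q.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if 0 <= ni < H and 0 <= nj < W and weak[ni, nj] and not keep[ni, nj]:
                    keep[ni, nj] = True
                    q.append((ni, nj))
    return keep

mag = np.array([[0.0, 0.4, 0.9, 0.4, 0.0],
                [0.0, 0.0, 0.4, 0.0, 0.3]])
edges = hysteresis(mag, low=0.3, high=0.8)
print(edges.astype(int))
```

Note how the weak pixel at the far right survives only because it is chained, through other weak pixels, back to the single strong pixel.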
3.22 Edges & Derivatives…
3.23 Compute 2nd order derivatives
3.24 The Laplacian
3.25 Second Derivative of Gaussian
3.26 1D edge detection
3.27 Approximating the Laplacian
3.28 Edge Detection with Laplacian
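One common discrete approximation of the Laplacian is the 5-point stencil; a minimal sketch showing the sign flip (zero crossing) at a step edge, with a toy image chosen for illustration:

```python
import numpy as np

# Discrete Laplacian via the standard 5-point stencil
#   [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
# Zero crossings of the response mark candidate edges.
def laplacian(img):
    out = np.zeros_like(img)
    out[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1] +
                       img[1:-1, :-2] + img[1:-1, 2:] -
                       4 * img[1:-1, 1:-1])
    return out

# A vertical step edge: the response is positive on one side of the
# step and negative on the other, so the zero crossing sits on the edge.
img = np.zeros((5, 6))
img[:, 3:] = 1.0
L = laplacian(img)
print(L)
assert np.all(L[1:-1, 2] == 1.0)    # positive just left of the step
assert np.all(L[1:-1, 3] == -1.0)   # negative just right of it
```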
3.29 Laplacian pyramid
4. PCA
4.1 Images as Vectors
4.2 Images as Points
n*m: the rows and columns of a 2D grayscale image are stacked into a single column vector of length n*m.
4.3 Template Matching
4.4 Dot Product
4.5 SSD(Sum of Squared Differences) Matching
4.6 Subspace Methods
4.7 Linear Dimensionality Reduction
4.8 Goal
4.9 Principal Component Analysis
4.10 Decomposition
|| · || denotes the Frobenius norm. Reference: https://www.cnblogs.com/lpgit/p/9734701.html
4.11 Minimizing the Error
\(\tilde{x}\) denotes the measured values.
Minimizing the error is equivalent to maximizing the variance of the measured values.
The explanation is as follows:
Along the principal components, which are an orthonormal eigenbasis of the covariance matrix, the variance is equal to the corresponding eigenvalue of the covariance matrix.
So, the total variance of the PCA projections with dimension k is equal to the trace (sum of eigenvalues) of the covariance matrix of the PCA projections, which, if you pick the right principal components according to the highest eigenvalues, equals the sum of the top k eigenvalues of the original covariance matrix.
Simply put, the total variance of the PCA projections equals the trace (the sum of the diagonal entries) of their covariance matrix.
\(\bar{x}\) is 0 (the data are mean-centered).
4.12 Formulation of problem
4.13 Mean & Variance
Note the covariance matrix used here.
4.14 Maximizing variance
Use Lagrange multipliers to find the extremum.
Why is the largest eigenvalue equal to the maximal variance?
4.15 Reminder: Eigendecomposition
4.16 Principal Component Analysis
4.17 Choosing D
4.18 Rewrite PCA
4.19 Singular Value Decomposition
4.20 How to use SVD for PCA
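The SVD-for-PCA recipe can be sketched end to end: center the data, take the SVD, read off components and variances, and check against the eigendecomposition of the covariance matrix. The data here are synthetic:

```python
import numpy as np

# PCA via SVD: right singular vectors of the centered data matrix are
# the principal components; S^2 / (n - 1) are the variances along them.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5)) @ np.diag([3.0, 2.0, 1.0, 0.5, 0.1])

Xc = X - X.mean(axis=0)                  # center: x̄ becomes 0
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
var = S**2 / (len(X) - 1)                # variance per component

# Same answer from the eigendecomposition of the covariance matrix:
C = Xc.T @ Xc / (len(X) - 1)
eigvals = np.sort(np.linalg.eigvalsh(C))[::-1]
assert np.allclose(var, eigvals)

# Project onto the top-2 principal components:
Z = Xc @ Vt[:2].T
print(Z.shape)  # (100, 2)
```

Working on the data matrix directly avoids ever forming the (possibly huge) covariance matrix, which is exactly why SVD is preferred for eigenfaces.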
4.21 EigenFaces
4.22 Face Recognition
4.23 Intra- & Extra-Personal Subspaces
4.24 EigenFeatures
4.25 Naïve View-Based Approach
4.26 View-Based Approach
4.27 Mouth Space
4.28 Discriminative enough?
4.29 Mouth Space
4.30 Simple Search Strategy
5. Multivariate statistics
5.1 Statistics Review: Univariate
Variance and sample variance
5.2 Statistics Review: Multivariate
Covariance is defined as
The derivation is as follows
5.3 Statistical Correlation
5.4 Reminder: Covariance Matrix
5.5 Recall: Outer Product
5.6 Correlated?
5.7 Recall: Basis Representations
In mathematics, the Kronecker delta \(\delta_{ij}\), named after the German mathematician Leopold Kronecker, is a function of two arguments, usually two integers: its value is 1 if they are equal and 0 otherwise.
The term \(a_j u_j\) is what remains of the sum over \(a_i u_i\) after applying the delta.
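The collapsing sum above is easy to verify numerically: with an orthonormal basis, \(u_i \cdot u_j = \delta_{ij}\), so dotting \(x = \sum_i a_i u_i\) with \(u_j\) picks out exactly \(a_j\). A small sketch with a random orthonormal basis:

```python
import numpy as np

# Orthonormal basis demo: coefficients are recovered by dot products,
# because u_i . u_j = delta_ij kills every term of the sum except one.
U = np.linalg.qr(np.random.default_rng(2).standard_normal((4, 4)))[0]
a = np.array([1.0, -2.0, 0.5, 3.0])
x = U @ a                                # x = sum_i a_i u_i  (u_i = columns of U)

recovered = U.T @ x                      # a_j = u_j . x
assert np.allclose(recovered, a)
print(recovered)
```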
6. Interest points
6.1 Finding Corresponding Points
6.2 Interest point detection
6.3 Interest point detection: Derivation
6.4 Can we approximate this?
6.5 Interest point detection: Derivation
u and v are the horizontal (rightward) and vertical (upward) shifts, respectively.
6.6 The Harris operator
\(\lambda_{1}\) and \(\lambda_{2}\) are the diagonal entries of the diagonalized second moment matrix, i.e., its eigenvalues.
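The Harris response can be sketched directly from its definition, f = det(M) − k·trace(M)² = λ₁λ₂ − k(λ₁+λ₂)². For brevity this toy version sums gradients over the whole patch rather than over Gaussian-weighted local windows as a real detector does:

```python
import numpy as np

# Harris response sketch: f = lambda1*lambda2 - k*(lambda1 + lambda2)^2,
# with M the second moment matrix of the image gradients (k ~ 0.04-0.06).
def harris_response(img, k=0.05):
    Iy, Ix = np.gradient(img)
    M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    lam = np.linalg.eigvalsh(M)          # lambda1, lambda2
    return lam[0] * lam[1] - k * (lam[0] + lam[1])**2

flat   = np.zeros((9, 9))                # no gradients: response 0
corner = np.zeros((9, 9)); corner[4:, 4:] = 1.0   # an L-shaped step
print(harris_response(flat), harris_response(corner))
assert harris_response(corner) > harris_response(flat)
```

Both eigenvalues must be large for a positive response, which is what distinguishes a corner from an edge (one large eigenvalue) or a flat region (both small).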
6.7 Harris detector example
6.8 f value (red high, blue low)
6.9 Threshold (f > value)
6.10 Harris interest points
6.11 Other interest point detectors
6.12 Hessian interest points
6.13 Geometric Transformations
6.14 Rotation Invariance
6.15 Automatic scale selection
6.16 Scale Space
6.17 Automatic scale selection
6.18 Scale invariance – Normalization
6.19 Laplacian of Gaussian (LoG)
6.20 Harris-Laplace (HarLap)
6.21 Other popular interest points
7. Matching single view
7.1 Evaluation
7.2 Local Descriptors / Features
7.3 Local Descriptors
- Distinctiveness:
- Invariance:
- Robustness:
Summary:
7.4 Scale Invariant Feature Transform (SIFT)
7.5 SIFT Descriptor
7.6 Properties of SIFT
7.7 Shape Context
7.8 Evaluating the results
7.9 True/false positives
7.10 Evaluating the results
7.11 Projecting a planar object
7.12 Geometric intuition
7.13 Projecting a plane
7.14 Rotating camera
7.15 Estimating the Homography
Derivation of the 2N×9 matrix and the 9×1 vector:
https://blog.csdn.net/lyhbkz/article/details/82254893
Apply the SVD, \(U \Sigma V^{T}\), to the stacked matrix; the row of \(V^{T}\) corresponding to the smallest singular value gives \(\bar{H}\).
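The 2N×9 construction plus SVD step above is the DLT (direct linear transform). A minimal sketch, checked on synthetic correspondences generated from a made-up ground-truth homography:

```python
import numpy as np

# DLT homography sketch: each correspondence x' ~ H x contributes two
# rows of A h = 0; the unit null vector (last row of V^T) reshaped to
# 3x3 is the estimate.
def estimate_homography(pts, pts2):
    rows = []
    for (x, y), (u, v) in zip(pts, pts2):
        rows.append([-x, -y, -1, 0, 0, 0, u*x, u*y, u])
        rows.append([0, 0, 0, -x, -y, -1, v*x, v*y, v])
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1].reshape(3, 3)

# Synthetic check: project points with a known H, then re-estimate it.
H_true = np.array([[1.0, 0.2, 5.0], [0.1, 0.9, -3.0], [0.001, 0.0, 1.0]])
pts = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]], dtype=float)
ph = np.column_stack([pts, np.ones(len(pts))]) @ H_true.T
pts2 = ph[:, :2] / ph[:, 2:]

H = estimate_homography(pts, pts2)
H /= H[2, 2]                             # fix the arbitrary scale
assert np.allclose(H, H_true / H_true[2, 2], atol=1e-6)
print(np.round(H, 3))
```

A production implementation would also normalize the point coordinates before building A (Hartley normalization) for numerical stability.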
7.16 Robust Estimation
8. Two view
8.1 Robust Estimation
The numbers in the white cells of the table are the required iteration counts.
8.2 Estimating the homography
8.3 Application: AutoStitch
8.4 Next goal: Recovery of 3D structure
8.5 What is Stereo (Vision)?
8.6 Depth from Stereo
8.7 Triangulation
8.8 Triangulation: Geometric midpoint
8.9 Triangulation: Linear approach
8.10 Side note
8.11 Triangulation: non-linear approach
8.12 Epipolar Geometry
8.13 Example: Converging Cameras
8.14 Example: Motion Parallel to Image Plane
8.15 Epipolar Geometry
t is the translation vector and R is the rotation matrix.
\(O_{1}O_{2}\) is t: translating \(O_1\) by t gives \(O_2\).
\(O_{2}p_{2}\) is \(Rp_{2}\): rotating \(O_{1}p_{2}\) gives \(O_{2}p_{2}\).
9. Stereo
9.1 Binocular Stereo: Parallel cameras
Derivation of the result of \(t \times R\):
Computing the cross-product (skew-symmetric) matrix:
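The cross product can be written as a matrix product, t × v = [t]ₓ v, which is how the essential matrix E = [t]ₓ R is formed. A small sketch (the rotation R is an arbitrary illustrative choice):

```python
import numpy as np

# Cross product as a skew-symmetric matrix: t x v = [t]_x v.
def skew(t):
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

t = np.array([1.0, 2.0, 3.0])
v = np.array([-0.5, 4.0, 1.5])
assert np.allclose(skew(t) @ v, np.cross(t, v))

# Essential matrix E = [t]_x R; here R is a 90-degree rotation about z,
# chosen purely for illustration.
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
E = skew(t) @ R
print(E)
```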
9.2 Uncalibrated Cameras
9.3 Fundamental matrix
9.4 Estimating the fundamental matrix
9.5 Eight-point algorithm
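The mechanics of the eight-point algorithm can be sketched as follows: each correspondence (x, x') with x'ᵀFx = 0 gives one row of an N×9 system, solved by SVD, after which the rank-2 constraint is enforced. The points here are random, purely to exercise the linear algebra (real inputs are matched image points, ideally with Hartley normalization):

```python
import numpy as np

# Eight-point algorithm sketch: solve A f = 0 by SVD, then project F
# onto the rank-2 manifold by zeroing the smallest singular value.
def eight_point(x1, x2):
    A = np.array([[u2*u1, u2*v1, u2, v2*u1, v2*v1, v2, u1, v1, 1.0]
                  for (u1, v1), (u2, v2) in zip(x1, x2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0                           # enforce the rank-2 constraint
    return U @ np.diag(S) @ Vt

rng = np.random.default_rng(3)
x1 = rng.uniform(-1, 1, (8, 2))          # stand-ins for matched points
x2 = rng.uniform(-1, 1, (8, 2))
F = eight_point(x1, x2)
print(np.round(F, 3))
```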
9.6 Estimating the fundamental matrix
9.7 Example
9.8 What can we do with 2 views?
9.9 Stereo Vision — Easier problem
9.10 Binocular Stereo
9.11 Binocular Stereo — Disparity
9.12 Triangulation
x is expressed in the coordinate frame of the left camera O;
\(x^{\prime}\) is expressed in the frame of the right camera \(O^{\prime}\).
9.13 Stereo rectification
9.14 Stereo Correspondence
9.15 Correspondence problem
9.16 Normalized Correlation
9.17 Even simpler: Sum of Squared (Pixel) Differences
9.18 Window-based matching
9.19 Convert disparity to depth
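For rectified parallel cameras, disparity d = x − x′ = f·B/Z, so depth is Z = f·B/d. A minimal sketch with illustrative focal length and baseline values:

```python
import numpy as np

# Disparity-to-depth for parallel (rectified) cameras: Z = f * B / d.
# f and B below are illustrative, not from the course material.
f = 700.0        # focal length in pixels
B = 0.12         # baseline in meters

def depth_from_disparity(d):
    d = np.asarray(d, dtype=float)
    # Zero disparity corresponds to a point at infinity.
    return np.where(d > 0, f * B / np.maximum(d, 1e-9), np.inf)

disp = np.array([84.0, 42.0, 21.0, 0.0])
print(depth_from_disparity(disp))
```

The example prints depths of roughly 1, 2, and 4 meters, plus infinity for zero disparity: halving the disparity doubles the depth, which is why stereo depth resolution degrades quadratically with distance.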
9.20 Window-based matching
9.21 Influence of window size
9.22 The similarity constraint
9.23 Limitations of similarity constraint
10. Motion
10.1 Non-local constraints
10.2 Scanline stereo
10.3 "Shortest path" for scanline stereo
10.4 Coherent stereo on 2D grid
10.5 Motion Field
10.6 Optical Flow
10.7 Optical Flow Field
10.8 Optical Flow Estimation
10.9 How do we compute Optical Flow?
10.10 Brightness Constancy
10.11 Spatial Coherence
10.12 Minimize Brightness Difference
10.13 Sum of Squared Differences
10.14 Simple Flow Estimation Algorithm
10.15 Can we approximate this?
10.16 Optical Flow Constraint Equation
10.17 Notation
10.18 OFCE
10.19 Aperture Problem
10.20 Multiple Constraints
10.21 Area-Based Flow
10.22 Optimization
10.23 Structure tensor
10.24 Solving for u
u is the flow vector \((u, v)^{T}\).
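The area-based solve above can be sketched directly: stack the optical flow constraint equation Ixu + Iyv + It = 0 over a window and solve the 2×2 normal equations built from the structure tensor. The gradients here are fabricated to be consistent with a known flow:

```python
import numpy as np

# Lucas-Kanade style flow: least squares over a window gives
#   M [u, v]^T = -b,  M = structure tensor, b = [sum(Ix It), sum(Iy It)].
def solve_flow(Ix, Iy, It):
    M = np.array([[np.sum(Ix*Ix), np.sum(Ix*Iy)],
                  [np.sum(Ix*Iy), np.sum(Iy*Iy)]])
    b = np.array([np.sum(Ix*It), np.sum(Iy*It)])
    return np.linalg.solve(M, -b)

# Synthetic gradients consistent with flow (u, v) = (0.5, -0.25):
rng = np.random.default_rng(4)
Ix = rng.standard_normal(49)             # a 7x7 window, flattened
Iy = rng.standard_normal(49)
It = -(Ix * 0.5 + Iy * -0.25)            # brightness constancy

uv = solve_flow(Ix, Iy, It)
print(uv)  # ≈ [0.5, -0.25]
```

M is invertible only when the window has gradients in two independent directions; on a single edge it is rank-deficient, which is the aperture problem in matrix form.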
10.25 SSD Surface - Textured area
10.26 SSD Surface - Single Edge
10.27 SSD Surface - Surface Boundary
11. Recognition
11.1 Image registration
11.2 Image Warping
11.3 Forward Warping
11.4 Inverse Warping
11.5 Interpolation
11.6 Warping
11.7 Image registration revisited
11.8 Dense LK Flow
11.9 Iterative Estimation
11.10 Iterative Estimation
11.11 Coarse-to-fine Estimation
11.12 Is that a good result?
11.13 Aside: How to get the ground truth?
11.14 What is the problem?
11.15 Shift Gears: Object Recognition
11.16 Example of Recognition & Localization
11.17 Recognition problems
11.18 Search and Recognition
11.19 Naive View-Based Approach
11.20 From faces and mouths to objects
11.21 Appearance-Based Instance Recognition
11.22 Challenges
11.23 Global Representation
11.24 View-Based Approaches
11.25 Feature representation
11.26 Recognition using Histograms
11.27 Histogram Comparison
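One standard histogram comparison measure is the intersection, ∩(q, v) = Σⱼ min(qⱼ, vⱼ); for normalized histograms it lies in [0, 1], with 1 for identical histograms. A minimal sketch with made-up 3-bin histograms:

```python
import numpy as np

# Histogram intersection: sum of bin-wise minima. For L1-normalized
# histograms the score is in [0, 1].
def intersection(q, v):
    return np.sum(np.minimum(q, v))

q = np.array([0.5, 0.3, 0.2])
v = np.array([0.4, 0.4, 0.2])
print(intersection(q, q))   # identical histograms: score ~1.0
print(intersection(q, v))   # similar histograms: score ~0.9
```

Other common choices on the slides' theme include chi-square and Euclidean distance; intersection is robust to bins present in only one histogram.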
11.28 Recognition using Histograms
11.29 Color Histograms
11.30 Discussion: Color Histograms
11.31 Generalization of the Idea
11.32 Multidimensional Histograms
11.33 Recognition Results
11.34 Summary
11.35 “Bag of words” Model
11.36 Analogy to Documents
11.37 Visual word distributions
11.38 Bag of Words
11.39 Bag-of-Words Model: Overview
12. bow
12.1 BoW-1. Feature detection and representation
12.2 BoW-2. Codeword dictionary formation
12.3 Vector quantization
12.4 K-Means
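K-means for codeword dictionary formation can be sketched in a few lines: assign each descriptor to its nearest center, then move each center to the mean of its assigned points. The initialization here uses a farthest-point heuristic (a simple k-means++ flavor) so the toy example converges reliably; the 2D "descriptors" are synthetic:

```python
import numpy as np

# Plain k-means sketch for building a visual codebook.
def kmeans(X, k, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:              # farthest-point initialization
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)    # assign to the nearest codeword
        for j in range(k):
            if np.any(labels == j):      # update center to the cluster mean
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# Two well-separated blobs of fake descriptors -> two codewords.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
centers, labels = kmeans(X, k=2)
assert len(set(labels[:30].tolist())) == 1 and labels[0] != labels[30]
print(np.round(centers, 1))
```

In the BoW pipeline the same `dists.argmin` assignment step is reused at recognition time to quantize each new descriptor to its codeword.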
12.5 BoW-2. Codeword dictionary formation
12.6 Image patch examples of codewords
12.7 BoW-3. Image representation
12.8 Next: Actual Recognition
12.9 Excursion into Machine Learning
12.10 Classification vs. Regression
12.11 General Paradigm
12.12 Bayesian Decision Theory
Bayes optimal classifier:
12.13 Discriminative vs. generative
12.14 Discriminant Functions
12.15 Which Hyperplane is Best? and Why?
12.16 Support Vector Machines
12.17 Toward Neural Networks
12.18 Sigmoid
12.19 Multi-Class Network
12.20 Learning
12.21 Gradient Descent
When k = l the summation disappears; taking the derivative with respect to the weight multiplying \(x_j\) leaves the factor \(x_j\).
12.22 Stochastic Gradient Descent
12.23 Gradient Descent
See page 44 of the slides for the derivation.
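Stochastic gradient descent can be sketched on a single logistic unit y = σ(w·x). For the cross-entropy loss the per-sample gradient is (y − t)·x, which matches the flavor of the derivation referenced above (the exact form depends on the slides' loss and notation); the toy data are synthetic and separable:

```python
import numpy as np

# SGD on a logistic unit: update w by one sample's gradient at a time.
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
w_true = np.array([2.0, -1.0, 0.5])
t = (X @ w_true > 0).astype(float)       # separable toy labels

w = np.zeros(3)
for epoch in range(50):
    for i in rng.permutation(len(X)):    # one sample at a time: "stochastic"
        y = sigmoid(X[i] @ w)
        w -= 0.1 * (y - t[i]) * X[i]     # gradient step, learning rate 0.1

acc = np.mean((sigmoid(X @ w) > 0.5) == (t == 1.0))
print("train accuracy:", acc)
```

Batch gradient descent would instead average the gradient over all 200 samples before each update; SGD trades noisier steps for far cheaper iterations.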
13. CNN
13.1 Bag-of-Words Model: Overview
13.2 Next: Actual Recognition
13.3 Machine Learning: General Paradigm
13.4 Discriminative vs. generative
13.5 Discriminant Functions
13.6 Toward Neural Networks
13.7 Sigmoid
13.8 Multi-Class Network
13.9 Multi-Class Network for Classification
13.10 Learning
13.11 Gradient Descent
13.12 Stochastic Gradient Descent
13.13 Multi-Layer Perceptron
\(z_0 = 1\) is the bias unit.
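The bias-unit trick above folds the bias into the weight matrix by prepending a constant 1 to each layer's input. A minimal forward-pass sketch with illustrative layer sizes and random weights:

```python
import numpy as np

# One-hidden-layer perceptron forward pass with z_0 = 1 as the bias unit,
# so the bias is just the first column of each weight matrix.
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3 + 1))     # 4 hidden units, 3 inputs + bias
W2 = rng.standard_normal((2, 4 + 1))     # 2 outputs, 4 hidden units + bias

def forward(x):
    z = sigmoid(W1 @ np.concatenate(([1.0], x)))   # prepend z_0 = 1
    y = sigmoid(W2 @ np.concatenate(([1.0], z)))
    return y

y = forward(np.array([0.5, -1.0, 2.0]))
print(y)                                  # two outputs in (0, 1)
```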
13.14 Learning with Gradient Descent
13.15 Backpropagation Algorithm
13.16 Robot Navigation (a.k.a. “deep driving”)
13.17 Bag-of-Words Model: Done!
13.18 Summary & Discussion
13.19 What about spatial information?
13.20 Problem with bag-of-words
13.21 Model: Parts and Structure
13.22 Part-based Representation
13.23 The correspondence problem
13.24 Pyramid match kernel
13.25 Convolutional Neural Networks
13.26 CNN Architecture
13.27 History: Multistage Hubel-Wiesel Architecture
13.28 Overview of ConvNets
13.29 Krizhevsky et al. [NIPS 2012]
13.30 Using Features on Other Datasets
13.31 Caltech 256
13.32 Components of Each Layer
13.33 Filtering
13.34 Non-Linearity
13.35 Pooling
13.36 Role Of Pooling
13.37 Receptive Field
13.38 Components of Each Layer
13.39 Batch Normalization
13.40 Loss function
13.41 Initialization
13.42 How to Choose Architecture
13.43 AlexNet Architecture