CVX notes
Preliminaries
1. PSD
\(M\) is a positive semidefinite (PSD) matrix \(\iff\) all principal submatrices \(P\) of \(M\) are PSD.
Note: The forward direction follows by considering the quadratic form \(x^T M x\) and restricting \(x\) to vectors supported on the index subset that defines the principal submatrix. The converse is trivially true, since \(M\) is a principal submatrix of itself.
A symmetric \(M\) is PSD \(\iff\) all principal minors are non-negative.
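Write \(M\) as a quadratic form:
\[ x^T M x = \sum_{i,j} M_{ij} x_i x_j \ge 0 \quad \forall x. \]
Taking \(x\) to be a standard basis vector \(e_i\) gives \(M_{ii} \ge 0\), hence \(\mathbf{tr}(M) \ge 0\). Taking \(x\) to be the vector with ones in positions \(i, j\) and zeros elsewhere gives, for symmetric \(M\),
\[ M_{ii} + M_{jj} + 2M_{ij} \ge 0. \]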
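The two PSD characterizations above can be compared numerically. The following is a minimal sketch, assuming NumPy and symmetric input; the helper names `is_psd` and `all_principal_minors_nonneg` and the test matrices are illustrative choices, not part of the notes.

```python
import itertools
import numpy as np

def is_psd(M, tol=1e-10):
    """Return True if the symmetric matrix M has no eigenvalue below -tol."""
    return bool(np.all(np.linalg.eigvalsh(M) >= -tol))

def all_principal_minors_nonneg(M, tol=1e-10):
    """Check det(M[S, S]) >= -tol for every index subset S (exponential in n)."""
    n = M.shape[0]
    for k in range(1, n + 1):
        for S in itertools.combinations(range(n), k):
            if np.linalg.det(M[np.ix_(S, S)]) < -tol:
                return False
    return True

B = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, 1.0]])
M = B @ B.T                               # a Gram matrix, hence PSD (rank 2)
N = np.array([[1.0, 2.0], [2.0, 1.0]])    # eigenvalues 3 and -1, so not PSD

print(is_psd(M), all_principal_minors_nonneg(M))
print(is_psd(N), all_principal_minors_nonneg(N))
```

The eigenvalue test is the practical one; enumerating principal minors is only feasible for small matrices and is shown here purely to mirror the statement.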
2. Matrix norm
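General definition of a norm: a function \(\|\cdot\|\) on a vector space is a norm if, for all vectors \(x, y\) and scalars \(\alpha\),
- \(\|x\| \ge 0\), and \(\|x\| = 0 \iff x = 0\)
- \(\|\alpha x\| = |\alpha| \, \|x\|\)
- \(\|x + y\| \le \|x\| + \|y\|\)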
Matrix norm:
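- Frobenius norm: \(\|A\|_F := \sqrt{\langle A,A\rangle_F} = \sqrt{\mathbf{tr}(A^*A)}\)
- Induced norm: \(\|A\|_p := \sup\limits_{\|x\|_p = 1} \|Ax\|_p\)
- Nuclear norm: \(\|A\|_{nuclear} := \sum_i \sigma_i(A)\) (the sum of the singular values)
- Spectral norm: \(\|A\|_{spectral} := \sigma_1(A) = \sqrt{\lambda_{\max}(A^*A)}\) (the largest singular value; it equals the largest eigenvalue \(\lambda_1\) when \(A\) is PSD)
Spectral radius: \(\rho(A) := \max_i |\lambda_i(A)|\)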
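As a quick numerical cross-check of these definitions, the sketch below computes each norm with `numpy.linalg.norm` and compares it with the singular values. The matrix `A` is an arbitrary example chosen for illustration.

```python
import numpy as np

A = np.array([[3.0, 0.0], [4.0, 5.0]])
sigma = np.linalg.svd(A, compute_uv=False)    # singular values, in descending order

fro = np.linalg.norm(A, "fro")                # Frobenius norm
nuclear = np.linalg.norm(A, "nuc")            # nuclear norm
spectral = np.linalg.norm(A, 2)               # induced 2-norm (spectral norm)
rho = np.max(np.abs(np.linalg.eigvals(A)))    # spectral radius

print(fro, np.sqrt(np.sum(sigma ** 2)))       # Frobenius = sqrt(sum of sigma_i^2)
print(nuclear, np.sum(sigma))                 # nuclear = sum of sigma_i
print(spectral, sigma[0])                     # spectral = largest sigma_i
print(rho, spectral)                          # spectral radius <= spectral norm
```

For a non-symmetric matrix such as this one, the spectral radius and the spectral norm generally differ, which is why the two notions are listed separately.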


3. Duality
Two equivalent ways to represent a convex set:
- standard representation: The family of points in the set
- dual representation: The set of halfspaces containing the set (the set is the intersection of these halfspaces)
A closed convex set \(S\) is the intersection of all closed halfspaces \(H\) containing it.


Polar
Let \(S \subseteq \mathbb{R}^n\) be a convex set containing the origin. The polar of \(S\) is defined as
\[ S^{\circ} := \{ y \in \mathbb{R}^n \mid y^T x \le 1 \;\; \forall x \in S \}. \]
Note
- The polar is one way of representing all halfspaces containing a convex set
- Every halfspace \(a^T x \le b\) with \(b \neq 0\) can be written as a “normalized” inequality \(y^T x \le 1\) by dividing by \(b\) (here \(b > 0\), because the halfspace contains the origin)
- \(S^{\circ}\) can be thought of as the set of normalized representations of halfspaces containing \(S\)
Properties of the polar:
- \(S^{\circ\circ} = S\) (for closed \(S\))
- \(S^{\circ}\) is a closed convex set containing the origin
- When \(0\) is in the interior of \(S\), \(S^{\circ}\) is bounded
- When \(S\) is non-convex, \(S^{\circ} = (\mathbf{conv}(S))^{\circ}\) and \(S^{\circ\circ} = \mathbf{conv}(S)\) (up to closure)
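A small numerical illustration of the polar, as a sketch under stated assumptions: for a polytope \(S = \mathbf{conv}(V)\), a linear function attains its maximum over \(S\) at a vertex, so \(y \in S^{\circ}\) exactly when \(y^T v \le 1\) for every vertex \(v\). The helper `in_polar` is hypothetical. The example uses the \(\ell_1\) unit ball, whose polar is the \(\ell_\infty\) unit ball.

```python
import numpy as np

def in_polar(vertices, y, tol=1e-12):
    """y lies in the polar of conv(vertices) iff y^T v <= 1 for every vertex v."""
    return bool(np.all(vertices @ y <= 1.0 + tol))

n = 3
# Vertices of the l1 unit ball in R^3: +e_i and -e_i.
V = np.vstack([np.eye(n), -np.eye(n)])

rng = np.random.default_rng(0)
samples = rng.uniform(-1.5, 1.5, size=(1000, n))

# Membership in the polar should coincide with max_i |y_i| <= 1.
agree = all(in_polar(V, y) == (np.max(np.abs(y)) <= 1.0) for y in samples)
print(agree)
```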


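Polar duality of convex cones
Let \(K\) be a closed convex cone. Its polar cone is
\[ K^{\circ} := \{ y \in \mathbb{R}^n \mid y^T x \le 0 \;\; \forall x \in K \}. \]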

Notes
- \(K^{\circ\circ} = K\)
- \(K^{\circ}\) is closed and convex
Conjugation of convex functions
Let \(f: \mathbb{R}^n \to \mathbb{R}\cup\{\infty\}\) be a convex function. The conjugate of \(f\) is
\[ f^*(y) := \sup_{x} \left( y^T x - f(x) \right). \]

Properties of the conjugate
- \(f^{**} = f\) (when \(f\) is closed and convex)
- \(f^*\) is convex (a supremum of affine functions of \(y\))
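To make the conjugate concrete, here is a rough one-dimensional sketch that approximates \(f^*(y) = \sup_x (yx - f(x))\) by a maximum over a grid. The function `conjugate_on_grid` and the grid are illustrative assumptions. The test function \(f(x) = x^2/2\) is a standard self-conjugate example, so the last two printed columns should nearly agree.

```python
import numpy as np

def conjugate_on_grid(f, y, xs):
    """Approximate f*(y) = sup_x (y*x - f(x)) by maximizing over the grid xs."""
    return np.max(y * xs - f(xs))

def f(x):
    return 0.5 * x ** 2                   # self-conjugate: f*(y) = y^2 / 2

xs = np.linspace(-10.0, 10.0, 20001)      # grid with spacing 0.001

for y in [-2.0, 0.0, 1.0, 3.0]:
    print(y, conjugate_on_grid(f, y, xs), 0.5 * y ** 2)
```

The grid approximation is only valid when the maximizer lies inside the grid range, which holds here because the maximizer is \(x = y\).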

Convex sets
Convex functions

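-
Affine functions are convex: \(f(x) = a^T x + b\)
Affine functions are both convex and concave.
-
Any _norm_ is convex.
Proof: let \(\pi(x)\) be a norm of \(x\). Then for \(\lambda \in [0,1]\), by the triangle inequality and homogeneity,
\[ \pi(\lambda x + (1-\lambda) y) \le \pi(\lambda x) + \pi((1-\lambda) y) = \lambda \pi(x) + (1-\lambda) \pi(y). \]
-
\(f\) is convex \(\iff\) \(\mathbf{epi}(f)\) is convex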
1. Closed convex
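A convex function \(f\) is called closed if its epigraph is a closed set.
- A convex function \(f\) that is continuous on a closed domain is closed (e.g., norms).
- All differentiable convex functions with \(\mathbf{dom} f = \mathbb{R}^n\) are closed.
- When working with a convex function, we usually take its value to be \(\infty\) outside \(\mathbf{dom} f\).
- Jensen's inequality: for convex \(f\), points \(x_i\), and weights \(\alpha_i \ge 0\) with \(\sum_i \alpha_i = 1\),
\[ f\Big(\sum_i \alpha_i x_i\Big) \le \sum_i \alpha_i f(x_i). \]
- Corollary: if \(x \in \mathbf{conv}\{x_1, \dots, x_n\}\), then \(f(x) \le \max\limits_{i} f(x_i)\).
pf: \(f(x) = f(\sum_i \alpha_i x_i) \le \sum_i \alpha_i f(x_i) \le \max\limits_{i} f(x_i)\)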
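Jensen's inequality and the corollary can be spot-checked numerically. The sketch below uses log-sum-exp as the convex function and random points and weights; all of these are illustrative choices rather than anything fixed by the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """log-sum-exp, a standard convex function on R^n (computed stably)."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

points = rng.normal(size=(5, 3))          # five points x_i in R^3
alpha = rng.dirichlet(np.ones(5))         # weights: alpha_i >= 0, summing to 1

x = alpha @ points                        # convex combination of the x_i
values = np.array([f(p) for p in points])

lhs = f(x)                                # f(sum_i alpha_i x_i)
jensen_rhs = alpha @ values               # sum_i alpha_i f(x_i)
print(lhs, jensen_rhs, values.max())
```

The three printed numbers should appear in non-decreasing order, matching the chain of inequalities in the proof.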
2. Level sets

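The level set of \(f\) at level \(\alpha\) is \(\{x \in \mathbf{dom} f \mid f(x) \le \alpha\}\). If \(f\) is convex, all of its level sets are convex.
Note: the convexity of level sets does not characterize convex functions, but quasiconvex functions.
- convex \(f\) is closed \(\implies\) all its level sets are closed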
Some convex sets
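- norm ball \(\{x \in \mathbb{R}^n \mid \|x\| \le 1\}\) is convex and closed
- ellipsoid \(\{x \mid (x-a)^T Q (x-a) \le r^2\}\) is convex and closed
pf: \(\langle x, y\rangle := x^T Q y\) satisfies the three properties of an inner product
- bilinearity
- symmetry
- positivity
These three properties hold \(\iff\) \(Q\) is symmetric positive definite (if \(Q\) is only PSD, \(\sqrt{x^T Q x}\) is a seminorm, which still gives a convex set), so the ellipsoid is a ball of the induced norm \(\sqrt{\langle x, x\rangle}\).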
- \(\epsilon\)-neighborhood of a convex set \(M\): \(\{x \mid \inf_{y \in M} \|x - y\| \le \epsilon\}\) is convex
3. Operations preserving convexity of functions
- stability under taking weighted sums: \(f, g \mapsto \lambda f + \mu g, \; \lambda, \mu \ge 0\)
- stability under affine substitutions of the argument: \(x \mapsto Ax+b\), i.e., \(f(x) \mapsto \phi(x) = f(Ax+b)\)
- stability under taking pointwise sup: \(\{f_i\}_{i \in \mathcal{I}} \mapsto g(x) := \sup\limits_{i \in \mathcal{I}} f_i(x)\); the pointwise supremum of a family of convex functions \(\{f_i\}_{i \in \mathcal{I}}\) is also convex
- stability under partial minimization: if \(f(x,y)\) is jointly convex in \((x,y)\), then \(g(x) = \inf\limits_{y} f(x,y)\) is convex (provided \(g\) is proper, i.e., \(> -\infty\) everywhere and finite at at least one point)
- stability under perspective: \(f(x) \mapsto g(x,t) = t f(x/t)\), \(\mathbf{dom}\, g = \{(x,t) \mid x/t \in \mathbf{dom} f, \; t > 0\}\)
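Two of these rules can be spot-checked with a randomized search for violations of the convexity inequality. This is only a sketch: the helper `violates_convexity`, the sampled functions, and the sampling ranges are assumptions for illustration, and a failed search is evidence of convexity, not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)

def violates_convexity(g, sample, trials=2000, tol=1e-9):
    """Search random pairs for g(l*u + (1-l)*v) > l*g(u) + (1-l)*g(v)."""
    for _ in range(trials):
        u, v, lam = sample(), sample(), rng.uniform()
        if g(lam * u + (1 - lam) * v) > lam * g(u) + (1 - lam) * g(v) + tol:
            return True
    return False

# Pointwise sup (here: max) of six affine functions a_i^T x + b_i.
A, b = rng.normal(size=(6, 2)), rng.normal(size=6)

def pointwise_max(x):
    return np.max(A @ x + b)

# Perspective of f(x) = x^2: g(x, t) = t * (x / t)^2 = x^2 / t on t > 0.
def perspective(z):
    return z[0] ** 2 / z[1]

def sample_plane():
    return rng.normal(size=2)

def sample_halfplane():
    return np.array([rng.normal(), rng.uniform(0.1, 5.0)])

print(violates_convexity(pointwise_max, sample_plane))
print(violates_convexity(perspective, sample_halfplane))
```

Running the same search on a concave function such as \(-\|x\|_2^2\) finds a violation immediately, which is a useful sanity check on the helper itself.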
4. Detect convexity
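Necessary and sufficient convexity conditions for smooth functions:
- A differentiable function \(f\) of one variable is convex \(\iff\) \(f'(x)\) is monotonically non-decreasing
- A twice-differentiable function \(f\) of one variable is convex \(\iff\) \(f''(x) \ge 0\) everywhere (in \(\mathbb{R}^n\): \(\nabla^2 f(x) \succeq 0\))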
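The second-order condition can be probed numerically by sampling points and inspecting the smallest Hessian eigenvalue. The sketch below assumes the Hessian is available in closed form; `lse_hessian`, `saddle_hessian`, and `min_hessian_eig` are illustrative names. Sampling can refute convexity but cannot certify it.

```python
import numpy as np

rng = np.random.default_rng(2)

def lse_hessian(x):
    """Hessian of log-sum-exp at x: diag(p) - p p^T with p = softmax(x)."""
    p = np.exp(x - np.max(x))
    p /= p.sum()
    return np.diag(p) - np.outer(p, p)

def saddle_hessian(x):
    """Hessian of f(x) = x1^2 - x2^2, which is diag(2, -2) everywhere."""
    return np.diag([2.0, -2.0])

def min_hessian_eig(hess, dim, trials=200):
    """Smallest Hessian eigenvalue observed over random sample points."""
    return min(np.linalg.eigvalsh(hess(rng.normal(size=dim))).min()
               for _ in range(trials))

print(min_hessian_eig(lse_hessian, 4))      # about 0: consistent with convexity
print(min_hessian_eig(saddle_hessian, 2))   # negative: not convex
```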
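The subgradient property is characteristic of convex functions: a convex \(f\) admits, at every \(x\) in the interior of its domain, a vector \(g\) (a subgradient) such that
\[ f(y) \ge f(x) + g^T (y - x) \quad \forall y. \]
Conversely, a function that has a subgradient at every point of its convex domain is convex.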
5. Subgradient
Examples
- 
- 
- 
- 
6. Optimality conditions
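For a convex function, every local minimum is a global minimum.
First-order necessary and sufficient condition (convex functions)
\(x^* \in \mathbf{dom} f\) is a minimizer \(\iff\) \(0 \in \partial f(x^*)\)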
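A one-dimensional sketch of this optimality condition, using the non-smooth function \(f(x) = |x| + \frac{1}{2}(x-1)^2\) chosen purely for illustration. Its subdifferential is the single point \(\mathrm{sign}(x) + (x - 1)\) for \(x \neq 0\) and the interval \([-2, 0]\) at \(x = 0\), so \(0 \in \partial f(0)\) and \(x^* = 0\) should be the minimizer.

```python
import numpy as np

def f(x):
    return np.abs(x) + 0.5 * (x - 1.0) ** 2

def subdifferential(x):
    """Return (lo, hi), the interval equal to the subdifferential of f at x."""
    smooth = x - 1.0                      # derivative of 0.5 * (x - 1)^2
    if x > 0:
        return smooth + 1.0, smooth + 1.0
    if x < 0:
        return smooth - 1.0, smooth - 1.0
    return smooth - 1.0, smooth + 1.0     # the subdifferential of |x| at 0 is [-1, 1]

def is_minimizer(x):
    """First-order condition: 0 must belong to the subdifferential at x."""
    lo, hi = subdifferential(x)
    return lo <= 0.0 <= hi

xs = np.linspace(-3.0, 3.0, 6001)
print(is_minimizer(0.0), is_minimizer(0.5), is_minimizer(-1.0))
print(f(0.0), f(xs).min())                # grid minimum should match f(0)
```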
7. Strong convexity
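A differentiable function \(f\) is strongly convex with parameter \(\mu > 0\) if, for all \(x, y\),
\[ f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{\mu}{2} \|y - x\|_2^2. \]
Note
- \(f\) is not necessarily differentiable (see the equivalent definition)
- if \(f\) is non-smooth, replace the gradient with a subgradient
- strong convexity \(\implies\) strict convexity
Note: Intuitively speaking, strong convexity means that there exists a quadratic lower bound on the growth of the function.
Equivalent definition: \(f\) is strongly convex with parameter \(\mu > 0\) \(\iff\) \(x \mapsto f(x) - \frac{\mu}{2}\|x\|_2^2\) is convex.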
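For a quadratic \(f(x) = \frac{1}{2} x^T Q x\) with \(Q \succ 0\), the quadratic lower bound \(f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{\mu}{2}\|y - x\|_2^2\) holds with \(\mu = \lambda_{\min}(Q)\). The sketch below checks this on random pairs; the matrix `Q` and the helper `lower_bound_gap` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

B = rng.normal(size=(4, 4))
Q = B @ B.T + np.eye(4)                   # symmetric positive definite
mu = np.linalg.eigvalsh(Q).min()          # strong convexity parameter of f

def f(x):
    return 0.5 * x @ Q @ x

def grad(x):
    return Q @ x

def lower_bound_gap(x, y, m):
    """f(y) minus the quadratic lower bound built at x with parameter m."""
    return f(y) - (f(x) + grad(x) @ (y - x) + 0.5 * m * np.sum((y - x) ** 2))

gaps = [lower_bound_gap(rng.normal(size=4), rng.normal(size=4), mu)
        for _ in range(1000)]
print(min(gaps))                          # non-negative up to rounding error
```

With any parameter larger than \(\lambda_{\max}(Q)\) the gap becomes negative for every \(x \neq y\), so the bound cannot hold with an arbitrarily large constant.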
Lagrange Duality
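Consider an optimization problem in standard form (not necessarily convex)
\[ \begin{aligned} \min_{x} \quad & f_0(x) \\ \text{s.t.} \quad & f_i(x) \le 0, \quad i = 1, \dots, m \\ & h_j(x) = 0, \quad j = 1, \dots, p \end{aligned} \]
with domain \(\mathcal{D}\) and optimal value \(p^*\).
The Lagrangian is
\[ L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{j=1}^p \nu_j h_j(x). \]
The Lagrange dual function is defined as
\[ g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu). \]

Lagrange dual problem
\[ \max_{\lambda, \nu} \; g(\lambda, \nu) \quad \text{s.t.} \quad \lambda \succeq 0 \]
Weak duality
- \(d^*\): optimal value of the dual problem
- \(p^*\): optimal value of the primal problem
- duality gap: \(p^* - d^*\)
- \(d^* \le p^*\) always holds
Strong duality
- \(d^* = p^*\)
- constraint qualifications \(\implies\) strong duality
- Slater’s Constraint Qualification: a convex problem is strictly feasible (i.e., \(\exists\, x \in \mathbf{int}\,\mathcal{D}\) with \(f_i(x) < 0\) for all \(i\) and \(h_j(x) = 0\) for all \(j\))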
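A toy example may help tie these definitions together. The problem below (minimize \(x^2\) subject to \(1 - x \le 0\)) is an assumption made for illustration. Its Lagrangian is \(L(x, \lambda) = x^2 + \lambda(1 - x)\), which is minimized over \(x\) at \(x = \lambda/2\), so the dual function is \(g(\lambda) = \lambda - \lambda^2/4\). The problem is convex and strictly feasible (e.g., \(x = 2\)), so strong duality is expected.

```python
import numpy as np

# Toy problem (an assumption for illustration): minimize x^2 subject to 1 - x <= 0.
def f0(x):
    return x ** 2

def f1(x):
    return 1.0 - x

def dual_function(lam):
    """g(lam) = inf_x x^2 + lam * (1 - x); the infimum is attained at x = lam / 2."""
    x = lam / 2.0
    return f0(x) + lam * f1(x)

lams = np.linspace(0.0, 5.0, 5001)        # grid over the dual variable, lam >= 0
g_vals = dual_function(lams)

xs = np.linspace(1.0, 5.0, 4001)          # grid over the feasible set x >= 1
p_star = f0(xs).min()                     # primal optimal value
d_star = g_vals.max()                     # dual optimal value
lam_star = lams[g_vals.argmax()]

print(p_star, d_star, lam_star)
```

Every value of `g_vals` is a lower bound on `p_star` (weak duality), and the best such bound closes the gap (strong duality).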
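Complementary slackness
If strong duality holds, \(x^*\) is primal optimal, and \((\lambda^*, \nu^*)\) is dual optimal, then for every inequality constraint \(f_i(x) \le 0\),
\[ \lambda_i^* f_i(x^*) = 0, \]
i.e., \(\lambda_i^* > 0 \implies f_i(x^*) = 0\) and \(f_i(x^*) < 0 \implies \lambda_i^* = 0\).
KKT conditions
For a differentiable objective \(f_0\), inequality constraints \(f_i(x) \le 0\), and equality constraints \(h_j(x) = 0\):
- primal feasibility: \(f_i(x^*) \le 0\), \(h_j(x^*) = 0\)
- dual feasibility: \(\lambda^* \succeq 0\)
- complementary slackness: \(\lambda_i^* f_i(x^*) = 0\)
- stationarity: \(\nabla f_0(x^*) + \sum_i \lambda_i^* \nabla f_i(x^*) + \sum_j \nu_j^* \nabla h_j(x^*) = 0\)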
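As a sketch of how the KKT conditions (primal feasibility, dual feasibility, complementary slackness, stationarity) can be verified for a candidate point, consider the illustrative problem of minimizing \(\|x - c\|_2^2\) subject to one linear inequality \(a^T x \le b\). The data `c`, `a`, `b`, the helper `satisfies_kkt`, and the hand-computed candidate are all assumptions for this example.

```python
import numpy as np

c = np.array([2.0, 1.0])
a, b = np.array([1.0, 1.0]), 2.0

def grad_f0(x):
    return 2.0 * (x - c)                  # gradient of the objective ||x - c||^2

def f1(x):
    return a @ x - b                      # constraint function, required <= 0

def grad_f1(x):
    return a

def satisfies_kkt(x, lam, tol=1e-9):
    primal = f1(x) <= tol                                   # primal feasibility
    dual = lam >= -tol                                      # dual feasibility
    slack = abs(lam * f1(x)) <= tol                         # complementary slackness
    stationary = np.linalg.norm(grad_f0(x) + lam * grad_f1(x)) <= tol
    return bool(primal and dual and slack and stationary)

# Candidate found by hand: the projection of c onto the halfspace, with lam = 1.
x_star, lam_star = np.array([1.5, 0.5]), 1.0
print(satisfies_kkt(x_star, lam_star))
print(satisfies_kkt(np.array([1.0, 1.0]), 0.0))   # feasible but not stationary
```

Because this problem is convex with a differentiable objective, a point satisfying all four conditions is optimal.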


Cones
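Tangent cone
Let \(M\) be a (nonempty) convex set and \(x^* \in M\). The tangent cone of \(M\) at \(x^*\) is the cone
\[ T_M(x^*) = \{ h \in \mathbb{R}^n \mid x^* + th \in M \text{ for all sufficiently small } t > 0 \}. \]
Note:
- Geometrically, this is the set of all directions leading from \(x^*\) inside \(M\)
- convex but not necessarily closed
- fact: if \(x^*\) is a minimizer of \(f\) over \(M\), then \(h^T \nabla f(x^*) \ge 0\) for all \(h \in T_M(x^*)\). (Every direction in the tangent cone is feasible, so none of them can be a descent direction.)
- \(T_M(x^*) = \mathbb{R}^n \iff x^* \in \mathbf{int} M\)
e.g., for a polyhedron
\[ M = \{ x \mid a_i^T x \le b_i, \; i = 1, \dots, m \}, \]
the tangent cone at \(x^*\) is
\[ T_M(x^*) = \{ h \mid a_i^T h \le 0 \;\; \forall i \text{ such that } a_i^T x^* = b_i \}. \]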
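Normal cone: the polar cone of the tangent cone
Note:
- the normal cone is the polar of the tangent cone, i.e.,
\[ N_M(x^*) = \{ g \in \mathbb{R}^n \mid g^T h \le 0 \;\; \forall h \in T_M(x^*) \} \]
- fact: if \(x^*\) is a minimizer, then \(-\nabla f(x^*) \in N_M(x^*)\).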
- 
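The tangent-cone and normal-cone facts can be illustrated on a polyhedron, where the tangent cone at a point is cut out by the active constraints. The sketch below uses the unit box and the objective \(\|x - c\|_2^2\); the data, the helper `in_tangent_cone`, and the sampling scheme are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Unit box [0, 1]^2 written as A x <= b (an illustrative choice).
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])

def in_tangent_cone(h, x, tol=1e-9):
    """h is tangent at x iff a_i^T h <= 0 for every constraint active at x."""
    active = np.abs(A @ x - b) <= tol
    return bool(np.all(A[active] @ h <= tol))

c = np.array([2.0, 0.5])
x_star = np.clip(c, 0.0, 1.0)             # minimizer of ||x - c||^2 over the box
grad = 2.0 * (x_star - c)                 # gradient of the objective at x_star

# Sample directions and keep those that lie in the tangent cone at x_star.
tangent_dirs = [h for h in rng.normal(size=(500, 2)) if in_tangent_cone(h, x_star)]

# No tangent direction is a descent direction, i.e. -grad lies in the normal cone.
print(all(h @ grad >= -1e-9 for h in tangent_dirs))
```

Here only the constraint \(x_1 \le 1\) is active at the minimizer, so the tangent cone is the halfplane \(\{h \mid h_1 \le 0\}\) and the normal cone is the ray generated by \((1, 0)\).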
Algorithm convergence
| Method and assumptions | Stepsize Rule | Convergence Rate | Iteration Complexity |
|---|---|---|---|
| Gradient descent | | | |
| strongly convex & smooth | \(\eta_t = \frac{2}{\mu + L}\) | \(O\left(\left(\frac{\kappa -1}{\kappa +1}\right)^t\right)\) | \(O\left(\frac{\log\frac{1}{\epsilon}}{\log\frac{\kappa+1}{\kappa-1}}\right)\) |
| convex & smooth | \(\eta_t = \frac{1}{L}\) | \(O\left(\frac{1}{t}\right)\) | \(O\left(\frac{1}{\epsilon}\right)\) |
| Frank-Wolfe | | | |
| (strongly) convex & smooth | \(\eta_t = \frac{1}{t}\) | \(O\left(\frac{1}{t}\right)\) | \(O\left(\frac{1}{\epsilon}\right)\) |
| Projected GD | | | |
| convex & smooth | \(\eta_t = \frac{1}{L}\) | \(O\left(\frac{1}{t}\right)\) | \(O\left(\frac{1}{\epsilon}\right)\) |
| strongly convex & smooth | \(\eta_t = \frac{1}{L}\) | \(O\left(\left(1-\frac{1}{\kappa}\right)^t\right)\) | \(O\left(\kappa\log\frac{1}{\epsilon}\right)\) |
| Subgradient method | | | |
| convex & Lipschitz | \(\eta_t = \frac{1}{\sqrt{t}}\) | \(O\left(\frac{1}{\sqrt{t}}\right)\) | \(O\left(\frac{1}{\epsilon^2}\right)\) |
| strongly convex & Lipschitz | \(\eta_t = \frac{1}{t}\) | \(O\left(\frac{1}{t}\right)\) | \(O\left(\frac{1}{\epsilon}\right)\) |
| Proximal GD | | | |
| convex & smooth (w.r.t. \(f\)) | \(\eta_t = \frac{1}{L}\) | \(O\left(\frac{1}{t}\right)\) | \(O\left(\frac{1}{\epsilon}\right)\) |
| strongly convex & smooth (w.r.t. \(f\)) | \(\eta_t = \frac{1}{L}\) | \(O\left(\left(1-\frac{1}{\kappa}\right)^t\right)\) | \(O\left(\kappa\log\frac{1}{\epsilon}\right)\) |

Here \(L\) is the smoothness constant, \(\mu\) is the strong convexity parameter, and \(\kappa = L/\mu\) is the condition number.
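The first row of the table can be illustrated on a strongly convex quadratic, where the rate is easy to verify. The sketch below runs gradient descent with \(\eta = \frac{2}{\mu + L}\) on \(f(x) = \frac{1}{2} x^T Q x\) (minimizer \(x^* = 0\)) and compares the distance to the minimizer against \(\left(\frac{\kappa - 1}{\kappa + 1}\right)^t\). The matrix `Q`, the dimension, and the iteration count are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)

B = rng.normal(size=(5, 5))
Q = B @ B.T + np.eye(5)                   # Hessian of f(x) = 0.5 x^T Q x
eigs = np.linalg.eigvalsh(Q)
mu, L = eigs.min(), eigs.max()            # strong convexity and smoothness constants
kappa = L / mu
rho = (kappa - 1.0) / (kappa + 1.0)       # contraction factor from the table

eta = 2.0 / (mu + L)
x = rng.normal(size=5)
errors = [np.linalg.norm(x)]              # distance to the minimizer x* = 0
for t in range(50):
    x = x - eta * (Q @ x)                 # gradient step
    errors.append(np.linalg.norm(x))

for t in (0, 10, 25, 50):
    print(t, errors[t], rho ** t * errors[0])
```

For this quadratic each step multiplies the error by \(I - \eta Q\), whose spectral norm is exactly \(\frac{L - \mu}{L + \mu} = \frac{\kappa - 1}{\kappa + 1}\), so the observed error should never exceed the bound in the last column.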