DAGs with NO TEARS: Continuous Optimization for Structure Learning

zoukankan html css js c++ java

DAGs with NO TEARS: Continuous Optimization for Structure Learning
DAGs with NO TEARS: Continuous Optimization for Structure Learning
目录
DAGs with NO TEARS: Continuous Optimization for Structure Learning
概
主要内容
等价条件的推导
$mathrm{tr}(I-W)^{-1} = d$
$mathrm{tr}(e^W)=d$
$mathrm{tr}(W^k) = 0$
$mathrm{tr}(e^{W circ W}) =d$
$mathrm{tr}(I + W circ W)^d =d$
性质的推导
求解
代码
Zheng X., Aragam B., Ravikumar P. and Xing E. DAGs with NO TEARS: Continuous Optimization for Structure Learning. In Advances in Neural Information Processing Systems (NIPS), 2018.

概

有向图可以用邻接矩阵(A in {0, 1}^{d imes d})来表示, 其中(A_{ij} = 1) 表示 node (i) 指向 node (j). 进一步的, 我们想要表示有向无环图(DAG), 则(A)需要满足额外的性质, 保证无环.

现在的问题是, 有一堆观测数据(X in mathbb{R}^{n imes d}), 如何通过这些数据推测其(特征之间的)关系, 即对应的(A).

主要内容

首先, 假设特征之间满足一个线性关系:

[X_j = w_j^T X + z_j, ]
其中

[W = [w_1|w_2|cdots|w_d] in mathbb{R}^{d}, ]
(z)为随机的噪声.

通过(W)可以推出相应的(A=mathcal{A}(W)), 即

[W_{ij} ot = 0 Leftrightarrow A_{ij} = 1, W_{ij} =0 Leftrightarrow A_{ij} = 0. ]
故我们目标通常是:

[min_{W} quad ell(W;X) = frac{1}{2n}|X - XW|_F^2, \ mathrm{s.t.} quad mathcal{A}(W) in mathbb{D}, ]
其中(mathbb{D})表示有向无环图.

进一步地, 因为我们希望(W)是一个系数的矩阵(否则断然不是DAG), 故

[F(W;X) = ell(W;X) + lambda |W|_1, ]
并

[min_W quad F(W;X) \ mathrm{s.t.} quad mathcal{A}(W) in mathbb{D}. ]
显然现在的关键是如何处理(mathcal{A}(W) in mathbb{D})这个条件, 以前的方法通常需要复杂的运算, 本文提出一种等价的条件

[h(W) = 0, ]
满足
1. (h(W)= 0)当且仅当(mathcal{A}(W) in mathbb{D});
2. (h(W))越小, 说明(mathcal{A}(W))越接近无环图;
3. (h(W))是一个光滑函数;
4. (h(W))便于求导.
显然1是期望的, 2可以用于判断所得的(W)的优劣, 3, 4便于我们用数值方法求解.

等价条件的推导

(mathrm{tr}(I-W)^{-1} = d)

Proposition 1: 假设(W in mathbb{R}_+^{d imes d}) 且 (|W| < 1), 则(mathcal{A}(W))能够表示有向无环图当且仅当

[mathrm{tr}(I - W)^{-1} = d. ]
proof:

(A = mathcal{A}(W))能够表示有向无环图, 当且仅当

[mathrm{tr}(A^k) = 0 Leftrightarrow mathrm{tr} (W^k) = 0, forall: k=1,cdots ]
(Rightarrow)

由于(|W| < 1)(最大奇异值小于1), 故

[mathrm{tr}(I-W)^{-1} = mathrm{tr}(sum_{k=0} W^k) = mathrm{tr}(I) = d. ]
(Leftarrow)

(mathrm{tr}(W^k) ge 0), 故

[mathrm{tr}(I-W)^{-1} = d ]
当且仅当

[mathrm{tr}(W^k) = 0. ]
注: (|W| < 1)这个条件并不容易满足.

(mathrm{tr}(e^W)=d)

注: (e^A = I + sum_{k=1} frac{A^k}{k!}).

Proposition 2: 假设(W in mathbb{R}_+^{d imes d}), 则(mathcal{A}(W))能够表示有向无环图当且仅当

[mathrm{tr}(e^W) = d. ]
proof:

证明是类似的.

注: 此时对(W)的最大奇异值没有要求.

(mathrm{tr}(W^k) = 0)

这部分的证明可能应该归属于DAG-GNN.

Proposition 3: 假设(W in mathbb{R}_+^{d imes d}) , 则(mathcal{A}(W))能够表示有向无环图当且仅当

[mathrm{tr}(W^k) = 0, : k=1,2,cdots, d. ]
proof:

(Rightarrow)是显然的, 证明(Rightarrow)只需说明

[mathrm{tr}(W^k)=0, : k=1,2,cdots, d Rightarrow mathrm{tr}(W^k), : kge 1. ]
假设(W)的特征多项式为(p(lambda) = sum_{k=0}^d eta_k lambda^k, eta_d=1), 则有

[p(W) = sum_{k=0}^d eta_k W^k = 0. ]
进一步有

[W^{d} = -sum_{k=0}^{d-1} eta_k W^k Rightarrow W^{d+1} = -sum_{k=1}^d eta_k W^{k+1} Rightarrow mathrm{tr}(W^{d+1}) = -sum_{k=1}^d eta_k mathrm{tr}(W^{k+1}) = 0. ]
由归纳假设可知结论成立.

Corollary 1: 假设(W in mathbb{R}_+^{d imes d}) , 则(mathcal{A}(W))能够表示有向无环图当且仅当

[mathrm{tr}(I+W)^d=d. ]
(mathrm{tr}(e^{W circ W}) =d)

注: (circ) 表示哈达玛积, 即对应元素相乘.

上面依然要求(W)各元素大于0, 一个好的办法是:

Theorem 1: 一个矩阵(W in mathbb{R}^{d imes d}), 则(mathcal{A}(W)) 能表示有向无环图当且仅当

[mathrm{tr}(e^{W circ W}) =d. ]
proof:

(mathcal{A}(W)=mathcal{A}(W circ W)).

(mathrm{tr}(I + W circ W)^d =d)

Theorem 2: 一个矩阵(W in mathbb{R}^{d imes d}), 则(mathcal{A}(W)) 能表示有向无环图当且仅当

[mathrm{tr}(I + W circ W)^d =d. ]
注: (W circ W)前面加个系数也是没关系的.

性质的推导

故, 此时我们只需设置

[h(W) = mathrm{tr}(e^{Wcirc W}) - d ]
显然满足1,2,3, 接下来我们推导其梯度

[egin{array}{ll} mathrm{d}h(W) &= mathrm{d}: mathrm{tr} (e^{Wcirc W}) \ &= mathrm{tr} (mathrm{d}e^{Wcirc W}) \ &= mathrm{tr} (mathrm{d}sum_{k=1} frac{M^k}{k!}) \ &=sum_{k=1} mathrm{tr} ( frac{mathrm{d}M^k}{k!}) \ &=sum_{k=0} mathrm{tr} ( frac{M^k mathrm{d}M}{k!}) \ &= mathrm{tr}(e^{Wcirc W} cdot mathrm{d}(Wcirc W)) \ &= mathrm{tr}(e^{Wcirc W} cdot (2W circ mathrm{d} W)) \ &= mathrm{tr}(e^{Wcirc W} circ 2W^T cdot mathrm{d} W) \ end{array} ]
故

[ abla h(W) = (e^{Wcirc W})^T circ W. ]
注: 其中(M =W circ W).

求解

利用augmented Lagrangian转换为(这一块不是很懂, 但只是数值求解的东西, 不影响理解)

[min_W max_{alpha}quad ell (W;X) +lambda |W|_1 + frac{ ho}{2}|h(W)|^2 + alpha h(W), ]
具体求解算法如下:

代码

原文代码
查看全文

相关阅读:
新数学丛书《连分数》习题 1.1
连分数中一个有意思的小玩意儿
 无聊博文之:用同余的语言阐述欧几里德算法
 有向无环图
 Codeforces Round #658 (Div. 2)
常用代码模板3——搜索与图论
 什么是动态规划？动态规划的意义是什么？（转自知乎）
Codeforce：4C. Registration system （映射）
C++与正则表达式入门
 常用代码模板4——数学知识

原文地址：https://www.cnblogs.com/MTandHJ/p/14819671.html

DAGs with NO TEARS: Continuous Optimization for Structure Learning

DAGs with NO TEARS: Continuous Optimization for Structure Learning

概

主要内容

等价条件的推导

(mathrm{tr}(I-W)^{-1} = d)

(mathrm{tr}(e^W)=d)

(mathrm{tr}(W^k) = 0)

(mathrm{tr}(e^{W circ W}) =d)

(mathrm{tr}(I + W circ W)^d =d)

性质的推导

求解

代码