This article is based on (https://zhuanlan.zhihu.com/p/105758059)
See that link for a more detailed treatment.
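- The input to softmax is
\(z = [z_1, z_2, \ldots, z_n]\)
- After softmax, each component is
\(a_i = \frac{e^{z_i}}{\sum_{k=1}^{n} e^{z_k}}\)
which gives the vector \(a = [\frac{e^{z_1}}{\sum_{k=1}^{n} e^{z_k}}, \frac{e^{z_2}}{\sum_{k=1}^{n} e^{z_k}}, \ldots, \frac{e^{z_n}}{\sum_{k=1}^{n} e^{z_k}}]\)
- The target vector is one-hot:
\(y = [0, 0, \ldots, 1, \ldots, 0]\), with \(y_j = 1\) and every other entry equal to 0
- The loss function is the cross-entropy loss
\(L = -\sum_{i=1}^{n} y_i \ln a_i\). Since every \(y_i\) other than \(y_j\) is 0, this reduces to \(L = -y_j \ln a_j = -\ln a_j\)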
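The setup above can be sketched in a few lines of plain Python. This is only an illustration: the logits `z` and the true-class index `j` are made-up values, and subtracting `max(z)` before exponentiating is a common numerical-stability step that is not part of the derivation (it cancels in the ratio, so the result is unchanged).

```python
import math

def softmax(z):
    """Softmax of a list of floats: a_i = e^{z_i} / sum_k e^{z_k}."""
    m = max(z)  # shift for numerical stability; cancels in the ratio
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(a, j):
    """Cross-entropy against a one-hot target with y_j = 1: L = -ln a_j."""
    return -math.log(a[j])

z = [1.0, 2.0, 3.0]  # example logits (assumed values)
j = 2                # 0-based index of the true class (assumed)
a = softmax(z)
loss = cross_entropy(a, j)
print(a, loss)
```

Because the target is one-hot, `cross_entropy` only needs the index `j` rather than the full vector \(y\).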
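The goal is the derivative of the scalar \(L\) with respect to the vector \(z\). By the chain rule, \(\frac{\partial L}{\partial z} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z}\)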
1 Compute \(\frac{\partial L}{\partial a}\)
Since \(L = -\ln a_j\), the loss depends only on \(a_j\), so
\(\frac{\partial L}{\partial a} = [0, 0, \ldots, -\frac{1}{a_j}, \ldots, 0]\)
2 Compute \(\frac{\partial a}{\partial z}\)
Both \(a\) and \(z\) are vectors, so \(\frac{\partial a}{\partial z}\) is the Jacobian matrix
\(\frac{\partial a}{\partial z} =
\left[
\begin{matrix}
\frac{\partial a_1}{\partial z_1} & \frac{\partial a_1}{\partial z_2} & \cdots & \frac{\partial a_1}{\partial z_n} \\
\frac{\partial a_2}{\partial z_1} & \frac{\partial a_2}{\partial z_2} & \cdots & \frac{\partial a_2}{\partial z_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial a_n}{\partial z_1} & \frac{\partial a_n}{\partial z_2} & \cdots & \frac{\partial a_n}{\partial z_n} \\
\end{matrix}
\right]
\)
Since only the \(j\)-th entry of \(\frac{\partial L}{\partial a}\) is nonzero, only the \(j\)-th row of \(\frac{\partial a}{\partial z}\) is needed, namely \(\frac{\partial a_j}{\partial z}\)
\(\frac{\partial L}{\partial z} = -\frac{1}{a_j} \cdot \frac{\partial a_j}{\partial z}\), where \(a_j = \frac{e^{z_j}}{\sum_{k=1}^{n} e^{z_k}}\)
- When \(i \not= j\)
\(\frac{\partial a_j}{\partial z_i} = \frac{0 - e^{z_j} \cdot e^{z_i}}{(\sum_{k=1}^{n} e^{z_k})^2} = -a_j \cdot a_i\)
\(\frac{\partial L}{\partial z_i} = -\frac{1}{a_j} \cdot \frac{\partial a_j}{\partial z_i} = -\frac{1}{a_j} \cdot (-a_j \cdot a_i) = a_i\)
- When \(i = j\)
\(\frac{\partial a_j}{\partial z_j} = \frac{e^{z_j} \cdot \sum_{k=1}^{n} e^{z_k} - e^{z_j} \cdot e^{z_j}}{(\sum_{k=1}^{n} e^{z_k})^2} = a_j - a_j^2\)
\(\frac{\partial L}{\partial z_j} = (a_j - a_j^2) \cdot (-\frac{1}{a_j}) = a_j - 1\)
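The two cases for \(\frac{\partial a_j}{\partial z_i}\) can be sanity-checked numerically. The sketch below compares the closed-form row of the Jacobian with central finite differences. The helper names, the sample logits, and the step size `eps` are all assumptions chosen for illustration.

```python
import math

def softmax(z):
    """Softmax of a list of floats, shifted by max(z) for stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_row_grad(a, j):
    """Row j of the Jacobian: a_j - a_j^2 when i == j, otherwise -a_j * a_i."""
    return [a[j] - a[j] ** 2 if i == j else -a[j] * a[i]
            for i in range(len(a))]

def numeric_row_grad(z, j, eps=1e-6):
    """Central-difference estimate of d a_j / d z_i for every i."""
    grads = []
    for i in range(len(z)):
        z_plus = list(z)
        z_plus[i] += eps
        z_minus = list(z)
        z_minus[i] -= eps
        grads.append((softmax(z_plus)[j] - softmax(z_minus)[j]) / (2 * eps))
    return grads

z = [0.5, -1.0, 2.0]  # example logits (assumed values)
j = 1                 # 0-based index of the true class (assumed)
analytic = softmax_row_grad(softmax(z), j)
numeric = numeric_row_grad(z, j)
max_diff = max(abs(x - y) for x, y in zip(analytic, numeric))
print(analytic, numeric, max_diff)
```

The entries of the row sum to zero, which follows from \(\sum_i a_i = 1\) and is a quick extra check on the formulas.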
Therefore \(\frac{\partial L}{\partial z} = [a_1, a_2, \ldots, a_j - 1, \ldots, a_n] = [a_1, a_2, \ldots, a_j, \ldots, a_n] - [0, 0, \ldots, 1, \ldots, 0] = a - y\)
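As a rough check of the final result, the sketch below compares \(a - y\) with a central finite-difference estimate of the gradient of \(L = -\ln a_j\). The function names, sample logits, and step size `eps` are assumptions made for this illustration.

```python
import math

def softmax(z):
    """Softmax of a list of floats, shifted by max(z) for stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def loss(z, j):
    """Cross-entropy of softmax(z) against a one-hot target with y_j = 1."""
    return -math.log(softmax(z)[j])

def analytic_grad(z, j):
    """Closed-form gradient dL/dz = a - y."""
    a = softmax(z)
    return [a_i - (1.0 if i == j else 0.0) for i, a_i in enumerate(a)]

def numeric_grad(z, j, eps=1e-6):
    """Central-difference estimate of dL/dz_i for every i."""
    grads = []
    for i in range(len(z)):
        z_plus = list(z)
        z_plus[i] += eps
        z_minus = list(z)
        z_minus[i] -= eps
        grads.append((loss(z_plus, j) - loss(z_minus, j)) / (2 * eps))
    return grads

z = [0.2, 1.5, -0.7, 0.9]  # example logits (assumed values)
j = 3                      # 0-based index of the true class (assumed)
grad_closed = analytic_grad(z, j)
grad_numeric = numeric_grad(z, j)
max_diff = max(abs(x - y) for x, y in zip(grad_closed, grad_numeric))
print(grad_closed, grad_numeric, max_diff)
```

Only the true-class component of the gradient is negative, since \(a_j - 1 < 0\) while every other \(a_i > 0\).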