zoukankan html css js c++ java

损失函数及其梯度

Typical Loss
MSE
- Derivative
- MSE Gradient
Softmax
- Derivative

Typical Loss

Mean Squared Error
Cross Entropy Loss
- binary
- multi-class
- +softmax

MSE

$l o s s = \sum [y - (x w + b)]^{2}$
$L_{2 - n o r m} = | | y - (x w + b) | |_{2}$
$l o s s = n o r m (y - (x w + b))^{2}$

Derivative

$l o s s = \sum [y - f_{θ} (x)]^{2}$
$\frac{\nabla loss}{\nabla θ} = 2 \sum [y - f_{θ} (x)] * \frac{\nabla f_{θ} (x)}{\nabla θ}$

MSE Gradient

import tensorflow as tf

x = tf.random.normal([2, 4])
w = tf.random.normal([4, 3])
b = tf.zeros([3])
y = tf.constant([2, 0])
with tf.GradientTape() as tape:

tape.watch([w, b])

prob = tf.nn.softmax(x @ w + b, axis=1)

loss = tf.reduce_mean(tf.losses.MSE(tf.one_hot(y, depth=3), prob))
grads = tape.gradient(loss, [w, b])

grads[0]

<tf.Tensor: id=92, shape=(4, 3), dtype=float32, numpy=
array([[ 0.01156707, -0.00927749, -0.00228957],
       [ 0.03556816, -0.03894382,  0.00337564],
       [-0.02537526,  0.01924876,  0.00612648],
       [-0.0074787 ,  0.00161515,  0.00586352]], dtype=float32)>

grads[1]

<tf.Tensor: id=90, shape=(3,), dtype=float32, numpy=array([-0.01552947,  0.01993286, -0.00440337], dtype=float32)>

Softmax

soft version of max
大的越来越大，小的越来越小、越密集

21-损失函数及其梯度-softmax.jpg

Derivative

p_{i} = \frac{e^{a_{i}}}{\sum_{k = 1}^{N} e^{a_{k}}}

\frac{\partial p_{i}}{\partial a_{j}} = \frac{\partial \frac{e^{a_{i}}}{\sum_{k = 1}^{N} e^{a_{k}}}}{\partial a_{j}} = p_{i} (1 - p_{j})

$i \neq j$

\frac{\partial p_{i}}{\partial a_{j}} = \frac{\partial \frac{e^{a_{i}}}{\sum_{k = 1}^{N} e^{a_{k}}}}{\partial a_{j}} = - p_{j} * p_{i}

x = tf.random.normal([2, 4])
w = tf.random.normal([4, 3])
b = tf.zeros([3])
y = tf.constant([2, 0])
with tf.GradientTape() as tape:

tape.watch([w, b])

logits =x @ w + b

loss = tf.reduce_mean(

tf.losses.categorical_crossentropy(tf.one_hot(y, depth=3),

logits,

from_logits=True))
grads = tape.gradient(loss, [w, b])

grads[0]

<tf.Tensor: id=226, shape=(4, 3), dtype=float32, numpy=
array([[-0.38076094,  0.33844548,  0.04231545],
       [-1.0262716 , -0.6730384 ,  1.69931   ],
       [ 0.20613424, -0.50421923,  0.298085  ],
       [ 0.5800004 , -0.22329211, -0.35670823]], dtype=float32)>

grads[1]

<tf.Tensor: id=224, shape=(3,), dtype=float32, numpy=array([-0.3719653 ,  0.53269935, -0.16073406], dtype=float32)>

查看全文

相关阅读:
java数据类型
 如何判断数组
 git 常用命令
 如何配置 ESLint 工作流
 Lambda表达式和函数式接口
 面向对象(多态与内部类)
面向对象（封装与继承）
面相对像（基础）
break;怎么跳出外部循环
 面向对象（类与对象）

原文地址：https://www.cnblogs.com/abdm-989/p/14123298.html