  • Loss Functions and Their Gradients

    Typical Loss

    • Mean Squared Error

    • Cross Entropy Loss

      • binary
      • multi-class
      • +softmax

    MSE

    • \(\text{loss} = \sum [y - (xw + b)]^2\)

    • \(L_{\text{2-norm}} = \| y - (xw + b) \|_2\)

    • \(\text{loss} = \| y - (xw + b) \|_2^2\)
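
    As a quick sanity check, the sum form and the squared-norm form agree numerically. A minimal sketch with made-up values (note that tf.losses.MSE, used later, averages over elements rather than summing):

    import tensorflow as tf
    
    y = tf.constant([1.0, 2.0, 3.0])
    out = tf.constant([1.5, 1.5, 2.0])   # stand-in for xw + b
    
    mse_sum = tf.reduce_sum(tf.square(y - out))   # sum[y - out]^2
    mse_norm = tf.norm(y - out, ord=2) ** 2       # ||y - out||_2 squared
    print(mse_sum.numpy(), mse_norm.numpy())      # both 1.5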

    Derivative

    • \(\text{loss} = \sum [y - f_\theta(x)]^2\)

    • \(\frac{\nabla \text{loss}}{\nabla \theta} = -2 \sum [y - f_\theta(x)] \cdot \frac{\nabla f_\theta(x)}{\nabla \theta}\)  (the minus sign comes from differentiating \(y - f_\theta(x)\) with respect to \(\theta\))
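
    A quick numerical check of this chain rule against autodiff. The linear model \(f_\theta(x) = \theta x\) and all values below are made up for illustration:

    import tensorflow as tf
    
    x = tf.constant([1.0, 2.0, 3.0])
    y = tf.constant([2.0, 4.0, 6.0])
    theta = tf.Variable(1.5)   # scalar parameter of f_theta(x) = theta * x
    
    with tf.GradientTape() as tape:
        loss = tf.reduce_sum((y - theta * x) ** 2)
    
    auto_grad = tape.gradient(loss, theta)
    # analytic form: -2 * sum[(y - f) * df/dtheta], with df/dtheta = x
    manual_grad = -2.0 * tf.reduce_sum((y - theta * x) * x)
    print(auto_grad.numpy(), manual_grad.numpy())   # both -14.0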

    MSE Gradient

    import tensorflow as tf
    
    x = tf.random.normal([2, 4])   # batch of 2 samples, 4 features each
    w = tf.random.normal([4, 3])   # weights mapping 4 features to 3 classes
    b = tf.zeros([3])
    y = tf.constant([2, 0])        # integer class labels for the 2 samples
    
    with tf.GradientTape() as tape:
        tape.watch([w, b])         # w, b are plain tensors, so watch them explicitly
        prob = tf.nn.softmax(x @ w + b, axis=1)
        loss = tf.reduce_mean(tf.losses.MSE(tf.one_hot(y, depth=3), prob))
    
    grads = tape.gradient(loss, [w, b])   # [d(loss)/dw, d(loss)/db]
    grads[0]
    
    <tf.Tensor: id=92, shape=(4, 3), dtype=float32, numpy=
    array([[ 0.01156707, -0.00927749, -0.00228957],
           [ 0.03556816, -0.03894382,  0.00337564],
           [-0.02537526,  0.01924876,  0.00612648],
           [-0.0074787 ,  0.00161515,  0.00586352]], dtype=float32)>
    
    grads[1]
    
    <tf.Tensor: id=90, shape=(3,), dtype=float32, numpy=array([-0.01552947,  0.01993286, -0.00440337], dtype=float32)>
    

    Softmax

    • soft version of max

    • the large get relatively larger, while the small get smaller and are squeezed closer together (see the sketch below)

    [Figure: softmax illustration (21-损失函数及其梯度-softmax.jpg)]
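
    A two-line sketch of that squashing effect, with made-up logits; scaling the logits up makes the largest probability dominate even more:

    import tensorflow as tf
    
    logits = tf.constant([1.0, 2.0, 3.0])
    print(tf.nn.softmax(logits).numpy())        # ~[0.09, 0.24, 0.67]
    print(tf.nn.softmax(logits * 2).numpy())    # ~[0.02, 0.12, 0.87], gap widens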

    Derivative

    \[p_i = \frac{e^{a_i}}{\sum_{k=1}^N e^{a_k}}\]

    • \(i = j\)

    \[\frac{\partial p_i}{\partial a_j} = \frac{\partial \frac{e^{a_i}}{\sum_{k=1}^N e^{a_k}}}{\partial a_j} = p_i(1 - p_j)\]

    • \(i \neq j\)

    \[\frac{\partial p_i}{\partial a_j} = \frac{\partial \frac{e^{a_i}}{\sum_{k=1}^N e^{a_k}}}{\partial a_j} = -p_j \cdot p_i\]
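
    The two cases combine into the Jacobian \(\frac{\partial p_i}{\partial a_j} = p_i(\delta_{ij} - p_j)\). A minimal check of this closed form against autodiff, with made-up logits:

    import tensorflow as tf
    
    a = tf.constant([0.5, 1.0, 2.0])   # made-up logits
    
    with tf.GradientTape() as tape:
        tape.watch(a)
        p = tf.nn.softmax(a)
    
    jac = tape.jacobian(p, a)                            # autodiff dp_i/da_j
    manual = tf.linalg.diag(p) - tf.tensordot(p, p, 0)   # p_i * (delta_ij - p_j)
    print(tf.reduce_max(tf.abs(jac - manual)).numpy())   # ~0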

    x = tf.random.normal([2, 4])
    w = tf.random.normal([4, 3])
    b = tf.zeros([3])
    y = tf.constant([2, 0])
    
    with tf.GradientTape() as tape:
        tape.watch([w, b])
        logits = x @ w + b   # raw scores; no explicit softmax here
        # from_logits=True fuses softmax with cross-entropy for numerical stability
        loss = tf.reduce_mean(
            tf.losses.categorical_crossentropy(tf.one_hot(y, depth=3),
                                               logits,
                                               from_logits=True))
    
    grads = tape.gradient(loss, [w, b])
    grads[0]
    
    <tf.Tensor: id=226, shape=(4, 3), dtype=float32, numpy=
    array([[-0.38076094,  0.33844548,  0.04231545],
           [-1.0262716 , -0.6730384 ,  1.69931   ],
           [ 0.20613424, -0.50421923,  0.298085  ],
           [ 0.5800004 , -0.22329211, -0.35670823]], dtype=float32)>
    
    grads[1]
    
    <tf.Tensor: id=224, shape=(3,), dtype=float32, numpy=array([-0.3719653 ,  0.53269935, -0.16073406], dtype=float32)>
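
    A property worth verifying: with from_logits=True, the gradient of the fused softmax + cross-entropy with respect to the logits reduces to \((p - y_{\text{onehot}}) / N\). A minimal check, where N = 2 because reduce_mean averages over the batch of 2:

    import tensorflow as tf
    
    x = tf.random.normal([2, 4])
    w = tf.random.normal([4, 3])
    b = tf.zeros([3])
    y_onehot = tf.one_hot(tf.constant([2, 0]), depth=3)
    
    with tf.GradientTape() as tape:
        tape.watch([w, b])
        logits = x @ w + b
        loss = tf.reduce_mean(
            tf.losses.categorical_crossentropy(y_onehot, logits, from_logits=True))
    
    d_logits = tape.gradient(loss, logits)              # autodiff dL/dlogits
    manual = (tf.nn.softmax(logits) - y_onehot) / 2.0   # (p - y) / batch_size
    print(tf.reduce_max(tf.abs(d_logits - manual)).numpy())  # ~0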
  • Original post: https://www.cnblogs.com/nickchen121/p/10906835.html