Derivative of Softmax Loss Function
A softmax classifier computes class probabilities as:
\[
p_j = \frac{\exp(o_j)}{\sum_{k}\exp(o_k)}
\]
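As a concrete sketch (assuming NumPy), a minimal implementation of this mapping might look like the following, with the usual max-subtraction trick for numerical stability:

```python
import numpy as np

def softmax(o):
    """Map a logit vector o to probabilities p_j = exp(o_j) / sum_k exp(o_k)."""
    # Subtracting the max does not change the result (it cancels in the ratio)
    # but keeps exp() from overflowing.
    e = np.exp(o - np.max(o))
    return e / e.sum()

o = np.array([2.0, 1.0, 0.1])
p = softmax(o)
print(p, p.sum())  # probabilities that sum to 1
```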
The softmax output is typically used in a cross-entropy loss of the form
\[
L = - \sum_{j} y_j \log p_j
\]
where \(o\) is the vector of logits. We need the derivative of \(L\) with respect to \(o\). First, the partial derivatives of \(p_j\) with respect to \(o_i\) are:
\[
\frac{\partial p_j}{\partial o_i} = p_i (1 - p_i), \quad i = j \\
\frac{\partial p_j}{\partial o_i} = - p_i p_j, \quad i \ne j
\]
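Taken together, these two cases say that the Jacobian of the softmax is \(\mathrm{diag}(p) - p p^{\top}\). A small NumPy sketch (assuming the `softmax` helper above) that checks this against finite differences:

```python
import numpy as np

def softmax(o):
    e = np.exp(o - np.max(o))
    return e / e.sum()

def softmax_jacobian(o):
    """J[i, j] = dp_j/do_i = p_i*(1 - p_i) if i == j, else -p_i*p_j."""
    p = softmax(o)
    return np.diag(p) - np.outer(p, p)

# Central-difference check of the analytic Jacobian.
o = np.array([0.5, -1.0, 2.0])
eps = 1e-6
num = np.zeros((len(o), len(o)))
for i in range(len(o)):
    d = np.zeros_like(o); d[i] = eps
    num[i] = (softmax(o + d) - softmax(o - d)) / (2 * eps)  # row i holds dp/do_i
print(np.max(np.abs(num - softmax_jacobian(o))))  # ~1e-10
```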
Hence the derivative of the loss with respect to \(o_i\) is:
\[
\begin{aligned}
\frac{\partial L}{\partial o_i} & = - \sum_k y_k \frac{\partial \log p_k}{\partial o_i} \\
& = - \sum_k y_k \frac{1}{p_k} \frac{\partial p_k}{\partial o_i} \\
& = -y_i(1 - p_i) - \sum_{k \ne i} y_k \frac{1}{p_k} (-p_k p_i) \\
& = -y_i + y_i p_i + \sum_{k \ne i} y_k p_i \\
& = p_i \Big(\sum_k y_k\Big) - y_i
\end{aligned}
\]
Since this is a classification problem, \(y\) is a one-hot vector: it has a single non-zero element, which is 1, so \(\sum_k y_k = 1\). Therefore:
\[
\frac{\partial L}{\partial o_i} = p_i - y_i
\]
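A quick numerical check of this result (a sketch assuming NumPy and the `softmax` helper above): compare \(p - y\) with a central-difference gradient of the loss.

```python
import numpy as np

def softmax(o):
    e = np.exp(o - np.max(o))
    return e / e.sum()

def loss(o, y):
    """Cross-entropy L = -sum_j y_j * log p_j for a one-hot target y."""
    return -np.sum(y * np.log(softmax(o)))

o = np.array([0.3, -0.7, 1.5])
y = np.array([0.0, 1.0, 0.0])   # one-hot target
analytic = softmax(o) - y       # the result derived above: p - y

# Independent central-difference estimate of dL/do.
eps = 1e-6
numeric = np.zeros_like(o)
for i in range(len(o)):
    d = np.zeros_like(o); d[i] = eps
    numeric[i] = (loss(o + d, y) - loss(o - d, y)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # agrees to roughly 1e-9
```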