(a) theory
Prove that for any input vector x and any constant c, the softmax output is unchanged when every element is shifted by c, i.e. softmax(x) = softmax(x + c).
Note: in practice this property is often exploited by subtracting the maximum element from every entry of x, so that the largest entry becomes 0 (this prevents overflow in the exponentials).
Proof: expand both sides with the softmax formula. For the i-th component, softmax(x + c)_i = e^{x_i + c} / sum_j e^{x_j + c} = (e^c * e^{x_i}) / (e^c * sum_j e^{x_j}) = e^{x_i} / sum_j e^{x_j} = softmax(x)_i, since the common factor e^c cancels from numerator and denominator.
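A quick numerical sketch of this invariance (the helper `naive_softmax` is an illustrative name, not part of the assignment code; c is kept moderate so the unshifted exponentials do not overflow):

```python
import numpy as np

def naive_softmax(v):
    # direct softmax on a vector, fine for small inputs
    e = np.exp(v)
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0])
c = 100.0

# shifting every element by the same constant leaves the output unchanged
print(np.allclose(naive_softmax(x), naive_softmax(x + c)))  # True
```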
(b) coding
import numpy as np

def softmax(x):
    """Compute the softmax function for each row of the input x.

    It is crucial that this function is optimized for speed because
    it will be used frequently in later code. You might find the numpy
    functions np.exp, np.sum, np.reshape, np.max, and numpy
    broadcasting useful for this task.

    Numpy broadcasting documentation:
    http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

    You should also make sure that your code works for a single
    D-dimensional vector (treat the vector as a single row) and
    for N x D matrices. This may be useful for testing later. Also,
    make sure that the dimensions of the output match the input.

    You must implement the optimization in problem 1(a) of the
    written assignment!

    Arguments:
    x -- A D dimensional vector or N x D dimensional numpy matrix.

    Return:
    x -- You are allowed to modify x in-place.
    """
    orig_shape = x.shape

    if len(x.shape) > 1:
        # Matrix: subtract the row-wise max for numerical stability (see 1(a))
        x -= np.max(x, axis=1, keepdims=True)
        exp_x = np.exp(x)
        x = exp_x / np.sum(exp_x, axis=1, keepdims=True)
    else:
        # Vector: subtract the global max for numerical stability
        x -= np.max(x)
        exp_x = np.exp(x)
        x = exp_x / np.sum(exp_x)

    assert x.shape == orig_shape, "output shape must match input shape"
    return x
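To see why the max-subtraction from 1(a) matters in practice, here is a minimal sketch (a standalone demo, not part of the assignment code) comparing a naive softmax with the stabilized version on a large input:

```python
import numpy as np

big = np.array([1000.0, 1000.0])

# Without the shift, np.exp overflows to inf and inf/inf gives nan:
with np.errstate(over='ignore', invalid='ignore'):
    unstable = np.exp(big) / np.sum(np.exp(big))
print(unstable)  # [nan nan]

# With the shift from 1(a), the largest exponent is 0, so nothing overflows:
shifted = big - np.max(big)
stable = np.exp(shifted) / np.sum(np.exp(shifted))
print(stable)  # [0.5 0.5]
```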