Original problem
I need to run a custom backward pass on a model whose symbols were packed with mx.sym.Group. The grouped symbol contains both MakeLoss outputs and direct outputs (e.g., the output of a conv layer). Written out, it looks like:
modG = Module(mx.sym.Group([data_A, data_B, loss_A, loss_B]))
Now backward has to be called on it. Clearly, gradient information for data_A and data_B must be provided, but how, and in what form?
Looking at the implementation:
#python/mxnet/module/module.py
# --->
# python/build/lib.linux-i686-2.7/mxnet/module/executor_group.py
def backward(self, out_grads=None):
    assert self.for_training, 're-bind with for_training=True to run backward'
    if out_grads is None:
        out_grads = []
    for i, (exec_, islice) in enumerate(zip(self.execs, self.slices)):
        out_grads_slice = []
        for grad, axis in zip(out_grads, self.output_layouts):
            if axis >= 0:
                # pylint: disable=no-member
                og_my_slice = nd.slice_axis(grad, axis=axis, begin=islice.start,
                                            end=islice.stop)
                # pylint: enable=no-member
                out_grads_slice.append(og_my_slice.as_in_context(self.contexts[i]))
            else:
                out_grads_slice.append(grad.copyto(self.contexts[i]))
        exec_.backward(out_grads=out_grads_slice)
Inspect the following variables:
modG._exec_group.slices
#[slice(0, 2, None)]
modG._exec_group.output_layouts
#[0, 0, 0, 0]
modG._exec_group.execs
#[<mxnet.executor.Executor object at 0xb130d8c>]
Some of this is hard to trace, but it looks like all the required grads go into out_grads. Do the slots for loss_A and loss_B also need placeholder entries?
A quick test shows that the following goes through:
outG = modG.get_outputs()
grad_1=mx.nd.zeros(outG[1].shape)
grad_for_G=diffD+[grad_1]  # diffD ~ [mx.nd.zeros(outG[0].shape)]
grad_for_G
#[<NDArray 2x3x64x64 @cpu(0)>, <NDArray 2x1x64x64 @cpu(0)>]
modG.backward(grad_for_G)
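For reference, here is a minimal self-contained sketch of the same setup, assuming a single CPU context and illustrative symbols (conv standing in for data_A, a summed square loss standing in for loss_A); none of these names come from the original model:

import mxnet as mx

data = mx.sym.Variable('data')
conv = mx.sym.Convolution(data, kernel=(3, 3), pad=(1, 1),
                          num_filter=1, name='conv')
loss = mx.sym.MakeLoss(mx.sym.sum(mx.sym.square(conv)), name='loss')

modG = mx.mod.Module(mx.sym.Group([conv, loss]),
                     data_names=('data',), label_names=None)
modG.bind(data_shapes=[('data', (2, 3, 64, 64))], for_training=True)
modG.init_params()
modG.forward(mx.io.DataBatch([mx.nd.ones((2, 3, 64, 64))], []))

outG = modG.get_outputs()   # [conv output (2x1x64x64), loss output (1,)]

# Head gradient only for the direct output; the trailing loss slot can be
# left off entirely, since no head-grad storage was allocated for it.
modG.backward([mx.nd.ones(outG[0].shape)])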
So I dug further into what exec_.backward(out_grads=out_grads_slice) actually does:
//src/executor/graph_executor.cc
void GraphExecutor::Backward(const std::vector<NDArray>& head_grads) {
  const auto& idx = graph_.indexed_graph();
  if (num_forward_inputs_ != idx.input_nodes().size()) {
    for (size_t i = 0; i < head_grad_array_.size(); ++i) {
      if (!head_grad_array_[i].is_none()) {
        CHECK(i < head_grads.size() && !head_grads[i].is_none())
            << "Because the last operator is not Loss function, "
            << "head_gradient is required in calling backward.";
        CopyFromTo(head_grads[i], &(head_grad_array_[i]));
      }
    }
  }
  RunOps(true, num_forward_nodes_, idx.num_nodes());
}
So the decision hinges on whether grad storage (head_grad_array_) was allocated for an output, which is exactly the topic touched on earlier.
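As a quick hedged check (reusing the toy modG from the sketch above): call backward with no head gradients at all, and the CHECK fires the moment the executor reaches an output whose grad storage was allocated:

try:
    modG.backward()   # out_grads=None -> exec_.backward(out_grads=[])
except mx.base.MXNetError as e:
    print(e)          # ... head_gradient is required in calling backward.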
In other words, MakeLoss must contain something that ties it to grad_req. I went through src/operator/make_loss-inl.h but found nothing special at first; then I remembered that an Op has to declare its backward dependencies when it is registered:
// src/operator/make_loss-inl.h
std::vector<int> DeclareBackwardDependency(
    const std::vector<int> &out_grad,
    const std::vector<int> &in_data,
    const std::vector<int> &out_data) const override {
  if (param_.normalization == make_loss_enum::kValid) {
    return {in_data[make_loss_enum::kData]};
  }
  return {};
}
This has to be the place: out_grad never appears in the dependency list, so the executor never allocates head-gradient storage for a MakeLoss output.
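For comparison, the Python custom-op API exposes the same switch: need_top_grad=False in CustomOpProp plays the role of a DeclareBackwardDependency that omits out_grad. A sketch with a hypothetical op 'myloss' (the 2*x gradient is just an illustrative seed for a squared loss, not anything taken from MakeLoss itself):

import mxnet as mx

class MyLoss(mx.operator.CustomOp):
    def forward(self, is_train, req, in_data, out_data, aux):
        self.assign(out_data[0], req[0], in_data[0] * in_data[0])

    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        # Seed the gradient locally instead of consuming out_grad,
        # which is exactly what MakeLoss does.
        self.assign(in_grad[0], req[0], 2 * in_data[0])

@mx.operator.register('myloss')
class MyLossProp(mx.operator.CustomOpProp):
    def __init__(self):
        # The Python-side analogue of omitting out_grad from
        # DeclareBackwardDependency: no head gradient will be demanded.
        super(MyLossProp, self).__init__(need_top_grad=False)

    def list_arguments(self):
        return ['data']

    def list_outputs(self):
        return ['output']

    def infer_shape(self, in_shape):
        return in_shape, [in_shape[0]], []

    def create_operator(self, ctx, shapes, dtypes):
        return MyLoss()

An output built with mx.sym.Custom(data, op_type='myloss') can then sit in a Group just like loss_A above, without requiring a head gradient.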
Followup
A follow-up question: what happens if the symbols in modG are grouped in a different order:
modG = Module(mx.sym.Group([data_A, loss_A, data_B, loss_B]))
Judging from src/executor/graph_executor.cc, the head grads must stay aligned with the outputs, but how does that get through python/mxnet/module/module.py? And any change in slices and output_layouts should be understood as well.
outG=modG.get_outputs()
outG
# [<NDArray 2x3x64x64 @cpu(0)>, <NDArray 1 @cpu(0)>, <NDArray 2x1x64x64 @cpu(0)>, <NDArray 1 @cpu(0)>]
modG._exec_group.output_layouts
# [0, 0, 0, 0]
modG._exec_group.slices
# [slice(0, 2, None)]
No change at all... (⊙﹏⊙)b Awkward. How do we get past this?
grad_2=mx.nd.zeros(outG[2].shape)
grad_for_G=diffD+[grad_2]
modG.backward(grad_for_G)
[17:24:14] /home/chen-k/mxnet/dmlc-core/include/dmlc/./logging.h:300: [17:24:14] src/executor/graph_executor.cc:44: Check failed: i < head_grads.size() && !head_grads[i].is_none() Because the last operator is not Loss function, head_gradient is required in calling backward.
...
Sure enough, the engine noticed that output #2 has no corresponding gradient entry. Let's pad with None:
grad_for_G=diffD+[None,grad_2]
modG.backward(grad_for_G)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.1-py2.7.egg/mxnet/module/module.py", line 465, in backward
self._exec_group.backward(out_grads=out_grads)
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.1-py2.7.egg/mxnet/module/executor_group.py", line 405, in backward
end=islice.stop)
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.1-py2.7.egg/mxnet/_ctypes/ndarray.py", line 131, in generic_ndarray_function
c_array(ctypes.c_char_p, [c_str(str(i)) for i in kwargs.values()])))
File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.1-py2.7.egg/mxnet/base.py", line 75, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Invalid Parameter format for axis expect int but value='None', in operator slice_axis(name="", axis="None", end="2", begin="0")
That fails too... Let's try padding with an arbitrary array instead:
grad_for_G=diffD+[grad_2,grad_2]
modG.backward(grad_for_G)
That goes through...
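So, as far as these experiments show, the working rule is: out_grads is consumed strictly by position; every slot up to the last gradient-requiring output must hold an NDArray that slice_axis can cut along the batch axis; whatever lands in a loss slot is sliced and copied but never read; and trailing loss slots may be dropped. As a sketch (grad_data_A and grad_data_B are hypothetical names for the real head gradients):

# group order: [data_A, loss_A, data_B, loss_B]
outG = modG.get_outputs()
placeholder = mx.nd.zeros(outG[2].shape)              # batch-sized; value is ignored
grad_for_G = [grad_data_A, placeholder, grad_data_B]  # trailing loss_B slot omitted
modG.backward(grad_for_G)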
Unsolved
What remains unsatisfying is that python/mxnet/module/module.py clearly has a mechanism that could handle this (presumably the axis < 0 branch, which skips slicing), yet output_layouts and slices stayed the same in both runs. Why?
Thinking about it is disheartening. I'll leave the notes here and get the immediate problem solved first.