  • #mxnet# Backward pass for a model with multiple symbol outputs

    Original problem

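    I need to run a custom backward pass on a model built with mx.sym.Group. The grouped symbol contains both outputs that go through MakeLoss and outputs that are emitted directly (e.g., the output of a conv layer).
    Written out, it looks like this: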

    modG = Module(mx.sym.Group([data_A, data_B, loss_A, loss_B]))
    

    Now I need to call backward on it. Clearly, gradient information for data_A and data_B has to be supplied, but how, and in what form?

    Looking at the implementation:

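    # Module.backward in python/mxnet/module/module.py calls self._exec_group.backward,
    # defined in the executor group (installed copy shown):
    # python/build/lib.linux-i686-2.7/mxnet/module/executor_group.py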
        def backward(self, out_grads=None):
            assert self.for_training, 're-bind with for_training=True to run backward'
            if out_grads is None:
                out_grads = []
    
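            for i, (exec_, islice) in enumerate(zip(self.execs, self.slices)):
                out_grads_slice = []
                # zip() stops at the shorter list: with fewer grads than outputs,
                # the trailing outputs simply get no entry here.
                for grad, axis in zip(out_grads, self.output_layouts):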
                    if axis >= 0:
                        # pylint: disable=no-member
                        og_my_slice = nd.slice_axis(grad, axis=axis, begin=islice.start,
                                                    end=islice.stop)
                        # pylint: enable=no-member
                        out_grads_slice.append(og_my_slice.as_in_context(self.contexts[i]))
                    else:
                        out_grads_slice.append(grad.copyto(self.contexts[i]))
    
                exec_.backward(out_grads=out_grads_slice)
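    To make the pairing concrete, here is a toy model of the loop above in plain Python (no MXNet required). Arrays are nested lists with the batch on axis 0; `slice_batch` and `out_grads_for_device` are hypothetical stand-ins for `nd.slice_axis` and the loop body. The toy rejects a None gradient with a TypeError; the real slice_axis call fails with a different error.

```python
# Toy model of how the executor group pairs out_grads with outputs.
# Arrays are nested Python lists whose first index is the batch axis;
# this is an illustration, not MXNet code.

def slice_batch(grad, axis, islice):
    """Stand-in for nd.slice_axis(grad, axis=axis, begin=..., end=...)."""
    if grad is None:
        raise TypeError("cannot slice a None gradient")
    if axis != 0:
        raise NotImplementedError("toy model only handles axis 0")
    return grad[islice.start:islice.stop]


def out_grads_for_device(out_grads, output_layouts, islice):
    per_device = []
    # zip() stops at the shorter list: passing 2 grads for 4 outputs
    # silently pairs them with outputs 0 and 1 only.
    for grad, axis in zip(out_grads, output_layouts):
        if axis >= 0:
            per_device.append(slice_batch(grad, axis, islice))
        else:
            per_device.append(grad)  # the real code copies to the device
    return per_device


layouts = [0, 0, 0, 0]               # four outputs, batch axis 0
grads = [[1.0, 2.0], [3.0, 4.0]]     # only two gradients supplied
sliced = out_grads_for_device(grads, layouts, slice(0, 2))
print(len(sliced))                   # 2: nothing for outputs 2 and 3

try:
    out_grads_for_device([grads[0], None, grads[1]], layouts, slice(0, 2))
except TypeError as err:
    print("None placeholder rejected:", err)
```

    In this model, positions in out_grads map one-to-one onto outputs, and surplus outputs at the end are never visited.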
    

    The following variables are worth inspecting:

    modG._exec_group.slices
    #[slice(0, 2, None)]
    
    modG._exec_group.output_layouts
    #[0, 0, 0, 0]
    
    modG._exec_group.execs
    #[<mxnet.executor.Executor object at 0xb130d8c>]
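    A reading of these values, offered as an interpretation: `execs` holds one executor per device context, `slices` gives each executor its range of the batch (here a single CPU context receives both samples of a batch of 2, slice(0, 2)), and `output_layouts` gives the batch axis of each output. The helper below is a hypothetical illustration of an even batch split, not MXNet's own splitting code.

```python
# Hypothetical helper: split a batch of `batch_size` samples into one
# contiguous slice per device, as evenly as possible. MXNet computes its
# own slices internally; this only illustrates what the list means.

def split_batch(batch_size, num_devices):
    base, extra = divmod(batch_size, num_devices)
    slices, start = [], 0
    for d in range(num_devices):
        # the first `extra` devices take one additional sample
        stop = start + base + (1 if d < extra else 0)
        slices.append(slice(start, stop))
        start = stop
    return slices


print(split_batch(2, 1))  # [slice(0, 2, None)]  one CPU context, batch of 2
print(split_batch(5, 2))  # [slice(0, 3, None), slice(3, 5, None)]
```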
    

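    This is somewhat puzzling, but it seems that all the required grads are supposed to be passed in. Do loss_A and loss_B need placeholder slots, though?
    After some trial and error, the following commands run successfully: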

    outG = modG.get_outputs()
    grad_1=mx.nd.zeros(outG[1].shape)
    grad_for_G=diffD+[grad_1]         #   diffD is a one-element list, e.g. [mx.nd.zeros(outG[0].shape)]
    grad_for_G
    #[<NDArray 2x3x64x64 @cpu(0)>, <NDArray 2x1x64x64 @cpu(0)>]
    modG.backward(grad_for_G)
    

    So I looked further into what exec_.backward(out_grads=out_grads_slice) does:

    //src/executor/graph_executor.cc
    void GraphExecutor::Backward(const std::vector<NDArray>& head_grads) {
      const auto& idx = graph_.indexed_graph();
      if (num_forward_inputs_ != idx.input_nodes().size()) {
        for (size_t i = 0; i < head_grad_array_.size(); ++i) {
          if (!head_grad_array_[i].is_none()) {
            CHECK(i < head_grads.size() && !head_grads[i].is_none())
                << "Because the last operator is not Loss function, "
                << "head_gradient is required in calling backward.";
            CopyFromTo(head_grads[i], &(head_grad_array_[i]));
          }   
        }   
      }
      RunOps(true, num_forward_nodes_, idx.num_nodes());
    }
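    The check above can be modelled in a few lines of plain Python. This is a simplification: `needs_grad_mask` (hypothetical) stands for which entries of head_grad_array_ are not none, the surrounding `num_forward_inputs_` condition is ignored, and None plays the role of an NDArray for which is_none() is true.

```python
# Toy model of the head-gradient check in GraphExecutor::Backward.
# needs_grad_mask[i] is True when a head-gradient buffer exists for
# output i (head_grad_array_[i] is not none in the C++ code).

def check_head_grads(needs_grad_mask, head_grads):
    for i, needs_grad in enumerate(needs_grad_mask):
        if needs_grad:
            if i >= len(head_grads) or head_grads[i] is None:
                raise ValueError(
                    f"output {i}: head_gradient is required in calling backward")
    # Entries at positions without a buffer are never read.


# [data_A, data_B, loss_A, loss_B]: only outputs 0 and 1 need a gradient.
check_head_grads([True, True, False, False], ["gA", "gB"])  # passes

# [data_A, loss_A, data_B, loss_B]: output 2 now needs a gradient.
try:
    check_head_grads([True, False, True, False], ["gA", "gB"])
except ValueError as err:
    print(err)  # output 2: head_gradient is required in calling backward

# Any value at position 1 is accepted, because that slot is never read.
check_head_grads([True, False, True, False], ["gA", "dummy", "gB"])
```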
    

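    So whether a head gradient is required depends on whether gradient space was allocated for that output (head_grad_array_[i] is not none); an earlier note happened to touch on this topic.
    In other words, MakeLoss must contain something that links it to grad_req. I went through src/operator/make_loss-inl.h without finding anything special, until I remembered that an op declares its backward dependencies when it is registered: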

    // src/operator/make_loss-inl.h
      std::vector<int> DeclareBackwardDependency(
          const std::vector<int> &out_grad,
          const std::vector<int> &in_data,
          const std::vector<int> &out_data) const override {
        if (param_.normalization == make_loss_enum::kValid) {
          return {in_data[make_loss_enum::kData]};
        }   
        return {}; 
      }
    

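    This appears to be the only candidate: out_grad never appears in the returned dependency list, so MakeLoss's backward does not consume a head gradient, and presumably no head-gradient buffer is allocated for its output.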
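    The hypothesis can be phrased as a toy rule: an output gets a head-gradient buffer only if the op producing it lists out_grad among its backward dependencies. This is an assumption drawn from the snippet above, not a description of how the MXNet/NNVM gradient pass actually allocates buffers; `conv_like_deps` is a hypothetical stand-in for an ordinary op such as the conv producing data_A, and `"null"`/`"valid"` mirror MakeLoss's normalization modes.

```python
# Toy model of the hypothesis: an output needs a head gradient only if
# its producing op declares out_grad as a backward dependency.
# Dependencies are modelled as sets of strings.

def make_loss_deps(normalization):
    # Mirrors MakeLoss::DeclareBackwardDependency: out_grad never appears.
    return {"in_data"} if normalization == "valid" else set()


def conv_like_deps():
    # Hypothetical ordinary op: its backward needs the gradient from above.
    return {"out_grad", "in_data"}


def head_grad_mask(output_deps):
    return ["out_grad" in deps for deps in output_deps]


order_1 = [conv_like_deps(), conv_like_deps(), make_loss_deps("null"), make_loss_deps("valid")]
order_2 = [conv_like_deps(), make_loss_deps("null"), conv_like_deps(), make_loss_deps("valid")]
print(head_grad_mask(order_1))  # [True, True, False, False]
print(head_grad_mask(order_2))  # [True, False, True, False]
```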

    Followup

    A closely related question: what if the outputs in modG's symbol are arranged in a different order?

    modG = Module(mx.sym.Group([data_A, loss_A, data_B, loss_B]))
    

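    Judging from src/executor/graph_executor.cc, the gradients and the outputs must be aligned by position, but how is that achieved through python/mxnet/module/module.py?
    Beyond that, it would help to understand how slices and output_layouts change.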

    outG=modG.get_outputs()
    outG
    # [<NDArray 2x3x64x64 @cpu(0)>, <NDArray 1 @cpu(0)>, <NDArray 2x1x64x64 @cpu(0)>, <NDArray 1 @cpu(0)>]
    modG._exec_group.output_layouts
    # [0, 0, 0, 0]
    modG._exec_group.slices
    # [slice(0, 2, None)]
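    A small thought experiment is consistent with both lists staying put. Assuming, as the printed values suggest, that output_layouts records one batch axis per output in output order, reordering the outputs only permutes that list, and any permutation of [0, 0, 0, 0] is the same list. slices describes how the batch is split across executors and does not look at the outputs at all. The `layouts_for` helper and the `batch_axis` dictionary are hypothetical inputs to this toy model.

```python
# Toy model: output_layouts as a per-output list of batch axes, in output
# order. Reordering the outputs only permutes that list.

def layouts_for(output_names, batch_axis):
    return [batch_axis[name] for name in output_names]


batch_axis = {"data_A": 0, "data_B": 0, "loss_A": 0, "loss_B": 0}
order_1 = ["data_A", "data_B", "loss_A", "loss_B"]
order_2 = ["data_A", "loss_A", "data_B", "loss_B"]

print(layouts_for(order_1, batch_axis))  # [0, 0, 0, 0]
print(layouts_for(order_2, batch_axis))  # [0, 0, 0, 0]

# A non-zero axis (hypothetical) would make the reordering visible:
batch_axis["loss_A"] = 1
print(layouts_for(order_1, batch_axis))  # [0, 0, 1, 0]
print(layouts_for(order_2, batch_axis))  # [0, 1, 0, 0]
```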
    

    No change... (⊙﹏⊙)b Awkward. What now?

    grad_2=mx.nd.zeros(outG[2].shape)
    grad_for_G=diffD+[grad_2] 
    modG.backward(grad_for_G)
    [17:24:14] /home/chen-k/mxnet/dmlc-core/include/dmlc/./logging.h:300: [17:24:14] src/executor/graph_executor.cc:44: Check failed: i < head_grads.size() && !head_grads[i].is_none() Because the last operator is not Loss function, head_gradient is required in calling backward.
    ...
    

    Evidently the executor found that output 2 has no corresponding gradient. So pad with None:

    grad_for_G=diffD+[None,grad_2] 
    modG.backward(grad_for_G)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.1-py2.7.egg/mxnet/module/module.py", line 465, in backward
        self._exec_group.backward(out_grads=out_grads)
      File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.1-py2.7.egg/mxnet/module/executor_group.py", line 405, in backward
        end=islice.stop)
      File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.1-py2.7.egg/mxnet/_ctypes/ndarray.py", line 131, in generic_ndarray_function
        c_array(ctypes.c_char_p, [c_str(str(i)) for i in kwargs.values()])))
      File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.1-py2.7.egg/mxnet/base.py", line 75, in check_call
        raise MXNetError(py_str(_LIB.MXGetLastError()))
    mxnet.base.MXNetError: Invalid Parameter format for axis expect int but value='None', in operator slice_axis(name="", axis="None", end="2", begin="0")
    

    That failed too... Next, try padding with an arbitrary array:

    grad_for_G=diffD+[grad_2,grad_2] 
    modG.backward(grad_for_G)
    

    This one works...
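    The experiments suggest a working rule: out_grads must be positionally aligned with the module's outputs, with real gradients at the non-loss positions and some array at the MakeLoss positions, where it is never read. A minimal sketch of a helper that builds such a list in plain Python; `align_out_grads` and its parameters are hypothetical, and strings stand in for NDArrays.

```python
# Hypothetical helper: build an out_grads list aligned with the module's
# outputs, putting a caller-supplied placeholder at MakeLoss positions.

def align_out_grads(output_names, real_grads, loss_outputs, placeholder):
    """real_grads maps each non-loss output name to its gradient."""
    aligned = []
    for name in output_names:
        if name in loss_outputs:
            aligned.append(placeholder)  # slot is not read by the executor
        else:
            aligned.append(real_grads[name])
    return aligned


names = ["data_A", "loss_A", "data_B", "loss_B"]
grads = {"data_A": "dA", "data_B": "dB"}
print(align_out_grads(names, grads, {"loss_A", "loss_B"}, placeholder="dB"))
# ['dA', 'dB', 'dB', 'dB']
```

    With MXNet, the author's working call would correspond to passing `{"data_A": diffD[0], "data_B": grad_2}` and `placeholder=grad_2`; this has not been verified here. In the experiments above, reusing a batch-shaped array such as grad_2 as the placeholder passed the Python-side slicing, while None did not.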

    Unsolved

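    What is frustrating is that python/mxnet/module/module.py already appears to have a mechanism that would make this work (I suspect the axis < 0 branch), so why did neither output_layouts nor slices change between the two runs?
    Thinking about it is disheartening. I'll stop the notes here and get the problem solved first.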

  • Original post: https://www.cnblogs.com/chenyliang/p/6792292.html