  • The difference between the SAME and VALID padding modes in TensorFlow

    TensorFlow supports two kinds of padding. The SAME mode is the special one: it can produce asymmetric padding, a pitfall I ran into while verifying a TensorFlow network.

    TensorFlow's computation formula

    The 2D convolution interface

    tf.nn.conv2d(
        input, filters, strides, padding, data_format='NHWC', dilations=None, name=None
    )
    

    The padding formula

    Pay attention to the padding argument: as a string it can be either SAME or VALID, while as a list of numbers it explicitly specifies the amount of padding along each dimension.

    First, when padding is given as explicit numbers, it has twice as many entries as the input has dimensions, because every dimension is padded on two sides (left and right for the width, top and bottom for the height). TensorFlow never pads the N or C dimension, only H and W, so for NHWC the value is padding = [[0, 0], [pad_top, pad_bottom], [pad_left, pad_right], [0, 0]]; for NCHW the order of the entries changes accordingly.
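
    For example, here is a small hedged sketch (the shapes are illustrative, not taken from this post) showing that such an explicit padding list only zero-pads H and W, and that it is equivalent to tf.pad followed by a VALID convolution:

    import numpy as np
    import tensorflow as tf

    x = tf.constant(np.ones([1, 5, 5, 2], dtype=np.float32))   # NHWC input
    k = tf.constant(np.ones([3, 3, 2, 1], dtype=np.float32))   # HWIO filter
    pads = [[0, 0], [1, 2], [1, 1], [0, 0]]                    # N and C are never padded

    y1 = tf.nn.conv2d(x, k, strides=[1, 1, 1, 1], padding=pads)
    y2 = tf.nn.conv2d(tf.pad(x, pads), k, strides=[1, 1, 1, 1], padding='VALID')
    print(np.allclose(y1, y2))  # True: explicit padding == zero-pad the input, then VALID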

    Second, when a string option is passed, the padding it applies can always be mapped back onto this explicit padding parameter:

    • VALID means no padding in any dimension, equivalent to padding = [[0, 0], [0, 0], [0, 0], [0, 0]].

    • SAME means that, at the scale of the stride \(S\), the output size \(W_o\) stays consistent with the input size \(W_i\) (taking the width dimension as an example); they must satisfy the following relation:

      \[W_{o}=\left\lceil\frac{W_{i}}{S}\right\rceil\]

      We know that with dilation = 1, and without any padding, the input width \(W_i\), the output width \(W_o\), the kernel width \(W_k\), and the stride \(S\) of a convolution satisfy

      \[W_{o}=\left\lfloor\frac{W_{i}-W_{k}}{S}\right\rfloor+1\]

      Taking as the baseline the minimal padding \(P_a\) for which the division is exact, we have

      \[W_{o}=\frac{W_{i}+P_{a}-W_{k}}{S}+1\]

      Solving for \(P_a\) gives

      \[P_{a}=\left(W_{o}-1\right) S+W_{k}-W_{i}\]

      This is the total padding that has to be added. TensorFlow distributes it as symmetrically as possible between the two sides (see the sketch after this list):

      • If \(P_a\) is even, both sides get \(P_l = P_r = P_a/2\).
      • If \(P_a\) is odd, the left (or top) side gets \(P_l = \lfloor P_a/2 \rfloor\) and the right (or bottom) side gets \(P_r = P_a - P_l\).
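
      As a minimal sketch of these two rules (the helper name and structure are my own, not part of TensorFlow's API), the SAME padding of one spatial dimension can be computed as:

      import math

      def same_pad_1d(w_in, w_k, stride):
          """Total SAME padding for one spatial dim, split the way TensorFlow splits it."""
          w_out = math.ceil(w_in / stride)                       # W_o = ceil(W_i / S)
          pad_total = max((w_out - 1) * stride + w_k - w_in, 0)  # P_a = (W_o - 1) S + W_k - W_i
          pad_begin = pad_total // 2                             # floor(P_a / 2) on the top/left
          pad_end = pad_total - pad_begin                        # the extra unit (odd P_a) goes to the bottom/right
          return pad_begin, pad_end

      print(same_pad_1d(7, 4, 3))  # H of the example below: (1, 2)
      print(same_pad_1d(8, 4, 3))  # W of the example below: (1, 1)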

    Refer to the following code, which checks that SAME and the equivalent explicit padding give the same result:

    import numpy as np
    import tensorflow as tf

    inputH, inputW = 7, 8
    strideH, strideW = 3, 3
    filterH = 4
    filterW = 4
    inputC = 16
    outputC = 3
    inputData = np.ones([1,inputH,inputW,inputC]) # format [N, H, W, C]
    filterData = np.float64(np.float16(np.ones([filterH, filterW, inputC, outputC]) - 0.33)) # round to float16 precision, then match the input's float64 dtype
    strides = (1, strideH, strideW, 1)
    convOutputSame = tf.nn.conv2d(inputData, filterData, strides, padding='SAME')
    convOutput = tf.nn.conv2d(inputData, filterData, strides, padding=[[0,0],[1,2],[1,1],[0,0]]) # explicit padding equivalent to SAME for this input
    print("output1, ", convOutputSame)
    print("output2, ", convOutput)
    print("Sum of a - b is ", np.sum(np.square(convOutputSame - convOutput)))
    

    The result is:

        output1,  tf.Tensor(
        [[[[ 96.46875  96.46875  96.46875]
           [128.625   128.625   128.625  ]
           [ 96.46875  96.46875  96.46875]]
    
          [[128.625   128.625   128.625  ]
           [171.5     171.5     171.5    ]
           [128.625   128.625   128.625  ]]
    
          [[ 64.3125   64.3125   64.3125 ]
           [ 85.75     85.75     85.75   ]
           [ 64.3125   64.3125   64.3125 ]]]])
    
        output2,  tf.Tensor(
        [[[[ 96.46875  96.46875  96.46875]
           [128.625   128.625   128.625  ]
           [ 96.46875  96.46875  96.46875]]
    
          [[128.625   128.625   128.625  ]
           [171.5     171.5     171.5    ]
           [128.625   128.625   128.625  ]]
    
          [[ 64.3125   64.3125   64.3125 ]
           [ 85.75     85.75     85.75   ]
           [ 64.3125   64.3125   64.3125 ]]]], shape=(1, 3, 3, 3), dtype=float64)
        Sum of a - b is  0.0
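
    Plugging the example's numbers into the formula above confirms the printed shape: for the height, \(W_o = \lceil 7/3 \rceil = 3\) and \(P_a = (3-1)\cdot 3 + 4 - 7 = 3\), which is odd, so the pad is (1, 2); for the width, \(W_o = \lceil 8/3 \rceil = 3\) and \(P_a = (3-1)\cdot 3 + 4 - 8 = 2\), so the pad is (1, 1). This is exactly the explicit padding [[0,0],[1,2],[1,1],[0,0]] passed to the second convolution, which is why the two outputs are identical.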
    

    ONNX's computation formula

    The ONNX interface, as defined in its IR, is as follows:

    std::function<void(OpSchema&)> ConvOpSchemaGenerator(const char* filter_desc) {
      return [=](OpSchema& schema) {
        std::string doc = R"DOC(
    The convolution operator consumes an input tensor and {filter_desc}, and
    computes the output.)DOC";
        ReplaceAll(doc, "{filter_desc}", filter_desc);
        schema.SetDoc(doc);
        schema.Input(
            0,
            "X",
            "Input data tensor from previous layer; "
            "has size (N x C x H x W), where N is the batch size, "
            "C is the number of channels, and H and W are the "
            "height and width. Note that this is for the 2D image. "
            "Otherwise the size is (N x C x D1 x D2 ... x Dn). "
            "Optionally, if dimension denotation is "
            "in effect, the operation expects input data tensor "
            "to arrive with the dimension denotation of [DATA_BATCH, "
            "DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE ...].",
            "T");
        schema.Input(
            1,
            "W",
            "The weight tensor that will be used in the "
            "convolutions; has size (M x C/group x kH x kW), where C "
            "is the number of channels, and kH and kW are the "
            "height and width of the kernel, and M is the number "
            "of feature maps. For more than 2 dimensions, the "
            "kernel shape will be (M x C/group x k1 x k2 x ... x kn), "
            "where (k1 x k2 x ... kn) is the dimension of the kernel. "
            "Optionally, if dimension denotation is in effect, "
            "the operation expects the weight tensor to arrive "
            "with the dimension denotation of [FILTER_OUT_CHANNEL, "
            "FILTER_IN_CHANNEL, FILTER_SPATIAL, FILTER_SPATIAL ...]. "
            "X.shape[1] == (W.shape[1] * group) == C "
            "(assuming zero based indices for the shape array). "
            "Or in other words FILTER_IN_CHANNEL should be equal to DATA_CHANNEL. ",
            "T");
        schema.Input(
            2,
            "B",
            "Optional 1D bias to be added to the convolution, has size of M.",
            "T",
            OpSchema::Optional);
        schema.Output(
            0,
            "Y",
            "Output data tensor that contains the result of the "
            "convolution. The output dimensions are functions "
            "of the kernel size, stride size, and pad lengths.",
            "T");
        schema.TypeConstraint(
            "T",
            {"tensor(float16)", "tensor(float)", "tensor(double)"},
            "Constrain input and output types to float tensors.");
        schema.Attr(
            "kernel_shape",
            "The shape of the convolution kernel. If not present, should be inferred from input W.",
            AttributeProto::INTS,
            OPTIONAL);
        schema.Attr(
            "dilations",
            "dilation value along each spatial axis of the filter. If not present, the dilation defaults is 1 along each spatial axis.",
            AttributeProto::INTS,
            OPTIONAL);
        schema.Attr(
            "strides",
            "Stride along each spatial axis. If not present, the stride defaults is 1 along each spatial axis.",
            AttributeProto::INTS,
            OPTIONAL);
        schema.Attr(
            "auto_pad",
            auto_pad_doc,
            AttributeProto::STRING,
            std::string("NOTSET"));
        schema.Attr(
            "pads",
            pads_doc,
            AttributeProto::INTS,
            OPTIONAL);
        schema.Attr(
            "group",
            "number of groups input channels and output channels are divided into.",
            AttributeProto::INT,
            static_cast<int64_t>(1));
        schema.TypeAndShapeInferenceFunction([](InferenceContext& ctx) {
          propagateElemTypeFromInputToOutput(ctx, 0, 0);
          convPoolShapeInference(ctx, true, false, 0, 1);
        });
      };
    }
    

    Note the auto_pad attribute above. Similar to TensorFlow's string padding modes, it has the following settings:

    • NOTSET (the default) or VALID: no padding is derived automatically; when pads is not given it stays all zeros, which behaves like TensorFlow's VALID.

    • SAME_UPPER or SAME_LOWER: the relevant logic can be found in the ONNX source file defs.cc:

      std::vector<int64_t> pads;
        if (getRepeatedAttribute(ctx, "pads", pads)) {
          if (pads.size() != n_input_dims * 2) {
            fail_shape_inference("Attribute pads has incorrect size");
          }
        } else {
          pads.assign(n_input_dims * 2, 0); // pads has twice as many entries as there are input spatial dims
          const auto* auto_pad_attr = ctx.getAttribute("auto_pad");
          if ((nullptr != auto_pad_attr) && (auto_pad_attr->s() != "VALID")) { // taken when the pad mode is SAME_UPPER or SAME_LOWER
            int input_dims_size = static_cast<int>(n_input_dims);
            for (int i = 0; i < input_dims_size; ++i) { // compute the pads for each axis
              int64_t residual = 0;
              int64_t stride = strides[i];
              if (stride > 1) { // if stride == 1, total_pad is simply Wk - stride = Wk - 1
                if (!input_shape.dim(2 + i).has_dim_value()) {
                  continue;
                }
                residual = input_shape.dim(2 + i).dim_value();
                while (residual >= stride) {
                  residual -= stride;
                }
              }
              int64_t total_pad = residual == 0 ? effective_kernel_shape[i] - stride : effective_kernel_shape[i] - residual; // effective_kernel_shape = Wk
              if (total_pad < 0)
                total_pad = 0;
              int64_t half_pad_small = total_pad >> 1;
              int64_t half_pad_big = total_pad - half_pad_small;
              if (auto_pad_attr->s() == "SAME_UPPER") {           // pad mode is here
                pads[i] = half_pad_small;
                pads[i + input_dims_size] = half_pad_big;
              } else if (auto_pad_attr->s() == "SAME_LOWER") {
                pads[i] = half_pad_big;
                pads[i + input_dims_size] = half_pad_small;
              }
            }
          }
        }
      

      The trickiest part of the code above is the loop that computes residual and total_pad; the total_pad it produces is exactly TensorFlow's \(P_a\). Starting from the formula above and pushing the derivation one step further,

      \[\begin{aligned} P_{a}&=\left(W_{o}-1\right) S+W_{k}-W_{i}\\ &=\left(\left\lceil\frac{W_{i}}{S}\right\rceil - 1\right)S+W_{k}-W_{i} \end{aligned}\]

      The analysis splits into two cases, corresponding to the ternary expression that assigns total_pad:

      • If \(W_i\) is an integer multiple of \(S\), i.e. \(W_i = nS\), substituting into the formula above gives \(P_a = W_k - S\).
      • If \(W_i\) is not an integer multiple of \(S\), then \(W_i = nS + m\) with \(m > 0\), and substitution gives \(P_a = W_k - m\); this \(m\) is the remainder of \(W_i\) divided by the stride, i.e. residual in the code.

      SAME_UPPER and SAME_LOWER only differ when \(P_a\) is odd. If \(P_a\) is even, both give the same result; if it is odd, SAME_UPPER puts the smaller half \(\lfloor P_a/2 \rfloor\) at the beginning and the larger half at the end, while SAME_LOWER puts the larger half \(P_a - \lfloor P_a/2 \rfloor\) at the beginning, as the sketch below illustrates.
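
      A minimal Python transcription of the defs.cc logic above (my own sketch, not an ONNX API) makes the SAME_UPPER / SAME_LOWER split explicit:

      def onnx_auto_pad_1d(w_in, w_k, stride, mode):
          """Pads for one spatial axis, mirroring defs.cc; mode is 'SAME_UPPER' or 'SAME_LOWER'."""
          residual = w_in % stride                        # what the 'while (residual >= stride)' loop computes
          total_pad = w_k - stride if residual == 0 else w_k - residual
          total_pad = max(total_pad, 0)                   # P_a, clamped at zero
          half_small = total_pad >> 1
          half_big = total_pad - half_small
          if mode == "SAME_UPPER":                        # smaller half at the beginning, larger half at the end
              return half_small, half_big
          return half_big, half_small                     # SAME_LOWER: larger half at the beginning

      print(onnx_auto_pad_1d(7, 4, 3, "SAME_UPPER"))  # (1, 2), matching TensorFlow's SAME
      print(onnx_auto_pad_1d(7, 4, 3, "SAME_LOWER"))  # (2, 1)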

    An example

    The Stack Overflow question "What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of tensorflow?" has a fairly concrete example. It shows that SAME mode lets the strided windows cover the whole input exactly, with nothing left over and nothing missing.

    • VALID mode

      inputs:         1  2  3  4  5  6  7  8  9  10 11 (12 13)
                      |________________|                dropped
                                     |_________________|
      
    • SAME mode

                     pad|                                      |pad
         inputs:      0 |1  2  3  4  5  6  7  8  9  10 11 12 13|0  0
                     |________________|
                                    |_________________|
                                                   |________________|
      
      

      In this example \(W_i = 13, W_k = 6, S = 5\).
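
      Applying the formula above to these numbers: \(W_o = \lceil 13/5 \rceil = 3\) and \(P_a = (3-1)\cdot 5 + 6 - 13 = 3\), which is odd, so one zero is padded on the left and two on the right, exactly as drawn.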
