  • The difference between tf.nn.conv2d and tf.contrib.slim.conv2d

    Source: http://blog.sina.com.cn/s/blog_6ca0f5eb0102wsuu.html

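    While reading code, I noticed that some projects build their convolution layers with tf.nn.conv2d while others use tf.contrib.slim.conv2d. To find out whether the two functions perform the same convolution, I went through the API documentation and the source of slim.conv2d. The summary follows.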

    First, the commonly used tf.nn.conv2d, which is defined as follows:

    conv2d(
        input,
        filter,
        strides,
        padding,
        use_cudnn_on_gpu=None,
        data_format=None,
        name=None
    )

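    input is the input image to convolve. It must be a 4-D Tensor of shape [batch_size, in_height, in_width, in_channels], that is, [number of images in a training batch, image height, image width, number of image channels]. Its dtype must be a floating-point type such as float32 or float64.

    filter specifies the convolution kernel of the CNN. It must be a Tensor of shape [filter_height, filter_width, in_channels, out_channels], that is, [kernel height, kernel width, number of image channels, number of kernels], with the same dtype as input. Note that its third dimension, in_channels, must match the fourth dimension of input: what has to agree is the size of that dimension, not the values stored in it.

    strides is the step size of the convolution along each dimension of the image. It is a 1-D vector of length 4 whose entries correspond to the four dimensions of input; for NHWC input it is usually [1, stride_height, stride_width, 1].

    padding is a string and must be either "SAME" or "VALID". It selects the convolution mode: with SAME the input is zero-padded so that the kernel can be positioned over the image border, while with VALID no padding is added and the kernel must stay entirely inside the image. A more detailed description is available at http://blog.csdn.net/mao_xiao_feng/article/details/53444333

    use_cudnn_on_gpu specifies whether to use cuDNN acceleration; it defaults to True.

    data_format specifies the layout of input; it defaults to NHWC.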
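The effect of padding on the output size can be sketched in a few lines of plain Python. This is a minimal illustration of the size rules TensorFlow documents for "SAME" and "VALID"; the helper name conv_output_size is hypothetical and not part of TensorFlow.

```python
import math


def conv_output_size(in_size, kernel_size, stride, padding):
    """Output length along one spatial dimension (hypothetical helper)."""
    if padding == "SAME":
        # The input is zero-padded, so only the stride shrinks the output.
        return math.ceil(in_size / stride)
    if padding == "VALID":
        # No padding: the kernel must fit entirely inside the input.
        return math.ceil((in_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


print(conv_output_size(64, 5, 1, "SAME"))   # 64
print(conv_output_size(64, 5, 1, "VALID"))  # 60
print(conv_output_size(64, 5, 2, "SAME"))   # 32
print(conv_output_size(64, 5, 2, "VALID"))  # 30
```

For a 64x64 image and a 5x5 kernel with stride 1, SAME keeps the spatial size at 64x64 and VALID shrinks it to 60x60.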

     

    The function returns a Tensor; this output is what is usually called the feature map.
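To make the input, filter and feature-map shapes concrete, here is a naive pure-Python sketch of a stride-1 VALID convolution over nested lists. It is not how TensorFlow implements the operation; the function conv2d_valid is a hypothetical stand-in that only follows the NHWC input layout and the [height, width, in_channels, out_channels] filter layout described above.

```python
def conv2d_valid(inp, filt):
    """Naive stride-1, VALID cross-correlation on nested lists.

    inp:  shape [batch, in_h, in_w, in_c]
    filt: shape [k_h, k_w, in_c, out_c]
    """
    batch, in_h, in_w = len(inp), len(inp[0]), len(inp[0][0])
    in_c = len(inp[0][0][0])
    k_h, k_w, f_in_c = len(filt), len(filt[0]), len(filt[0][0])
    out_c = len(filt[0][0][0])
    if f_in_c != in_c:
        raise ValueError("filter in_channels must match input in_channels")
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1
    out = [[[[0.0] * out_c for _ in range(out_w)]
            for _ in range(out_h)] for _ in range(batch)]
    for n in range(batch):
        for i in range(out_h):
            for j in range(out_w):
                for o in range(out_c):
                    acc = 0.0
                    # Sum over the kernel window and all input channels.
                    for di in range(k_h):
                        for dj in range(k_w):
                            for c in range(in_c):
                                acc += inp[n][i + di][j + dj][c] * filt[di][dj][c][o]
                    out[n][i][j][o] = acc
    return out


# One 4x4 image with 3 channels, all ones; two 3x3 kernels, all ones.
x = [[[[1.0] * 3 for _ in range(4)] for _ in range(4)]]
w = [[[[1.0] * 2 for _ in range(3)] for _ in range(3)] for _ in range(3)]
y = conv2d_valid(x, w)
print(len(y), len(y[0]), len(y[0][0]), len(y[0][0][0]))  # 1 2 2 2
print(y[0][0][0])  # [27.0, 27.0]
```

The feature map has shape [batch, out_height, out_width, out_channels]; each value here is 3 * 3 * 3 = 27 because every kernel tap multiplies a one by a one.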

     

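    As for tf.contrib.slim.conv2d, its function definition is as follows (in the source examined here, conv2d refers to the convolution function):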

    convolution(inputs,
                num_outputs,
                kernel_size,
                stride=1,
                padding='SAME',
                data_format=None,
                rate=1,
                activation_fn=nn.relu,
                normalizer_fn=None,
                normalizer_params=None,
                weights_initializer=initializers.xavier_initializer(),
                weights_regularizer=None,
                biases_initializer=init_ops.zeros_initializer(),
                biases_regularizer=None,
                reuse=None,
                variables_collections=None,
                outputs_collections=None,
                trainable=True,
                scope=None):

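    inputs is again the input image to convolve.

    num_outputs specifies the number of kernels (that is, the number of filters).

    kernel_size specifies the spatial size of the kernel, [kernel height, kernel width].

    stride is the step size of the convolution along each spatial dimension of the image.

    padding selects the padding mode, VALID or SAME.

    data_format specifies the layout of inputs.

    rate is the dilation rate used for atrous (dilated) convolution, which the tf.nn.conv2d signature above does not expose. A rate greater than 1 spaces the kernel taps apart; according to the API documentation it cannot be combined with a stride other than 1.

    activation_fn specifies the activation function; the default is ReLU.

    normalizer_fn specifies a normalization function (such as batch_norm), which is used in place of the biases.

    normalizer_params specifies the parameters of the normalization function.

    weights_initializer specifies the initializer for the weights.

    weights_regularizer is an optional regularizer for the weights.

    biases_initializer specifies the initializer for the biases.

    biases_regularizer is an optional regularizer for the biases.

    reuse specifies whether the layer and its variables should be reused.

    variables_collections specifies a list of collections for all the variables, or a dictionary with one list per variable.

    outputs_collections specifies the collection to which the outputs are added.

    trainable specifies whether the parameters of the convolution layer can be trained.

    scope is the variable_scope under which the shared variables live.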
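The rate parameter is easier to picture with numbers. The sketch below uses the standard relation between kernel size and dilation rate; both helper names are hypothetical, and check_rate_and_stride only mirrors the restriction stated in the API documentation rather than the library's actual code.

```python
def effective_kernel_size(kernel_size, rate):
    """Extent covered by a kernel with (rate - 1) gaps between adjacent taps."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def check_rate_and_stride(stride, rate):
    """Mirror of the documented rule: rate != 1 and stride != 1 cannot be combined."""
    if stride != 1 and rate != 1:
        raise ValueError("stride and rate cannot both differ from 1")


print(effective_kernel_size(3, 1))  # 3
print(effective_kernel_size(3, 2))  # 5
print(effective_kernel_size(5, 2))  # 9
check_rate_and_stride(stride=1, rate=2)  # accepted
```

With rate=2 a 3x3 kernel still has nine weights but covers a 5x5 window, which enlarges the receptive field without adding parameters.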

     

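    From the APIs above, once the initialization-related arguments are set aside, the two do not differ much. tf.contrib.slim.conv2d creates the weights itself and exposes more options that can be specified (initializers, regularizers, an activation function and an optional normalizer), whereas tf.nn.conv2d requires the filter tensor to be built by hand, which is more cumbersome. With the rarely used arguments removed, the two APIs simplify to the following: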

    tf.contrib.slim.conv2d(
        inputs,
        num_outputs,    # number of kernels
        kernel_size,    # [kernel height, kernel width]
        stride=1,
        padding='SAME',
    )

    tf.nn.conv2d(
        input,          # same as inputs above
        filter,         # [kernel height, kernel width, number of image channels, number of kernels]
        strides,
        padding,
    )
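The correspondence between the two simplified signatures can be written down as a small mapping. This is a sketch assuming NHWC input; slim_to_nn_conv2d_args is a hypothetical helper and not part of either library.

```python
def slim_to_nn_conv2d_args(input_shape, num_outputs, kernel_size,
                           stride=1, padding="SAME"):
    """Translate slim.conv2d-style arguments into tf.nn.conv2d-style ones."""
    # slim accepts a single integer for square kernels and uniform strides.
    if isinstance(kernel_size, int):
        kernel_size = [kernel_size, kernel_size]
    if isinstance(stride, int):
        stride = [stride, stride]
    in_channels = input_shape[3]  # last dimension in NHWC
    k_h, k_w = kernel_size
    return {
        "filter_shape": [k_h, k_w, in_channels, num_outputs],
        "strides": [1, stride[0], stride[1], 1],
        "padding": padding,
    }


args = slim_to_nn_conv2d_args([1, 64, 64, 3], 64, [5, 5])
print(args["filter_shape"])  # [5, 5, 3, 64]
print(args["strides"])       # [1, 1, 1, 1]
```

The filter shape that tf.nn.conv2d needs is fully determined by kernel_size, num_outputs and the channel count of the input, which is why slim.conv2d can create the weights on its own.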

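    The two can therefore be considered almost the same. Running the code below confirms that they produce the same result when given the same weights (here all ones; the default zero biases and ReLU of slim.conv2d leave the positive outputs unchanged):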

    import tensorflow as tf
    import tensorflow.contrib.slim as slim  # tf.contrib requires TensorFlow 1.x

    # One 64x64 three-channel image filled with ones, NHWC layout.
    x1 = tf.ones(shape=[1, 64, 64, 3])
    # 64 kernels of size 5x5x3; a float fill value so the dtype matches x1.
    w = tf.fill([5, 5, 3, 64], 1.0)
    # print("rank is", tf.rank(x1))
    y1 = tf.nn.conv2d(x1, w, strides=[1, 1, 1, 1], padding='SAME')
    # slim.conv2d creates its own weights; initialize them to ones to match w.
    y2 = slim.conv2d(x1, 64, [5, 5], weights_initializer=tf.ones_initializer(), padding='SAME')

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        y1_value, y2_value, x1_value = sess.run([y1, y2, x1])
        print("shapes are", y1_value.shape, y2_value.shape)
        print(y1_value == y2_value)
        print(y1_value)
        print(y2_value)
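The values this comparison should print can be worked out by hand, because every input and every weight is one: each output is simply the number of kernel taps that fall inside the image, times the number of channels. The sketch below assumes the symmetric zero padding that SAME applies for an odd kernel with stride 1; ones_conv_value is a hypothetical helper.

```python
def ones_conv_value(i, j, size=64, kernel=5, channels=3):
    """Output at (i, j) of an all-ones SAME convolution with stride 1."""
    half = kernel // 2
    # Count the kernel rows and columns that overlap the unpadded image.
    rows = min(i + half, size - 1) - max(i - half, 0) + 1
    cols = min(j + half, size - 1) - max(j - half, 0) + 1
    return rows * cols * channels


print(ones_conv_value(0, 0))    # 27, corner: 3 * 3 * 3
print(ones_conv_value(0, 32))   # 45, edge: 3 * 5 * 3
print(ones_conv_value(32, 32))  # 75, interior: 5 * 5 * 3
```

Both y1 and y2 would be expected to hold these values in every output channel, since the zero bias adds nothing and ReLU leaves positive values untouched.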

     

    Finally, here is the original English API documentation of tf.contrib.slim.conv2d:

    def convolution(inputs,
                    num_outputs,
                    kernel_size,
                    stride=1,
                    padding='SAME',
                    data_format=None,
                    rate=1,
                    activation_fn=nn.relu,
                    normalizer_fn=None,
                    normalizer_params=None,
                    weights_initializer=initializers.xavier_initializer(),
                    weights_regularizer=None,
                    biases_initializer=init_ops.zeros_initializer(),
                    biases_regularizer=None,
                    reuse=None,
                    variables_collections=None,
                    outputs_collections=None,
                    trainable=True,
                    scope=None):
      """Adds an N-D convolution followed by an optional batch_norm layer.

      It is required that 1 <= N <= 3.

      `convolution` creates a variable called `weights`, representing the
      convolutional kernel, that is convolved (actually cross-correlated) with the
      `inputs` to produce a `Tensor` of activations. If a `normalizer_fn` is
      provided (such as `batch_norm`), it is then applied. Otherwise, if
      `normalizer_fn` is None and a `biases_initializer` is provided then a `biases`
      variable would be created and added the activations. Finally, if
      `activation_fn` is not `None`, it is applied to the activations as well.

      Performs atrous convolution with input stride/dilation rate equal to `rate`
      if a value > 1 for any dimension of `rate` is specified.  In this case
      `stride` values != 1 are not supported.

      Args:
        inputs: A Tensor of rank N+2 of shape
          `[batch_size] + input_spatial_shape + [in_channels]` if data_format does
          not start with "NC" (default), or
          `[batch_size, in_channels] + input_spatial_shape` if data_format starts
          with "NC".
        num_outputs: Integer, the number of output filters.
        kernel_size: A sequence of N positive integers specifying the spatial
          dimensions of the filters.  Can be a single integer to specify the same
          value for all spatial dimensions.
        stride: A sequence of N positive integers specifying the stride at which to
          compute output.  Can be a single integer to specify the same value for all
          spatial dimensions.  Specifying any `stride` value != 1 is incompatible
          with specifying any `rate` value != 1.
        padding: One of `"VALID"` or `"SAME"`.
        data_format: A string or None.  Specifies whether the channel dimension of
          the `input` and output is the last dimension (default, or if `data_format`
          does not start with "NC"), or the second dimension (if `data_format`
          starts with "NC").  For N=1, the valid values are "NWC" (default) and
          "NCW".  For N=2, the valid values are "NHWC" (default) and "NCHW".
          For N=3, the valid values are "NDHWC" (default) and "NCDHW".
        rate: A sequence of N positive integers specifying the dilation rate to use
          for atrous convolution.  Can be a single integer to specify the same
          value for all spatial dimensions.  Specifying any `rate` value != 1 is
          incompatible with specifying any `stride` value != 1.
        activation_fn: Activation function. The default value is a ReLU function.
          Explicitly set it to None to skip it and maintain a linear activation.
        normalizer_fn: Normalization function to use instead of `biases`. If
          `normalizer_fn` is provided then `biases_initializer` and
          `biases_regularizer` are ignored and `biases` are not created nor added.
          default set to None for no normalizer function
        normalizer_params: Normalization function parameters.
        weights_initializer: An initializer for the weights.
        weights_regularizer: Optional regularizer for the weights.
        biases_initializer: An initializer for the biases. If None skip biases.
        biases_regularizer: Optional regularizer for the biases.
        reuse: Whether or not the layer and its variables should be reused. To be
          able to reuse the layer scope must be given.
        variables_collections: Optional list of collections for all the variables or
          a dictionary containing a different list of collection per variable.
        outputs_collections: Collection to add the outputs.
        trainable: If `True` also add variables to the graph collection
          `GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
        scope: Optional scope for `variable_scope`.

      Returns:
        A tensor representing the output of the operation.

      Raises:
        ValueError: If `data_format` is invalid.
        ValueError: Both 'rate' and `stride` are not uniformly 1.
      """
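The order of operations described in the docstring (convolution, then either the normalizer or the biases, then the activation) can be sketched on a flat list of activations. This is only an illustration of that control flow, not the library's code; slim_style_post_processing, relu and center are hypothetical names, and center is a toy stand-in for a real normalizer such as batch_norm.

```python
def relu(values):
    return [max(v, 0.0) for v in values]


def center(values, shift=0.0):
    """Toy normalizer: subtract the mean, then add a shift."""
    mean = sum(values) / len(values)
    return [v - mean + shift for v in values]


def slim_style_post_processing(conv_out, biases=None, normalizer_fn=None,
                               normalizer_params=None, activation_fn=relu):
    outputs = list(conv_out)
    if normalizer_fn is not None:
        # A normalizer replaces the biases entirely.
        outputs = normalizer_fn(outputs, **(normalizer_params or {}))
    elif biases is not None:
        outputs = [v + b for v, b in zip(outputs, biases)]
    if activation_fn is not None:
        outputs = activation_fn(outputs)
    return outputs


raw = [-3.0, 0.0, 3.0]
ones = [1.0, 1.0, 1.0]
print(slim_style_post_processing(raw, biases=ones))  # [0.0, 1.0, 4.0]
print(slim_style_post_processing(raw, biases=ones, activation_fn=None))  # [-2.0, 1.0, 4.0]
print(slim_style_post_processing(raw, biases=ones, normalizer_fn=center,
                                 normalizer_params={"shift": 0.5}))  # [0.0, 0.5, 3.5]
```

In the last call the biases are ignored because a normalizer is supplied, which matches the docstring's statement that `biases` are neither created nor added in that case.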

  • Original article: https://www.cnblogs.com/fujian-code/p/9596883.html