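Some notes on my understanding of the receptive field.
In a neural network, the receptive field is defined as:
the size of the region in the original input image that a single pixel on a layer's output feature map maps back to.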
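1. The receptive field of the first convolutional layer is equal to the size of its filter.
2. The receptive field of a deeper convolutional layer depends on the filter sizes and strides of all the layers before it (and on its own filter size).
3. The receptive field calculation ignores effects at the image border, i.e. the padding size is not taken into account.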
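The three points above can be checked empirically without any formula. The following is only a minimal 1-D sketch: the helper `conv1d_dependencies` and the three-layer toy stack are made up for illustration. Instead of computing values, each unit carries the set of input indices it depends on, so the size of that set is the unit's receptive field.

```python
def conv1d_dependencies(inputs, kernel, stride, pad=0):
    """For each output unit, collect the original input indices it depends on.

    `inputs` is a list of frozensets; padded positions depend on nothing.
    """
    padded = [frozenset()] * pad + list(inputs) + [frozenset()] * pad
    outputs = []
    for start in range(0, len(padded) - kernel + 1, stride):
        window = padded[start:start + kernel]
        outputs.append(frozenset().union(*window))
    return outputs


# A 32-pixel 1-D "image": pixel i depends only on itself.
signal = [frozenset([i]) for i in range(32)]

# Hypothetical toy stack: 3-wide filters with strides 1, 2, 1 and padding 1.
layer1 = conv1d_dependencies(signal, kernel=3, stride=1, pad=1)
layer2 = conv1d_dependencies(layer1, kernel=3, stride=2, pad=1)
layer3 = conv1d_dependencies(layer2, kernel=3, stride=1, pad=1)

print(len(layer1[16]))  # 3: first layer, RF equals the filter size
print(len(layer2[8]))   # 5: grows with the filters of the layers before it
print(len(layer3[8]))   # 9: the stride-2 layer makes later growth faster
print(len(layer1[0]))   # 2: a border unit sees less, which the usual formula ignores
```

The interior units give the nominal receptive field; only units next to the border see a smaller region, which is the effect that point 3 sets aside.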
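First, the cumulative stride. strides is the product of the strides of the preceding layers, that is: strides(i) = stride(1) * stride(2) * ... * stride(i-1). This is the spacing, measured in input-image pixels, between neighbouring units at the input of layer i. The stride of layer i's own output feature map additionally includes stride(i), and that is the value the script at the end of this post reports.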
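As a small illustration of the product above, here is a sketch that accumulates the per-layer strides of AlexNet (the same values used in the script at the end of this post). The variable names are my own. It prints both the input-side product, strides(i), and the output-side product that also includes the layer's own stride.

```python
from itertools import accumulate
from operator import mul

# Per-layer strides of AlexNet, conv1 through pool5.
layer_names = ['conv1', 'pool1', 'conv2', 'pool2', 'conv3', 'conv4', 'conv5', 'pool5']
layer_strides = [4, 2, 1, 2, 1, 1, 1, 2]

# Stride of each layer's output feature map: stride(1) * ... * stride(i).
output_strides = list(accumulate(layer_strides, mul))

# strides(i) as defined above: stride(1) * ... * stride(i-1), with an empty product of 1.
input_strides = [1] + output_strides[:-1]

for name, s_in, s_out in zip(layer_names, input_strides, output_strides):
    print(f"{name}: input stride = {s_in:2d}, output stride = {s_out:2d}")
```

The last output stride is 32, i.e. neighbouring pool5 units are 32 input pixels apart.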
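The receptive field is computed iteratively, working backwards from the layer of interest down to the first layer. Start with RF = 1 (a single pixel on that layer's output) and, for every layer passed on the way down, apply:
RF{i} = (RF{i-1} - 1) * stride + ConvSize
Here RF{i-1} is the value from the previous iteration step, and stride and ConvSize are the stride and filter size of the layer currently being processed.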
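To make the recurrence concrete, the sketch below applies it to the AlexNet layers (filter size and stride taken from the script at the end of this post). It also cross-checks it against the equivalent front-to-back form, in which each layer adds (ConvSize - 1) times the cumulative stride of the layers before it. The function names `rf_backward` and `rf_forward` are my own.

```python
def rf_backward(layers):
    """Back-to-front recurrence: RF = (RF - 1) * stride + ConvSize."""
    rf = 1
    for fsize, stride in reversed(layers):
        rf = (rf - 1) * stride + fsize
    return rf


def rf_forward(layers):
    """Front-to-back form: each layer adds (ConvSize - 1) * cumulative stride so far."""
    rf, jump = 1, 1
    for fsize, stride in layers:
        rf += (fsize - 1) * jump
        jump *= stride
    return rf


# (filter size, stride) for conv1, pool1, conv2, pool2, conv3, conv4, conv5, pool5.
alexnet = [(11, 4), (3, 2), (5, 1), (3, 2), (3, 1), (3, 1), (3, 1), (3, 2)]

for depth in range(1, len(alexnet) + 1):
    prefix = alexnet[:depth]
    assert rf_backward(prefix) == rf_forward(prefix)
    print(f"after {depth} layer(s): RF = {rf_backward(prefix)}")
```

Both forms agree at every depth; after all eight layers the value is 195.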
The R-CNN paper notes that the units of the pool5 feature map in AlexNet "have very large receptive fields (195 × 195 pixels) and strides (32×32 pixels)" in the input image. How are these two numbers obtained?
In Python:
#!/usr/bin/env python
# Each layer is described as [filter size, stride, padding].
net_struct = {
    'alexnet': {
        'net': [[11, 4, 0], [3, 2, 0], [5, 1, 2], [3, 2, 0],
                [3, 1, 1], [3, 1, 1], [3, 1, 1], [3, 2, 0]],
        'name': ['conv1', 'pool1', 'conv2', 'pool2',
                 'conv3', 'conv4', 'conv5', 'pool5']},
    'vgg16': {
        'net': [[3, 1, 1], [3, 1, 1], [2, 2, 0], [3, 1, 1], [3, 1, 1], [2, 2, 0],
                [3, 1, 1], [3, 1, 1], [3, 1, 1], [2, 2, 0], [3, 1, 1], [3, 1, 1],
                [3, 1, 1], [2, 2, 0], [3, 1, 1], [3, 1, 1], [3, 1, 1], [2, 2, 0]],
        'name': ['conv1_1', 'conv1_2', 'pool1', 'conv2_1', 'conv2_2', 'pool2',
                 'conv3_1', 'conv3_2', 'conv3_3', 'pool3', 'conv4_1', 'conv4_2',
                 'conv4_3', 'pool4', 'conv5_1', 'conv5_2', 'conv5_3', 'pool5']},
    'zf-5': {
        'net': [[7, 2, 3], [3, 2, 1], [5, 2, 2], [3, 2, 1],
                [3, 1, 1], [3, 1, 1], [3, 1, 1]],
        'name': ['conv1', 'pool1', 'conv2', 'pool2', 'conv3', 'conv4', 'conv5']}}

imsize = 224


def outFromIn(isz, net, layernum):
    """Return (output size, cumulative stride) after the first `layernum` layers."""
    totstride = 1
    insize = isz
    for layer in range(layernum):
        fsize, stride, pad = net[layer]
        # Floor division keeps the integer arithmetic of the original Python 2 code.
        outsize = (insize - fsize + 2 * pad) // stride + 1
        insize = outsize
        totstride = totstride * stride
    return outsize, totstride


def inFromOut(net, layernum):
    """Return the receptive field of one output pixel of layer `layernum`."""
    RF = 1
    # Walk from the chosen layer back to the first one; padding is ignored.
    for layer in reversed(range(layernum)):
        fsize, stride, pad = net[layer]
        RF = ((RF - 1) * stride) + fsize
    return RF


if __name__ == '__main__':
    print("layer output sizes given image = %dx%d" % (imsize, imsize))
    for net in net_struct.keys():
        print('************net structure name is %s**************' % net)
        for i in range(len(net_struct[net]['net'])):
            p = outFromIn(imsize, net_struct[net]['net'], i + 1)
            rf = inFromOut(net_struct[net]['net'], i + 1)
            print("Layer Name = %s, Output size = %3d, Stride = % 3d, RF size = %3d"
                  % (net_struct[net]['name'][i], p[0], p[1], rf))
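One detail of the script that is easy to miss: the receptive field and the stride do not depend on the input size, but the output size does, and it relies on rounding down whenever the division is not exact. The sketch below repeats only the output-size step for AlexNet with two input sizes. The value 227 is my own assumption for comparison and does not come from the script, which uses 224.

```python
# [filter size, stride, padding] for conv1 ... pool5, as in the script above.
alexnet = [(11, 4, 0), (3, 2, 0), (5, 1, 2), (3, 2, 0),
           (3, 1, 1), (3, 1, 1), (3, 1, 1), (3, 2, 0)]


def output_sizes(image_size, layers):
    """Spatial output size after each layer, rounding down at every step."""
    sizes = []
    size = image_size
    for fsize, stride, pad in layers:
        size = (size - fsize + 2 * pad) // stride + 1
        sizes.append(size)
    return sizes


sizes_224 = output_sizes(224, alexnet)
sizes_227 = output_sizes(227, alexnet)  # 227 is an assumed alternative input size

print("224:", sizes_224)
print("227:", sizes_227)

# For 224 the very first division, (224 - 11) / 4, is not exact, so flooring matters.
print("conv1 exact for 224?", (224 - 11) % 4 == 0)
print("conv1 exact for 227?", (227 - 11) % 4 == 0)
```

With 224 the conv1 step silently drops a remainder, which is why the integer division in the script needs to be kept when porting it to Python 3.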