Classic Convolutional Neural Networks
1.LeNet
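The basic unit of the convolutional block is a convolutional layer followed by an average pooling layer: the convolutional layer recognizes spatial patterns in the image, such as lines and parts of objects, and the average pooling layer that follows reduces the convolutional layer's sensitivity to position. The convolutional block consists of two such units stacked in sequence. Every convolutional layer in the block uses a 5×5 window and applies a sigmoid activation function to its output. The first convolutional layer has 6 output channels, and the second increases this to 16. The fully connected block contains 3 fully connected layers with 120, 84, and 10 outputs respectively, where 10 is the number of output classes.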
import torch
import torch.nn.functional as F
from torch import nn


class LeNet(nn.Module):
    def __init__(self, *, channels, fig_size, num_class):
        super(LeNet, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, 6, 5, padding=2),
            nn.Sigmoid(),
            nn.AvgPool2d(2, 2),
            nn.Conv2d(6, 16, 5),
            nn.Sigmoid(),
            nn.AvgPool2d(2, 2),
        )
        # Image size after the convolutional and pooling layers
        fig_size = (fig_size - 5 + 1 + 4) // 1
        fig_size = (fig_size - 2 + 2) // 2
        fig_size = (fig_size - 5 + 1) // 1
        fig_size = (fig_size - 2 + 2) // 2
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * fig_size * fig_size, 120),
            nn.Sigmoid(),
            nn.Linear(120, 84),
            nn.Sigmoid(),
            nn.Linear(84, num_class),
        )

    def forward(self, X):
        conv_features = self.conv(X)
        output = self.fc(conv_features)
        return output
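The `fig_size` bookkeeping in `__init__` follows the usual output-size rule for convolution and pooling layers, floor((n + 2p - k) / s) + 1. As a rough check, the pure-Python sketch below walks a 28×28 input through the same four layers. The helper name `out_size` and the 28×28 input size are assumptions made for illustration and are not part of the code above.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial size after a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


# Layers of the LeNet convolutional block as (kernel, stride, padding).
lenet_layers = [
    (5, 1, 2),  # Conv2d(channels, 6, 5, padding=2)
    (2, 2, 0),  # AvgPool2d(2, 2)
    (5, 1, 0),  # Conv2d(6, 16, 5)
    (2, 2, 0),  # AvgPool2d(2, 2)
]

size = 28  # assumed input height/width
sizes = [size]
for kernel, stride, padding in lenet_layers:
    size = out_size(size, kernel, stride, padding)
    sizes.append(size)

flat_features = 16 * size * size  # input width of the first Linear layer
print(sizes)          # [28, 28, 14, 10, 5]
print(flat_features)  # 400
```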
2.AlexNet
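AlexNet showed for the first time that learned features can outperform hand-designed features, overturning the status quo in computer vision research.
Features:
- 8 layers of transformations: 5 convolutional layers, 2 fully connected hidden layers, and 1 fully connected output layer.
- Replaces the sigmoid activation function with the simpler ReLU activation function.
- Uses Dropout to control the model complexity of the fully connected layers.
- Introduces data augmentation, such as flipping, cropping, and color changes, to further enlarge the dataset and mitigate overfitting.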
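One way to see why ReLU is the simpler choice: its gradient is either 0 or 1, whereas the sigmoid gradient s(x)(1 - s(x)) never exceeds 0.25 and shrinks toward 0 for inputs far from zero. The sketch below compares the two derivatives at a few points; the function names are chosen here for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x)).
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x):
    # Derivative of max(0, x); the value at x == 0 is taken as 0 here.
    return 1.0 if x > 0 else 0.0


for x in (-10.0, -1.0, 0.5, 1.0, 10.0):
    print(f"x={x:6.1f}  sigmoid'={sigmoid_grad(x):.6f}  relu'={relu_grad(x):.1f}")
```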
class AlexNet(nn.Module):
    def __init__(self, *, channels, fig_size, num_class):
        super(AlexNet, self).__init__()
        self.dropout = 0.5
        self.conv = nn.Sequential(
            nn.Conv2d(channels, 96, 11, 4),
            nn.ReLU(),
            nn.MaxPool2d(3, 2),
            nn.Conv2d(96, 256, 5, 1, 2),
            nn.ReLU(),
            nn.MaxPool2d(3, 2),
            nn.Conv2d(256, 384, 3, 1, 1),
            nn.ReLU(),
            nn.Conv2d(384, 384, 3, 1, 1),
            nn.ReLU(),
            nn.Conv2d(384, 256, 3, 1, 1),
            nn.ReLU(),
            nn.MaxPool2d(3, 2),
        )
        # Image size after the convolutional and pooling layers
        fig_size = (fig_size - 11 + 4) // 4
        fig_size = (fig_size - 3 + 2) // 2
        fig_size = (fig_size - 5 + 1 + 4) // 1
        fig_size = (fig_size - 3 + 2) // 2
        fig_size = (fig_size - 3 + 1 + 2) // 1
        fig_size = (fig_size - 3 + 1 + 2) // 1
        fig_size = (fig_size - 3 + 1 + 2) // 1
        fig_size = (fig_size - 3 + 2) // 2
        self.fc = nn.Sequential(
            nn.Linear(256 * fig_size * fig_size, 4096),
            nn.ReLU(),
            nn.Dropout(p=self.dropout),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Dropout(p=self.dropout),
            nn.Linear(4096, num_class),
        )

    def forward(self, X):
        conv_features = self.conv(X)
        # Flatten to (batch, features) before the fully connected layers
        output = self.fc(conv_features.view(X.shape[0], -1))
        return output
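The `nn.Dropout` layers in the fully connected part randomly zero activations during training and rescale the surviving ones by 1 / (1 - p), so the expected activation stays the same. A minimal pure-Python sketch of that behavior follows; the helper name `dropout`, the list-based input, and the fixed seed are assumptions for illustration, not how PyTorch implements it internally.

```python
import random


def dropout(values, p, training=True, rng=random):
    """Zero each value with probability p and rescale the rest by 1 / (1 - p)."""
    if not training or p == 0.0:
        return list(values)
    if p >= 1.0:
        return [0.0 for _ in values]
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 10000
dropped = dropout(activations, p=0.5, rng=rng)

kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(f"kept fraction: {kept_fraction:.3f}, mean after dropout: {mean_after:.3f}")
```

At evaluation time (`training=False`) the values pass through unchanged, which matches calling `model.eval()` on a network that contains `nn.Dropout`.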
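3.VGG
VGG: builds deep models by repeatedly stacking a simple basic block.
Block: several identical convolutional layers with padding 1 and a 3×3 window, followed by a max pooling layer with stride 2 and a 2×2 window. The convolutional layers keep the input height and width unchanged, while the pooling layer halves them.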
class VggBlock(nn.Module):
    def __init__(self, conv_arch):
        # conv_arch : (num_convs, in_channels, out_channels)
        super(VggBlock, self).__init__()
        num_convs, in_channels, out_channels = conv_arch
        self.conv = nn.Sequential()
        for i in range(num_convs):
            self.conv.add_module(f'conv_{i+1}', nn.Conv2d(in_channels, out_channels, 3, padding=1))
            # ReLU after each convolution, as in the standard VGG design
            self.conv.add_module(f'relu_{i+1}', nn.ReLU())
            in_channels = out_channels
        self.conv.add_module('maxpool', nn.MaxPool2d(2, 2))

    def forward(self, X):
        return self.conv(X)


class Vgg11(nn.Module):
    def __init__(self, *, channels, fig_size, num_class):
        super(Vgg11, self).__init__()
        self.dropout = 0.5
        self.conv_arch = [(1, channels, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512)]
        self.fc_neuros = 4096
        self.vgg_blocks = nn.Sequential()
        for i, conv_arch in enumerate(self.conv_arch):
            self.vgg_blocks.add_module(f'vgg_block{i+1}', VggBlock(conv_arch))
        # Each block halves the height and width once
        fig_size = fig_size // (2 ** len(self.conv_arch))
        fc_features = self.conv_arch[-1][-1] * fig_size * fig_size
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(fc_features, self.fc_neuros),
            nn.ReLU(),
            nn.Dropout(p=self.dropout),
            nn.Linear(self.fc_neuros, self.fc_neuros),
            nn.ReLU(),
            nn.Dropout(p=self.dropout),
            nn.Linear(self.fc_neuros, num_class),
        )

    def forward(self, X):
        conv_features = self.vgg_blocks(X)
        output = self.fc(conv_features)
        return output
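Because each block halves the height and width exactly once, the shape bookkeeping in `Vgg11` is short. The sketch below recomputes it in plain Python from the same `conv_arch` tuples, assuming a 3-channel 224×224 input; those two input values are assumptions for illustration.

```python
# (num_convs, in_channels, out_channels) per block, as in Vgg11.conv_arch
channels = 3    # assumed RGB input
fig_size = 224  # assumed input height/width
conv_arch = [(1, channels, 64), (1, 64, 128), (2, 128, 256),
             (2, 256, 512), (2, 512, 512)]

size = fig_size
for num_convs, in_channels, out_channels in conv_arch:
    # 3x3 convs with padding 1 keep the size; the 2x2 max pool halves it.
    size //= 2
    print(f"{num_convs} conv(s), {in_channels:>3} -> {out_channels:>3} channels, "
          f"output {size}x{size}")

num_conv_layers = sum(num_convs for num_convs, _, _ in conv_arch)
num_fc_layers = 3
fc_features = conv_arch[-1][-1] * size * size

print(num_conv_layers + num_fc_layers)  # 11 weight layers, hence "VGG-11"
print(fc_features)                      # 25088
```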
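4.NiN
What the 1×1 convolution kernel does:
1. Scales the number of channels: the number of kernels determines the number of output channels.
2. Adds nonlinearity: a 1×1 convolution performs the same computation as a fully connected layer applied at each pixel, and it is followed by a nonlinear activation function, so it increases the nonlinearity of the network.
3. Requires few parameters.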
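To make points 1 and 2 concrete, the sketch below implements a 1×1 convolution on nested Python lists and checks that, at every pixel, it gives the same result as a fully connected layer applied to that pixel's channel vector. The helper names `conv1x1` and `dense` and the toy numbers are assumptions for illustration.

```python
def conv1x1(x, weight, bias):
    """x: [C_in][H][W], weight: [C_out][C_in], bias: [C_out] -> [C_out][H][W]."""
    c_in, height, width = len(x), len(x[0]), len(x[0][0])
    return [
        [
            [bias[o] + sum(weight[o][c] * x[c][i][j] for c in range(c_in))
             for j in range(width)]
            for i in range(height)
        ]
        for o in range(len(weight))
    ]


def dense(vector, weight, bias):
    """Fully connected layer: weight @ vector + bias."""
    return [b + sum(w * v for w, v in zip(row, vector))
            for row, b in zip(weight, bias)]


# Toy input: 3 channels, 2x2 pixels; the layer maps 3 channels down to 2.
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.5, 0.0], [1.0, -1.0]],
     [[2.0, 2.0], [0.0, 1.0]]]
weight = [[1.0, 0.0, -1.0],
          [0.5, 2.0, 0.0]]
bias = [0.0, 1.0]

y = conv1x1(x, weight, bias)
for i in range(2):
    for j in range(2):
        pixel = [x[c][i][j] for c in range(3)]
        assert [y[o][i][j] for o in range(2)] == dense(pixel, weight, bias)

print(len(x), "->", len(y), "channels")  # 3 -> 2 channels
```

On point 3, a 1×1 kernel needs C_in × C_out weights, which is 9 times fewer than a 3×3 kernel with the same channel counts.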
class NinBlock(nn.Module):
    def __init__(self, conv_arch):
        # conv_arch : (in_channels, out_channels, kernel_size, stride, padding)
        super(NinBlock, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(*conv_arch),
            nn.ReLU(),
            nn.Conv2d(conv_arch[1], conv_arch[1], kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(conv_arch[1], conv_arch[1], kernel_size=1),
            nn.ReLU(),
        )

    def forward(self, X):
        return self.conv(X)


class GlobalAvgPool2d(nn.Module):
    def __init__(self):
        super(GlobalAvgPool2d, self).__init__()

    def forward(self, X):
        # Pool over the full height and width, leaving a 1x1 map per channel
        return F.avg_pool2d(X, kernel_size=X.size()[2:])


class Nin(nn.Module):
    def __init__(self, *, channels, fig_size, num_class):
        super(Nin, self).__init__()
        self.dropout = 0.5
        self.conv_arch = [(channels, 96, 11, 4, 0), (96, 256, 5, 1, 2),
                          (256, 384, 3, 1, 1), (384, num_class, 3, 1, 1)]
        self.nin_blocks = nn.Sequential()
        for i, conv_arch in enumerate(self.conv_arch[:-1]):
            self.nin_blocks.add_module(f'nin_block_{i+1}', NinBlock(conv_arch))
            self.nin_blocks.add_module(f'max_pool_{i+1}', nn.MaxPool2d(3, 2))
        self.nin_blocks.add_module('dropout', nn.Dropout(p=self.dropout))
        self.nin_blocks.add_module(f'nin_block_{len(self.conv_arch)}', NinBlock(self.conv_arch[-1]))
        self.global_avg_pool = GlobalAvgPool2d()
        self.flatten = nn.Flatten()

    def forward(self, X):
        conv_features = self.nin_blocks(X)
        avg_pool = self.global_avg_pool(conv_features)
        return self.flatten(avg_pool)
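5.GoogLeNet
- Built from Inception basic blocks.
- An Inception block is effectively a sub-network with 4 paths. It extracts information in parallel through convolutional layers with different window shapes and a max pooling layer, and uses 1×1 convolutional layers to reduce the number of channels and thereby lower model complexity.
- The tunable hyperparameters are the output channel counts of each layer, which is how we control model complexity.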
class Inception(nn.Module):
    def __init__(self, conv_arch):
        # conv_arch : (in_channels, c1, (c2_mid, c2_out), (c3_mid, c3_out), c4)
        super(Inception, self).__init__()
        in_channels, c1, c2, c3, c4 = conv_arch
        # Path 1: a single 1x1 convolution
        self.path_1 = nn.Conv2d(in_channels, c1, kernel_size=1)
        # Path 2: 1x1 convolution followed by a 3x3 convolution
        self.path_2 = nn.Sequential(
            nn.Conv2d(in_channels, c2[0], kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1),
        )
        # Path 3: 1x1 convolution followed by a 5x5 convolution
        self.path_3 = nn.Sequential(
            nn.Conv2d(in_channels, c3[0], kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2),
        )
        # Path 4: 3x3 max pooling followed by a 1x1 convolution
        self.path_4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, c4, kernel_size=1),
        )

    def forward(self, X):
        p1 = F.relu(self.path_1(X))
        p2 = F.relu(self.path_2(X))
        p3 = F.relu(self.path_3(X))
        p4 = F.relu(self.path_4(X))
        # Concatenate the four paths along the channel dimension
        return torch.cat((p1, p2, p3, p4), dim=1)


class GoogleNet(nn.Module):
    def __init__(self, *, channels, fig_size, num_class):
        super(GoogleNet, self).__init__()
        self.b1 = nn.Sequential(
            nn.Conv2d(channels, 64, 7, 2, 3),
            nn.ReLU(),
            nn.MaxPool2d(3, 2, 1),
        )
        # ReLU added after each convolution; without it the two convolutions
        # collapse into a single linear map.
        self.b2 = nn.Sequential(
            nn.Conv2d(64, 64, 1),
            nn.ReLU(),
            nn.Conv2d(64, 192, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(3, 2, 1),
        )
        self.b3 = nn.Sequential(
            Inception([192, 64, (96, 128), (16, 32), 32]),
            Inception([256, 128, (128, 192), (32, 96), 64]),
            nn.MaxPool2d(3, 2, 1),
        )
        self.b4 = nn.Sequential(
            Inception([480, 192, (96, 208), (16, 48), 64]),
            Inception([512, 160, (112, 224), (24, 64), 64]),
            Inception([512, 128, (128, 256), (24, 64), 64]),
            Inception([512, 112, (144, 288), (32, 64), 64]),
            Inception([528, 256, (160, 320), (32, 128), 128]),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )
        self.b5 = nn.Sequential(
            Inception([832, 256, (160, 320), (32, 128), 128]),
            Inception([832, 384, (192, 384), (48, 128), 128]),
            GlobalAvgPool2d(),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(1024, num_class),
        )
        self.Inception_blocks = nn.Sequential(self.b3, self.b4, self.b5)

    def forward(self, X):
        conv_features = self.b1(X)
        conv_features = self.b2(conv_features)
        incep_features = self.Inception_blocks(conv_features)
        return self.fc(incep_features)
fig_size = 224
channels = 3
num_class = 10
X = torch.ones([10, channels, fig_size, fig_size])
# Swap in any of the commented-out models below to check its output shape instead.
# nin = Nin(channels=channels, fig_size=fig_size, num_class=num_class)
# output = nin(X)
# vgg11 = Vgg11(channels=channels, fig_size=fig_size, num_class=num_class)
# output = vgg11(X)
# googlenet = GoogleNet(channels=channels, fig_size=fig_size, num_class=num_class)
# output = googlenet(X)
# lenet = LeNet(fig_size=fig_size, num_class=num_class, channels=channels)
# output = lenet(X)
alexnet = AlexNet(fig_size=fig_size, num_class=num_class, channels=channels)
output = alexnet(X)
print(output.shape)
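Since the four paths are concatenated along the channel dimension, the output channel count of an Inception block is c1 + c2[1] + c3[1] + c4, and it has to match the input channel count of the next block. The sketch below checks this for the configurations used in `GoogleNet`; the helper name `inception_out_channels` is an assumption for illustration.

```python
def inception_out_channels(conv_arch):
    """Channels after concatenating the four paths of one Inception block."""
    _in_channels, c1, c2, c3, c4 = conv_arch
    return c1 + c2[1] + c3[1] + c4


# Inception configurations copied from stages b3, b4, and b5 of GoogleNet.
configs = [
    [192, 64, (96, 128), (16, 32), 32],
    [256, 128, (128, 192), (32, 96), 64],
    [480, 192, (96, 208), (16, 48), 64],
    [512, 160, (112, 224), (24, 64), 64],
    [512, 128, (128, 256), (24, 64), 64],
    [512, 112, (144, 288), (32, 64), 64],
    [528, 256, (160, 320), (32, 128), 128],
    [832, 256, (160, 320), (32, 128), 128],
    [832, 384, (192, 384), (48, 128), 128],
]

channels = configs[0][0]
for conv_arch in configs:
    # The max pooling layers between stages do not change the channel count.
    assert conv_arch[0] == channels, "input channels must match previous output"
    channels = inception_out_channels(conv_arch)

print(channels)  # 1024, the input width of the final Linear layer
```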
6.ResNet
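The problem with depth: once a deep CNN reaches a certain depth, simply adding more layers no longer improves classification performance; instead, the network converges more slowly and its accuracy gets worse.
Residual Block
Identity mapping:
Left (plain block): fit f(x)=x directly
Right (residual block): fit the residual f(x)-x=0 (makes small fluctuations around the identity mapping easier to capture)
In a residual block, the input can propagate forward faster through the cross-layer data path.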
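A tiny numeric sketch of this idea: when the residual branch outputs zeros (for example because its weights are all zero), the block reduces to relu(x), which is the identity for non-negative inputs such as the output of an earlier ReLU. The function names and the scalar-weight branch are simplifications assumed for illustration; they stand in for the convolution and batch normalization layers of a real residual block.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def residual_block(x, branch):
    """Compute relu(branch(x) + x): the branch only models the residual f(x) - x."""
    return relu([r + v for r, v in zip(branch(x), x)])


def make_branch(weight):
    # Stand-in for the conv/BN layers: an elementwise scaling by `weight`.
    return lambda x: [weight * v for v in x]


x = [0.0, 0.5, 2.0, 3.0]  # non-negative, e.g. the output of an earlier ReLU

identity_out = residual_block(x, make_branch(0.0))   # zero residual
perturbed_out = residual_block(x, make_branch(0.5))  # nonzero residual

print(identity_out)
print(perturbed_out)
```

With a zero branch the output equals the input, so the block only has to learn how far the desired mapping deviates from the identity.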
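ResNet model
- Convolution (64, 7x7, 3)
- Batch normalization
- Max pooling (3x3, 2)
- Residual module x4 (a residual block with stride 2 reduces the height and width between consecutive modules)
- Global average pooling
- Fully connected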
class Residual(nn.Module):
    # Configurable: the number of output channels, whether to use an extra 1x1
    # convolution to change the channel count, and the stride of the convolution.
    def __init__(self, in_channels, out_channels, use_1x1conv=False, stride=1):
        super(Residual, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, stride=stride)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        if use_1x1conv:
            self.conv3 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3 is not None:
            # Match the shortcut's shape to Y before adding
            X = self.conv3(X)
        return F.relu(Y + X)


class ResBlock(nn.Module):
    def __init__(self, in_channels, out_channels, num_residuals, first_block=False):
        super(ResBlock, self).__init__()
        if first_block:
            assert in_channels == out_channels  # the first module keeps the input channel count
        block = []
        for i in range(num_residuals):
            if i == 0 and not first_block:
                # Only the first residual unit of a later module halves the height
                # and width and changes the channel count.
                block.append(Residual(in_channels, out_channels, use_1x1conv=True, stride=2))
            else:
                block.append(Residual(in_channels, out_channels))
            in_channels = out_channels
        self.resi_block = nn.Sequential(*block)

    def forward(self, X):
        return self.resi_block(X)


class ResNet(nn.Module):
    def __init__(self, *, channels, fig_size, num_class):
        super(ResNet, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, 64, 7, 2, 3),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(3, 2, 1),
        )
        # (in_channels, out_channels, num_residuals[, first_block])
        self.res_block_arch = [(64, 64, 2, True), (64, 128, 2), (128, 256, 2), (256, 512, 2)]
        self.res_blocks = nn.Sequential()
        for i, arch in enumerate(self.res_block_arch):
            self.res_blocks.add_module(f'res_block_{i+1}', ResBlock(*arch))
        self.global_avg_pool = GlobalAvgPool2d()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512, num_class),
        )

    def forward(self, X):
        conv_features = self.conv(X)
        res_features = self.res_blocks(conv_features)
        global_avg_pool = self.global_avg_pool(res_features)
        return self.fc(global_avg_pool)
7.DenseNet
Main building blocks
Dense block: defines how the input and output are concatenated.
Transition layer: controls the number of channels so that it does not grow too large.
class DenseBlock(nn.Module):
    def __init__(self, in_channels, out_channels, num_convs):
        super(DenseBlock, self).__init__()
        dense_block = []
        for i in range(num_convs):
            # Each conv sees the block input plus the outputs of all earlier convs
            in_ch = in_channels + i * out_channels
            dense_block.append(nn.Sequential(
                nn.BatchNorm2d(in_ch),
                nn.ReLU(),
                nn.Conv2d(in_ch, out_channels, 3, padding=1),
            ))
        self.dense_block = nn.ModuleList(dense_block)
        self.out_channels = in_channels + num_convs * out_channels

    def forward(self, X):
        for block in self.dense_block:
            Y = block(X)
            # Concatenate input and output along the channel dimension
            X = torch.cat((X, Y), dim=1)
        return X


class TransBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(TransBlock, self).__init__()
        self.trans_block = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(),
            nn.Conv2d(in_channels, out_channels, 1),
            nn.AvgPool2d(2, 2),
        )

    def forward(self, X):
        return self.trans_block(X)


class DenseNet(nn.Module):
    def __init__(self, *, channels, fig_size, num_class):
        super(DenseNet, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, 64, 7, 2, 3),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(3, 2, 1),
        )
        self.dense_blocks = nn.Sequential()
        self.num_convs_list = [4 for i in range(4)]
        cur_channels, self.growth_rate = 64, 32
        for i, num_conv in enumerate(self.num_convs_list):
            dense_block = DenseBlock(cur_channels, self.growth_rate, num_conv)
            self.dense_blocks.add_module(f'dense_block_{i+1}', dense_block)
            cur_channels = dense_block.out_channels
            # A transition layer halves the channels after every dense block but the last
            if i != len(self.num_convs_list) - 1:
                self.dense_blocks.add_module(f'transition_block_{i+1}', TransBlock(cur_channels, cur_channels // 2))
                cur_channels //= 2
        self.bn = nn.Sequential(nn.BatchNorm2d(cur_channels), nn.ReLU())
        self.global_avg_pool = GlobalAvgPool2d()
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(cur_channels, num_class))

    def forward(self, X):
        conv_features = self.conv(X)
        dense_features = self.dense_blocks(conv_features)
        batch_normed = self.bn(dense_features)
        global_avg_pool = self.global_avg_pool(batch_normed)
        return self.fc(global_avg_pool)
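The channel bookkeeping in `DenseNet.__init__` can be traced without PyTorch. The sketch below repeats the same arithmetic with the values from the class above: each dense block adds num_convs × growth_rate channels, and each transition layer halves the count. The `trace` list is an addition for illustration.

```python
num_convs_list = [4, 4, 4, 4]  # convolutions per dense block
cur_channels, growth_rate = 64, 32

trace = [cur_channels]
for i, num_convs in enumerate(num_convs_list):
    # Dense block: every conv contributes growth_rate new channels.
    cur_channels += num_convs * growth_rate
    trace.append(cur_channels)
    # Transition layer after every dense block except the last one.
    if i != len(num_convs_list) - 1:
        cur_channels //= 2
        trace.append(cur_channels)

print(trace)         # [64, 192, 96, 224, 112, 240, 120, 248]
print(cur_channels)  # 248, the input width of the final Linear layer
```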