  • Week 4 Assignment: Convolutional Neural Networks (Part 2)

    I. Theory Study

    1. AlexNet's main improvements: Dropout added after the fully connected layers, ReLU to ease vanishing gradients, and MaxPooling.


      The upper and lower halves of the architecture diagram show the outputs of the two GPUs. The input is a 227×227×3 image (resized from 224×224×3). The first convolutional layer has 96 kernels of size 11×11×3 with stride 4 and padding 0; after convolution the feature map size is 55×55×96. Because the work is split across two GPUs, the data are divided into two groups of 55×55×48 each.

      The pooling that follows uses stride 2 and a 3×3 window: (55-3)/2+1 = 27, so the pooled feature map is 27×27×96.

      The second convolutional layer takes the 27×27×96 feature map as input and applies 256 kernels of size 5×5 with stride 1 and padding 2, producing a 27×27×256 feature map.

      Pooling again with stride 2 and a 3×3 window: (27-3)/2+1 = 13, so the feature map becomes 13×13×256.

      The third convolutional layer takes the 13×13×256 feature map (two groups of 13×13×128, one per GPU) and applies 384 kernels of size 3×3 with stride 1 and padding 1, producing a 13×13×384 feature map.

      The fourth convolutional layer takes the 13×13×384 feature map (two groups of 13×13×192) and applies 384 kernels of size 3×3 with stride 1 and padding 1, producing a 13×13×384 feature map.

      The fifth convolutional layer takes the 13×13×384 feature map (two groups of 13×13×192) and applies 256 kernels of size 3×3 with stride 1 and padding 1, producing a 13×13×256 feature map.

      A final pooling with stride 2 and a 3×3 window gives (13-3)/2+1 = 6, so the feature map is 6×6×256.

      The sixth and seventh layers are fully connected with 4096 neurons each; the eighth, also fully connected, has 1000 neurons, matching ImageNet's 1000 classes.
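
      All the spatial sizes above follow the same rule, output = (input - kernel + 2*padding) // stride + 1; a tiny helper (just a sketch for checking the arithmetic) makes this explicit:

     def conv_out_size(n, kernel, stride=1, padding=0):
         """Spatial output size of a conv/pool layer: floor((n - k + 2p) / s) + 1."""
         return (n - kernel + 2 * padding) // stride + 1

     print(conv_out_size(227, 11, stride=4))           # first conv: 55
     print(conv_out_size(55, 3, stride=2))             # first pool: 27
     print(conv_out_size(27, 5, stride=1, padding=2))  # second conv: 27
     print(conv_out_size(13, 3, stride=2))             # final pool: 6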

     import torch
     from torch import nn

     # AlexNet-style network (the d2l variant: 1 input channel and 10 output
     # classes, sized for 224x224 Fashion-MNIST images rather than ImageNet)
     net = nn.Sequential(
         nn.Conv2d(1, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
         nn.MaxPool2d(kernel_size=3, stride=2),
         nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
         nn.MaxPool2d(kernel_size=3, stride=2),
         nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
         nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
         nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
         nn.MaxPool2d(kernel_size=3, stride=2), nn.Flatten(),
         nn.Linear(6400, 4096), nn.ReLU(), nn.Dropout(p=0.5),  # 6400 = 256*5*5
         nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(p=0.5),
         nn.Linear(4096, 10))
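
      As a quick sanity check on the arithmetic, push a dummy batch through the network and print every layer's output shape:

     X = torch.randn(1, 1, 224, 224)
     for layer in net:
         X = layer(X)
         print(layer.__class__.__name__, 'output shape:\t', X.shape)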

      Beyond this, AlexNet also used a "local response normalization" (LRN) layer. Intuitively, for a 13×13×256 feature map, since we don't want too many highly activated neurons, LRN picks a spatial position, takes the 256 values running through all the channels at that position, and normalizes them. In practice, though, LRN turned out not to help much.
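
      PyTorch still provides this as nn.LocalResponseNorm; a minimal sketch, using AlexNet's published hyperparameters (n=5, alpha=1e-4, beta=0.75, k=2):

     lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)
     x = torch.randn(1, 256, 13, 13)  # a feature map like the one above
     print(lrn(x).shape)              # normalizes across nearby channels; shape is unchanged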

    2. VGG: a bigger and deeper AlexNet, built from reusable convolutional blocks. The network is very large, with a huge number of parameters to train. As the network deepens, the image height and width shrink on a regular schedule, halving exactly after each pooling step, while the channel count keeps growing, doubling exactly after each group of convolutions. In other words, the rate at which the image shrinks and the rate at which the channels grow follow a fixed pattern.

     # VGG block: num_convs 3x3 convolutions, each followed by ReLU,
     # then a 2x2 max pooling that halves the height and width
     def vgg_block(num_convs, in_channels, out_channels):
         layers = []
         for _ in range(num_convs):
             layers.append(
                 nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
             layers.append(nn.ReLU())
             in_channels = out_channels
         layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
         return nn.Sequential(*layers)

     # VGG network: (number of convs, output channels) for each block
     conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

     def vgg(conv_arch):
         conv_blks = []
         in_channels = 1
         for (num_convs, out_channels) in conv_arch:
             conv_blks.append(vgg_block(num_convs, in_channels, out_channels))
             in_channels = out_channels
         # five halvings take a 224x224 input down to 7x7
         return nn.Sequential(*conv_blks, nn.Flatten(),
                              nn.Linear(out_channels * 7 * 7, 4096), nn.ReLU(),
                              nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
                              nn.Dropout(0.5), nn.Linear(4096, 10))
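
      Because the full network is so heavy, a handy trick for quick experiments (this is what d2l does) is to shrink every block's channel count by a fixed ratio:

     ratio = 4
     small_conv_arch = [(pair[0], pair[1] // ratio) for pair in conv_arch]
     net = vgg(small_conv_arch)  # same halve-and-double pattern, far less compute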

    3. NiN network: fully connected layers have huge numbers of parameters and invite overfitting, so NiN does without them.

      No fully connected layers (in a NiN block, each convolution is followed by two 1×1 convolutions that act like per-pixel fully connected layers); NiN blocks alternate with stride-2 max pooling layers; finally, global average pooling produces the output.

     # NiN block: one ordinary convolution followed by two 1x1 convolutions,
     # which behave like fully connected layers applied at every pixel
     def nin_block(in_channels, out_channels, kernel_size, strides, padding):
         return nn.Sequential(
             nn.Conv2d(in_channels, out_channels, kernel_size, strides, padding),
             nn.ReLU(), nn.Conv2d(out_channels, out_channels, kernel_size=1),
             nn.ReLU(), nn.Conv2d(out_channels, out_channels, kernel_size=1),
             nn.ReLU())

     net = nn.Sequential(
         nin_block(1, 96, kernel_size=11, strides=4, padding=0),
         nn.MaxPool2d(3, stride=2),
         nin_block(96, 256, kernel_size=5, strides=1, padding=2),
         nn.MaxPool2d(3, stride=2),
         nin_block(256, 384, kernel_size=3, strides=1, padding=1),
         nn.MaxPool2d(3, stride=2), nn.Dropout(0.5),
         nin_block(384, 10, kernel_size=3, strides=1, padding=1),  # one channel per class
         nn.AdaptiveAvgPool2d((1, 1)),  # global average pooling
         nn.Flatten())
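
      A small illustration of why global average pooling can stand in for the fully connected head: it collapses each class channel to a single score:

     x = torch.randn(2, 10, 5, 5)              # batch of 2, ten 5x5 class maps
     pooled = nn.AdaptiveAvgPool2d((1, 1))(x)  # average each map down to one value
     print(nn.Flatten()(pooled).shape)         # torch.Size([2, 10]): one score per class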

    4. GoogLeNet

      Inception block: four parallel paths extract information at different scales, and their outputs are concatenated along the channel dimension. The input is fed to four branches: the first applies a 1×1 convolution; the second a 1×1 convolution followed by a 3×3 convolution (padding 1); the third a 1×1 convolution followed by a 5×5 convolution (padding 2); and the fourth a 3×3 max pooling (padding 1) followed by a 1×1 convolution. Different kernel sizes mean different receptive fields, and concatenating the results fuses features at different scales. Across all four branches the output keeps the input's height and width; only the channel count changes (for example, the first Inception block below outputs 64 + 128 + 32 + 32 = 256 channels). GoogLeNet stacks nine Inception blocks, and the channel count grows steadily to 1024 at the end.


     from torch.nn import functional as F

     class Inception(nn.Module):
         # c1--c4 are the output channel counts of the four parallel paths
         def __init__(self, in_channels, c1, c2, c3, c4, **kwargs):
             super(Inception, self).__init__(**kwargs)
             # path 1: 1x1 convolution
             self.p1_1 = nn.Conv2d(in_channels, c1, kernel_size=1)
             # path 2: 1x1 convolution, then 3x3 convolution
             self.p2_1 = nn.Conv2d(in_channels, c2[0], kernel_size=1)
             self.p2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1)
             # path 3: 1x1 convolution, then 5x5 convolution
             self.p3_1 = nn.Conv2d(in_channels, c3[0], kernel_size=1)
             self.p3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2)
             # path 4: 3x3 max pooling, then 1x1 convolution
             self.p4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
             self.p4_2 = nn.Conv2d(in_channels, c4, kernel_size=1)

         def forward(self, x):
             p1 = F.relu(self.p1_1(x))
             p2 = F.relu(self.p2_2(F.relu(self.p2_1(x))))
             p3 = F.relu(self.p3_2(F.relu(self.p3_1(x))))
             p4 = F.relu(self.p4_2(self.p4_1(x)))
             # concatenate the four paths along the channel dimension
             return torch.cat((p1, p2, p3, p4), dim=1)

     b1 = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
                        nn.ReLU(), nn.MaxPool2d(kernel_size=3, stride=2,
                                                padding=1))

     b2 = nn.Sequential(nn.Conv2d(64, 64, kernel_size=1), nn.ReLU(),
                        nn.Conv2d(64, 192, kernel_size=3, padding=1), nn.ReLU(),
                        nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

     b3 = nn.Sequential(Inception(192, 64, (96, 128), (16, 32), 32),
                        Inception(256, 128, (128, 192), (32, 96), 64),
                        nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

     b4 = nn.Sequential(Inception(480, 192, (96, 208), (16, 48), 64),
                        Inception(512, 160, (112, 224), (24, 64), 64),
                        Inception(512, 128, (128, 256), (24, 64), 64),
                        Inception(512, 112, (144, 288), (32, 64), 64),
                        Inception(528, 256, (160, 320), (32, 128), 128),
                        nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

     b5 = nn.Sequential(Inception(832, 256, (160, 320), (32, 128), 128),
                        Inception(832, 384, (192, 384), (48, 128), 128),
                        nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten())

     net = nn.Sequential(b1, b2, b3, b4, b5, nn.Linear(1024, 10))
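
      To double-check the channel bookkeeping (b5's two blocks give 256+320+128+128 = 832 and then 384+384+128+128 = 1024 channels), a dummy forward pass on a 96×96 input prints each stage's output shape:

     X = torch.randn(1, 1, 96, 96)
     for blk in net:
         X = blk(X)
         print(blk.__class__.__name__, 'output shape:\t', X.shape)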

    5. Batch normalization: the gradients reaching the bottom layers are small, so they train slowly, while the later layers have larger gradients; whenever the bottom layers change, every layer above has to adapt, which slows convergence. For fully connected layers, batch normalization acts on the feature dimension; for convolutional layers, on the channel dimension. It can be placed before a layer's activation function or applied to the input. In PyTorch this is usually done with nn.BatchNorm1d / nn.BatchNorm2d.
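
      A minimal sketch of the two cases (feature dimension vs. channel dimension), with BN placed before the activation:

     # fully connected: BN normalizes over the 120 output features
     fc_bn = nn.Sequential(nn.Linear(256, 120), nn.BatchNorm1d(120), nn.ReLU())
     # convolutional: BN normalizes over the 6 output channels
     conv_bn = nn.Sequential(nn.Conv2d(1, 6, kernel_size=5), nn.BatchNorm2d(6), nn.ReLU())

     print(fc_bn(torch.randn(4, 256)).shape)          # torch.Size([4, 120])
     print(conv_bn(torch.randn(4, 1, 28, 28)).shape)  # torch.Size([4, 6, 24, 24])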

    6. ResNet: adding a skip connection alongside a block gives f(x) = x + g(x), which makes very deep networks trainable. As depth grows the gradients gradually vanish, so residual units were introduced: the output of an earlier layer is wired straight to the layer two steps ahead, which suppresses the degradation problem. ResNet is very deep and borrows the Highway Network idea: instead of fitting the desired mapping F(x) directly, a layer fits the difference between output and input, F(x) - x, where F(x) is the layer's desired underlying mapping and x is its input.


     class Residual(nn.Module):
         def __init__(self, input_channels, num_channels, use_1x1conv=False,
                      strides=1):
             super().__init__()
             self.conv1 = nn.Conv2d(input_channels, num_channels, kernel_size=3,
                                    padding=1, stride=strides)
             self.conv2 = nn.Conv2d(num_channels, num_channels, kernel_size=3,
                                    padding=1)
             # optional 1x1 convolution so the shortcut matches the main path's shape
             if use_1x1conv:
                 self.conv3 = nn.Conv2d(input_channels, num_channels,
                                        kernel_size=1, stride=strides)
             else:
                 self.conv3 = None
             self.bn1 = nn.BatchNorm2d(num_channels)
             self.bn2 = nn.BatchNorm2d(num_channels)

         def forward(self, X):
             Y = F.relu(self.bn1(self.conv1(X)))
             Y = self.bn2(self.conv2(Y))
             if self.conv3:
                 X = self.conv3(X)
             Y += X  # the skip connection: add the input back in
             return F.relu(Y)

     b1 = nn.Sequential(nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
                        nn.BatchNorm2d(64), nn.ReLU(),
                        nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

     # a stage of residual blocks; except for the first stage, the first block
     # halves the height/width and doubles the channel count
     def resnet_block(input_channels, num_channels, num_residuals,
                      first_block=False):
         blk = []
         for i in range(num_residuals):
             if i == 0 and not first_block:
                 blk.append(
                     Residual(input_channels, num_channels, use_1x1conv=True,
                              strides=2))
             else:
                 blk.append(Residual(num_channels, num_channels))
         return blk

     b2 = nn.Sequential(*resnet_block(64, 64, 2, first_block=True))
     b3 = nn.Sequential(*resnet_block(64, 128, 2))
     b4 = nn.Sequential(*resnet_block(128, 256, 2))
     b5 = nn.Sequential(*resnet_block(256, 512, 2))

     net = nn.Sequential(b1, b2, b3, b4, b5, nn.AdaptiveAvgPool2d((1, 1)),
                         nn.Flatten(), nn.Linear(512, 10))
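
      A quick check that the plain block preserves the shape while the 1×1-convolution variant changes the channels and halves the spatial size:

     blk = Residual(3, 3)
     print(blk(torch.randn(4, 3, 6, 6)).shape)  # torch.Size([4, 3, 6, 6]): unchanged
     blk = Residual(3, 6, use_1x1conv=True, strides=2)
     print(blk(torch.randn(4, 3, 6, 6)).shape)  # torch.Size([4, 6, 3, 3]): halved, more channels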

    II. Cats vs. Dogs with ResNet

     1. First I ran cats vs. dogs with the ResNet from Mr. Li's course. The network turned out to be quite unstable, with the loss jumping up and down, and the classification results were worse than AlexNet's.


     2. Then I used the resnet50 that comes with torchvision's models on cats vs. dogs, and the accuracy was still low; I was stunned~

    It was Xiaochen who spotted that net.eval() was missing; once net.eval() was applied, the accuracy shot straight up. Very nice.

     import torch
     from torch import nn, optim
     from torchvision import models

     device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

     net = models.resnet50(pretrained=True)
     # freeze the pretrained backbone; only the new head will be trained
     for param in net.parameters():
         param.requires_grad = False
     net.fc = nn.Linear(2048, 2, bias=True)  # binary classification

     print(net)

     # move the network to the GPU
     net = net.to(device)
     criterion = nn.CrossEntropyLoss()
     optimizer = optim.Adam(net.fc.parameters(), lr=0.001)
     for epoch in range(1):  # repeat for several epochs
         net.train()
         for i, (inputs, labels) in enumerate(train_loader):
             inputs = inputs.to(device)
             labels = labels.to(device)
             # zero the optimizer's gradients
             optimizer.zero_grad()
             # forward pass + backward pass + update
             outputs = net(inputs)
             loss = criterion(outputs, labels.long())
             loss.backward()
             optimizer.step()
         print('Epoch: %d loss: %.6f' % (epoch + 1, loss.item()))
     print('Finished Training')
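
      Since the whole fix hinged on net.eval() (it switches BatchNorm and Dropout into inference behavior), the evaluation loop should look roughly like this; test_loader is assumed to be built the same way as train_loader:

     net.eval()  # inference mode for BatchNorm/Dropout: the crucial line
     correct = total = 0
     with torch.no_grad():
         for inputs, labels in test_loader:
             inputs, labels = inputs.to(device), labels.to(device)
             preds = net(inputs).argmax(dim=1)
             correct += (preds == labels).sum().item()
             total += labels.size(0)
     print('Accuracy: %.4f' % (correct / total))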


     3. Haopeng's handling of the binary classification head is much more careful: dropping straight from 2048 to 2 probably throws away quite a bit of information, so next I'll try stepping the dimensions down more gently, along the lines of the sketch below~
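
      For example, a gentler head might look like this (just a sketch of the idea, with a made-up 512-unit intermediate layer, not Haopeng's actual code):

     # hypothetical replacement head: 2048 -> 512 -> 2 instead of 2048 -> 2
     net.fc = nn.Sequential(
         nn.Linear(2048, 512), nn.ReLU(), nn.Dropout(0.5),
         nn.Linear(512, 2))
     optimizer = optim.Adam(net.fc.parameters(), lr=0.001)  # train only the new head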
