  • Importing Slim models with TensorLayer for transfer learning

  In the previous post, "Transfer learning for cat-vs-dog classification with TensorFlow", I used TensorLayer's VGG16 model for transfer-learning image classification. So what happens when TensorLayer doesn't ship the model you need? Don't worry: TensorLayer can import TensorFlow's Slim models, and there is a code example in tutorial_inceptionV3_tfslim.
  So what is slim, and what is it actually good for?
  Slim is a library that makes building, training, and evaluating neural networks simple. It removes much of the repetitive boilerplate of raw TensorFlow, making code more compact and readable. It also provides many well-known computer-vision models (VGG, AlexNet, and so on), which you can use directly or even extend in various ways. (Author's note: in short, its role overlaps a lot with TensorLayer's.) For more background, see "Tensorflow auxiliary tools: TF-Slim introduction".
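  To make the boilerplate point concrete, here is a minimal sketch (TF 1.x; the layer names and sizes are purely illustrative) of how slim.arg_scope sets shared defaults once instead of repeating them on every layer:

    import tensorflow as tf

    slim = tf.contrib.slim

    def toy_net(images):
        # arg_scope applies these defaults to every slim.conv2d below,
        # so padding/activation/regularizer are written only once.
        with slim.arg_scope([slim.conv2d],
                            padding='SAME',
                            activation_fn=tf.nn.relu,
                            weights_regularizer=slim.l2_regularizer(0.0005)):
            net = slim.conv2d(images, 64, [3, 3], scope='conv1')
            net = slim.conv2d(net, 128, [3, 3], scope='conv2')
            net = slim.max_pool2d(net, [2, 2], scope='pool1')
        return net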
  For transfer learning you need two things: the Slim model code and the pre-trained weights. Google provides both for download; the TF-Slim project page lists every model along with its ImageNet-trained checkpoint.

  The list also gives each model's top-1 and top-5 accuracy, and there are plenty of models to choose from.
  Download the Inception-ResNet-v2 model code and inception_resnet_v2_2016_08_30.tar.gz, then put the .py file and the extracted .ckpt file in the project root. Why not the Inception V3 that the TensorLayer example uses? Because Inception-ResNet-v2 is more accurate. (Well, the real reason comes at the end.)
  We will again do cat-vs-dog classification. If you follow the tutorial (import the model, change num_classes, load the training data) and train directly, it fails: the last few parameters of the Logits layers have mismatched dimensions when the checkpoint is restored.
  Those final parameters simply cannot be restored as-is, and at the time I could not find a TensorFlow call for selectively restoring .ckpt parameters. Fortunately, someone in my chat group suggested a workaround (see "Tensorflow transfer learning"):

  The main idea: first restore all the .ckpt parameters and save them in npz format, then selectively restore parameters from the npz, which works exactly like the previous post.
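  (Side note: TensorFlow 1.x can also restore a subset of checkpoint variables directly, by passing var_list to tf.train.Saver. A minimal sketch, assuming the graph below is already built and using the Logits scope names from the parameter listing later in this post:)

    # Restore everything except the two num_classes-dependent Logits layers
    # straight from the .ckpt; the excluded variables keep their random init.
    exclude = ['InceptionResnetV2/Logits/Logits', 'InceptionResnetV2/AuxLogits/Logits']
    restore_vars = [v for v in tf.global_variables()
                    if not any(v.name.startswith(e) for e in exclude)]
    saver = tf.train.Saver(var_list=restore_vars)
    saver.restore(sess, "inception_resnet_v2.ckpt")

  The npz route below works just as well, and it is the one this post follows.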
  So the whole process takes two steps:
  1. Restore the parameters and save them in npz format:
  Here is the code:

    import os
    import time

    import numpy as np
    import skimage
    import skimage.io
    import skimage.transform
    import tensorflow as tf
    import tensorlayer as tl
    from tensorlayer.layers import *
    from scipy.misc import imread, imresize
    from inception_resnet_v2 import (inception_resnet_v2_arg_scope, inception_resnet_v2)
    from recordutil import *

    slim = tf.contrib.slim
    try:
        from data.imagenet_classes import *
    except Exception as e:
        raise Exception(
            "{} / download the file from: https://github.com/zsdonghao/tensorlayer/tree/master/example/data".format(e))

    n_epoch = 200
    learning_rate = 0.0001
    print_freq = 2
    batch_size = 32

    ## All TF-Slim nets can be merged into TensorLayer
    # input: 299x299 RGB images, the size Inception-ResNet-v2 expects
    x = tf.placeholder(tf.float32, shape=[None, 299, 299, 3])
    # output: integer class labels
    y_ = tf.placeholder(tf.int32, shape=[None, ], name='y_')
    net_in = tl.layers.InputLayer(x, name='input_layer')
    with slim.arg_scope(inception_resnet_v2_arg_scope()):
        network = tl.layers.SlimNetsLayer(
            prev_layer=net_in,
            slim_layer=inception_resnet_v2,
            slim_args={
                'num_classes': 1001,
                'is_training': True,
            },
            name='InceptionResnetV2'  # <-- must match the variable scope in the ckpt model
        )
    sess = tf.InteractiveSession()
    network.print_params(False)
    saver = tf.train.Saver()

    tl.layers.initialize_global_variables(sess)

    # restore the full 1001-class parameter set from the checkpoint ...
    saver.restore(sess, "inception_resnet_v2.ckpt")
    print("Model Restored")
    # ... then dump every parameter into a single npz file
    all_params = sess.run(network.all_params)
    np.savez('inception_resnet_v2.npz', params=all_params)
    sess.close()

  Once this runs successfully, we have all 908 parameters of the model.
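  A quick sanity check of the export (a sketch; 'params' is the key used in the np.savez call above):

    import numpy as np

    # Reload the exported file and confirm the parameter count matches.
    data = np.load('inception_resnet_v2.npz', allow_pickle=True)
    params = data['params']
    print(len(params))      # expected: 908
    print(params[0].shape)  # shape of the first parameter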
  2. Selectively restore the npz parameters and train the model:
  First we modify the final layer: since this is a 2-class problem, change it as follows:

    with slim.arg_scope(inception_resnet_v2_arg_scope()):
        network = tl.layers.SlimNetsLayer(
            prev_layer=net_in,
            slim_layer=inception_resnet_v2,
            slim_args={
                'num_classes': 2,  # two classes: cat and dog
                'is_training': True,
            },
            name='InceptionResnetV2'  # <-- must match the variable scope in the ckpt model
        )

  num_classes becomes 2, and is_training stays True.
  Next, define the outputs and the loss function:

    sess = tf.InteractiveSession()
    y = network.outputs
    y_op = tf.argmax(tf.nn.softmax(y), 1)  # predicted class index
    cost = tl.cost.cross_entropy(y, y_, name='cost')
    correct_prediction = tf.equal(tf.cast(tf.argmax(y, 1), tf.float32), tf.cast(y_, tf.float32))
    acc = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
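  (The float casts above exist only because tf.argmax returns int64 while the labels are int32; an equivalent, slightly cleaner form casts the labels instead:)

    # Equivalent accuracy computation without the float-equality comparison.
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.cast(y_, tf.int64))
    acc = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))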


  Now we decide which parameters to train: only those of the last layers. Printing the parameters shows:

    [TL] param 900: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/weights:0 (5, 5, 128, 768) float32_ref
    [TL] param 901: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/beta:0 (768,) float32_ref
    [TL] param 902: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/moving_mean:0 (768,) float32_ref
    [TL] param 903: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/moving_variance:0 (768,) float32_ref
    [TL] param 904: InceptionResnetV2/AuxLogits/Logits/weights:0 (768, 2) float32_ref
    [TL] param 905: InceptionResnetV2/AuxLogits/Logits/biases:0 (2,) float32_ref
    [TL] param 906: InceptionResnetV2/Logits/Logits/weights:0 (1536, 2) float32_ref
    [TL] param 907: InceptionResnetV2/Logits/Logits/biases:0 (2,) float32_ref
    [TL] num of params: 56940900


  So training starts from param 904, and parameters up to and including param 903 are restored.
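  Rather than hard-coding 904, the cut point can also be located by variable name (a small sketch that relies on the parameter names in the listing above):

    # Index of the first parameter that belongs to a Logits layer.
    # '/Logits/' matches params 904-907 but not the AuxLogits conv/BN ones.
    cut = min(i for i, p in enumerate(network.all_params) if '/Logits/' in p.name)
    print(cut)  # expected: 904 for this graph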
  Below we define the training op, restore the partial parameter set, and load the sample data:

    # define the optimizer: only the final Logits parameters are trainable
    train_params = network.all_params[904:]
    print('trainable params:', train_params)
    train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost, var_list=train_params)
    img, label = read_and_decode("D:\\001-Python\\train299.tfrecords")
    # shuffle_batch randomly shuffles the queued examples
    X_train, y_train = tf.train.shuffle_batch([img, label],
                                              batch_size=batch_size, capacity=200,
                                              min_after_dequeue=100)
    tl.layers.initialize_global_variables(sess)
    # load the pre-trained parameters, keeping only the first 904
    # (everything before the Logits layers), then assign them to the network
    params = tl.files.load_npz('', 'inception_resnet_v2.npz')
    params = params[0:904]
    print('number of restored params:', len(params))
    tl.files.assign_params(sess, params=params, network=network)
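  read_and_decode comes from the author's recordutil module, which isn't shown in the post. A plausible minimal version for 299x299 TFRecords might look like this (the feature keys 'label' and 'img_raw' and the [0, 1] scaling are assumptions, not the original code):

    def read_and_decode(filename):
        # Hypothetical sketch of recordutil.read_and_decode: queue up one
        # TFRecord file and parse single examples of 299x299 RGB images.
        queue = tf.train.string_input_producer([filename])
        reader = tf.TFRecordReader()
        _, serialized = reader.read(queue)
        features = tf.parse_single_example(serialized, features={
            'label': tf.FixedLenFeature([], tf.int64),
            'img_raw': tf.FixedLenFeature([], tf.string),
        })
        img = tf.decode_raw(features['img_raw'], tf.uint8)
        img = tf.reshape(img, [299, 299, 3])
        img = tf.cast(img, tf.float32) / 255.0  # assumed preprocessing
        label = tf.cast(features['label'], tf.int32)
        return img, label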


  The training loop below is the same as in the previous post:

    # train the model
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    step = 0
    filelist = getfilelist()  # from recordutil; used by an alternative loader
    for epoch in range(n_epoch):
        start_time = time.time()
        # pull one shuffled batch from the input queue
        val, l = sess.run([X_train, y_train])
        for X_train_a, y_train_a in tl.iterate.minibatches(val, l, batch_size, shuffle=True):
            sess.run(train_op, feed_dict={x: X_train_a, y_: y_train_a})
        if epoch + 1 == 1 or (epoch + 1) % print_freq == 0:
            print("Epoch %d of %d took %fs" % (epoch + 1, n_epoch, time.time() - start_time))
            train_loss, train_acc, n_batch = 0, 0, 0
            for X_train_a, y_train_a in tl.iterate.minibatches(val, l, batch_size, shuffle=True):
                err, ac = sess.run([cost, acc], feed_dict={x: X_train_a, y_: y_train_a})
                train_loss += err
                train_acc += ac
                n_batch += 1
            print("   train loss: %f" % (train_loss / n_batch))
            print("   train acc: %f" % (train_acc / n_batch))
    # tl.files.save_npz(network.all_params, name='model_vgg_16_2.npz', sess=sess)
    coord.request_stop()
    coord.join(threads)


  With a batch size of 20 and 200 training epochs (note: the listing above sets batch_size = 32; the run shown here used 20), part of the output looks like:

    Epoch 156 of 200 took 12.568609s
    train loss: 0.382517
    train acc: 0.950000
    Epoch 158 of 200 took 12.457161s
    train loss: 0.382509
    train acc: 0.850000
    Epoch 160 of 200 took 12.385407s
    train loss: 0.320393
    train acc: 1.000000
    Epoch 162 of 200 took 12.489218s
    train loss: 0.480686
    train acc: 0.700000
    Epoch 164 of 200 took 12.388841s
    train loss: 0.329189
    train acc: 0.850000
    Epoch 166 of 200 took 12.446472s
    train loss: 0.379127
    train acc: 0.900000
    Epoch 168 of 200 took 12.888571s
    train loss: 0.365938
    train acc: 0.900000
    Epoch 170 of 200 took 12.850605s
    train loss: 0.353434
    train acc: 0.850000
    Epoch 172 of 200 took 12.855129s
    train loss: 0.315443
    train acc: 0.950000
    Epoch 174 of 200 took 12.906666s
    train loss: 0.460817
    train acc: 0.750000
    Epoch 176 of 200 took 12.830738s
    train loss: 0.421025
    train acc: 0.900000
    Epoch 178 of 200 took 12.852572s
    train loss: 0.418784
    train acc: 0.800000
    Epoch 180 of 200 took 12.951322s
    train loss: 0.316057
    train acc: 0.950000
    Epoch 182 of 200 took 12.866213s
    train loss: 0.363328
    train acc: 0.900000
    Epoch 184 of 200 took 13.012520s
    train loss: 0.379462
    train acc: 0.850000
    Epoch 186 of 200 took 12.934583s
    train loss: 0.472857
    train acc: 0.750000
    Epoch 188 of 200 took 13.038168s
    train loss: 0.236005
    train acc: 1.000000
    Epoch 190 of 200 took 13.056378s
    train loss: 0.266042
    train acc: 0.950000
    Epoch 192 of 200 took 13.016137s
    train loss: 0.255430
    train acc: 0.950000
    Epoch 194 of 200 took 13.013147s
    train loss: 0.422342
    train acc: 0.900000
    Epoch 196 of 200 took 12.980659s
    train loss: 0.353984
    train acc: 0.900000
    Epoch 198 of 200 took 13.033676s
    train loss: 0.320018
    train acc: 0.950000
    Epoch 200 of 200 took 12.945982s
    train loss: 0.288049
    train acc: 0.950000


  And that wraps up transfer learning with Inception-ResNet-v2.
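  With the session still open, the fine-tuned graph can classify a single image via y_op. A sketch: 'test.jpg' is a placeholder path, and the preprocessing must match whatever read_and_decode applied during training:

    # Classify one image with the fine-tuned network.
    img = imresize(imread('test.jpg', mode='RGB'), (299, 299)) / 255.0  # assumed [0, 1] scaling
    pred = sess.run(y_op, feed_dict={x: [img]})
    print(pred[0])  # 0 or 1; which index is cat vs. dog depends on how the TFRecords were written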
  The TensorLayer author says SlimNetsLayer can import any Slim model. I have verified that Inception-ResNet-v2 and VGG16 import successfully. Inception V3, however, trained for two or three days with accuracy swinging between 10% and 70% (about as stable as my mood), and I never found the cause. Exhausting. If any reader wants to give Inception V3 another try, please do.

  • Original post: https://www.cnblogs.com/zengfanlin/p/8970868.html