  • MindSpore: reloading model parameters and optimizer parameters

    This post is mainly based on:

    https://www.mindspore.cn/tutorial/training/zh-CN/master/use/load_model_for_inference_and_transfer.html#id1

    ======================================================================

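    Loading a model locally

    For inference and validation

    For inference-only scenarios, load_checkpoint can load the parameters directly into the network so that inference and validation can follow.

    Example code: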

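    import os
    from mindspore import Model
    from mindspore.train.serialization import load_checkpoint

    # ResNet50, CrossEntropyLoss, create_dataset and mnist_path are assumed to be defined elsewhere
    resnet = ResNet50()
    load_checkpoint("resnet50-2_32.ckpt", net=resnet)
    dataset_eval = create_dataset(os.path.join(mnist_path, "test"), 32, 1) # define the test dataset
    loss = CrossEntropyLoss()
    model = Model(resnet, loss, metrics={"accuracy"})
    acc = model.eval(dataset_eval)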
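    • load_checkpoint loads the network parameters stored in the parameter file into the model. After loading, the network's parameters are the ones saved in the checkpoint.

    • eval validates the accuracy of the trained model.

    For transfer learning

    For resuming interrupted training and for fine-tuning, both the network parameters and the optimizer parameters can be loaded into the model.

    Example code: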

    # return a parameter dict for model
    param_dict = load_checkpoint("resnet50-2_32.ckpt")
    resnet = ResNet50()
    opt = Momentum(resnet.trainable_params(), 0.01, 0.9)
    # load the parameter into net
    load_param_into_net(resnet, param_dict)
    # load the parameter into optimizer
    load_param_into_net(opt, param_dict)
    loss = SoftmaxCrossEntropyWithLogits()
    model = Model(resnet, loss, opt)
    model.train(epoch, dataset)
    • load_checkpoint returns a parameter dictionary.

    • load_param_into_net loads the matching parameters from the parameter dictionary into the network or the optimizer.
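
    The name-based loading described above can be illustrated with a small pure-Python sketch. This is not MindSpore code: toy_load_param_into and the plain dictionaries are hypothetical stand-ins. The assumption, suggested by the parameter names printed later in this post, is that a checkpoint is one flat name-to-value dictionary holding both network weights (fc3.bias) and optimizer state (moments.fc3.bias), and that each target takes only the entries whose names it owns.

```python
def toy_load_param_into(target, param_dict):
    """Copy values from param_dict into target for matching names.

    Returns the names in target that were not found in param_dict.
    Hypothetical helper for illustration only, not a MindSpore API.
    """
    not_loaded = []
    for name in target:
        if name in param_dict:
            target[name] = param_dict[name]
        else:
            not_loaded.append(name)
    return not_loaded


# A toy "checkpoint": network weights and optimizer state in one flat dict.
checkpoint = {
    "fc3.bias": [0.1, -0.2],
    "moments.fc3.bias": [0.01, 0.03],
}

# Freshly initialised network parameters and optimizer state (all zeros).
net_params = {"fc3.bias": [0.0, 0.0]}
opt_state = {"moments.fc3.bias": [0.0, 0.0], "moments.fc4.bias": [0.0]}

missing_net = toy_load_param_into(net_params, checkpoint)
missing_opt = toy_load_param_into(opt_state, checkpoint)

print(net_params)   # network weights now come from the checkpoint
print(opt_state)    # matching optimizer state is restored as well
print(missing_opt)  # names the checkpoint did not contain
```

    The same dictionary is passed twice, once per target, which mirrors the two load_param_into_net calls in the example above.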

    ================================================================

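    From the above, the following two functions:

    load_checkpoint

    load_param_into_net

    can reload the parameters saved in a ckpt file into the network and the optimizer.

    Here is a demo; for downloading the data files, see the previous post:

    Saving the model parameters and optimizer parameters: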

    #!/usr/bin/env python
    # encoding: UTF-8
    
    # Command-line argument handling
    import os
    import argparse
    
    # Runtime context
    from mindspore import context
    
    # Dataset preprocessing
    import mindspore.dataset as ds
    import mindspore.dataset.transforms.c_transforms as C
    import mindspore.dataset.vision.c_transforms as CV
    from mindspore.dataset.vision import Inter
    from mindspore import dtype as mstype
    
    # Network construction
    import mindspore.nn as nn
    from mindspore.common.initializer import Normal
    
    # Saving model parameters during training
    from mindspore.train.callback import ModelCheckpoint, CheckpointConfig
    
    # Libraries needed for training
    from mindspore.nn import Accuracy
    from mindspore.train.callback import LossMonitor
    from mindspore import Model
    
    # Remove checkpoint files left over from earlier runs
    os.system('rm -f *.ckpt  *.meta')
    
    parser = argparse.ArgumentParser(description='MindSpore LeNet Example')
    parser.add_argument('--device_target', type=str, default="CPU", choices=['Ascend', 'GPU', 'CPU'])
    
    args = parser.parse_known_args()[0]
    
    # Set the MindSpore runtime context
    context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
    
    
    def create_dataset(data_path, batch_size=32, repeat_size=1,
                       num_parallel_workers=1):
        # Define the dataset
        mnist_ds = ds.MnistDataset(data_path)
        resize_height, resize_width = 32, 32
        rescale = 1.0 / 255.0
        shift = 0.0
        rescale_nml = 1 / 0.3081
        shift_nml = -1 * 0.1307 / 0.3081
    
        # Define the operations to be applied with map
        resize_op = CV.Resize((resize_height, resize_width), interpolation=Inter.LINEAR)
        rescale_nml_op = CV.Rescale(rescale_nml, shift_nml)
        rescale_op = CV.Rescale(rescale, shift)
        hwc2chw_op = CV.HWC2CHW()
        type_cast_op = C.TypeCast(mstype.int32)
    
        # Apply the operations to the dataset with map
        mnist_ds = mnist_ds.map(operations=type_cast_op, input_columns="label", num_parallel_workers=num_parallel_workers)
        mnist_ds = mnist_ds.map(operations=resize_op, input_columns="image", num_parallel_workers=num_parallel_workers)
        mnist_ds = mnist_ds.map(operations=rescale_op, input_columns="image", num_parallel_workers=num_parallel_workers)
        mnist_ds = mnist_ds.map(operations=rescale_nml_op, input_columns="image", num_parallel_workers=num_parallel_workers)
        mnist_ds = mnist_ds.map(operations=hwc2chw_op, input_columns="image", num_parallel_workers=num_parallel_workers)
    
        # Shuffle, batch and repeat
        buffer_size = 10000
        mnist_ds = mnist_ds.shuffle(buffer_size=buffer_size)
        mnist_ds = mnist_ds.batch(batch_size, drop_remainder=True)
        mnist_ds = mnist_ds.repeat(repeat_size)
    
        return mnist_ds
    
    
    class LeNet5(nn.Cell):
        """
        LeNet network structure
        """
    
        def __init__(self, num_class=10, num_channel=1):
            super(LeNet5, self).__init__()
            # Define the required operations
            self.conv1 = nn.Conv2d(num_channel, 6, 5, pad_mode='valid')
            self.conv2 = nn.Conv2d(6, 16, 5, pad_mode='valid')
            self.fc1 = nn.Dense(16 * 5 * 5, 120, weight_init=Normal(0.02))
            self.fc2 = nn.Dense(120, 84, weight_init=Normal(0.02))
            self.fc3 = nn.Dense(84, num_class, weight_init=Normal(0.02))
            self.relu = nn.ReLU()
            self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
            self.flatten = nn.Flatten()
    
        def construct(self, x):
            # Build the forward pass from the operations defined above
            x = self.conv1(x)
            x = self.relu(x)
            x = self.max_pool2d(x)
            x = self.conv2(x)
            x = self.relu(x)
            x = self.max_pool2d(x)
            x = self.flatten(x)
            x = self.fc1(x)
            x = self.relu(x)
            x = self.fc2(x)
            x = self.relu(x)
            x = self.fc3(x)
            return x
    
    # Instantiate the network
    net = LeNet5()
    
    # Define the loss function
    net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
    
    # Define the optimizer
    net_opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)
    
    # Configure checkpoint saving:
    # save the parameters every 125 steps and keep at most 15 files
    config_ck = CheckpointConfig(save_checkpoint_steps=125, keep_checkpoint_max=15)
    # Create the checkpoint callback with this configuration
    ckpoint = ModelCheckpoint(prefix="checkpoint_lenet", config=config_ck)
    
    
    def train_net(args, model, epoch_size, data_path, repeat_size, ckpoint_cb, sink_mode):
        """Train the model."""
        # Load the training dataset
        ds_train = create_dataset(os.path.join(data_path, "train"), 32, repeat_size)
        model.train(epoch_size, ds_train, callbacks=[ckpoint_cb, LossMonitor(125)], dataset_sink_mode=sink_mode)
    
    
    def test_net(network, model, data_path):
        """Evaluate the model."""
        ds_eval = create_dataset(os.path.join(data_path, "test"))
        acc = model.eval(ds_eval, dataset_sink_mode=False)
        print("{}".format(acc))
    
    
    mnist_path = "./datasets/MNIST_Data"
    train_epoch = 1
    dataset_size = 1
    model = Model(net, net_loss, net_opt, metrics={"Accuracy": Accuracy()})
    
    
    train_net(args, model, train_epoch, mnist_path, dataset_size, ckpoint, False)
    test_net(net, model, mnist_path)
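
    The two Rescale operations in create_dataset compose into the usual MNIST normalisation (x / 255 - 0.1307) / 0.3081. Below is a short pure-Python check of that arithmetic. It assumes Rescale(rescale, shift) computes x * rescale + shift and that 0.1307 and 0.3081 are the commonly quoted MNIST mean and standard deviation; the helper names two_step and one_step are made up for illustration.

```python
import math

MEAN, STD = 0.1307, 0.3081


def two_step(pixel):
    """Apply the two rescale steps in the order used by create_dataset."""
    x = pixel * (1.0 / 255.0) + 0.0            # rescale_op
    return x * (1 / STD) + (-1 * MEAN / STD)   # rescale_nml_op


def one_step(pixel):
    """Standard normalisation: scale to [0, 1], subtract mean, divide by std."""
    return (pixel / 255.0 - MEAN) / STD


# Both forms agree over the whole 8-bit pixel range.
for pixel in range(256):
    assert math.isclose(two_step(pixel), one_step(pixel), rel_tol=1e-9, abs_tol=1e-12)

print(two_step(0), two_step(255))
```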

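    The generated parameter files:

    The .ckpt files store the network parameters and the optimizer parameters, while the .meta file stores the compiled computation graph. I do not yet understand how the .meta file is actually used; for more detail, see this thread: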

    https://bbs.huaweicloud.com/forum/forum.php?mod=viewthread&tid=138966&page=1#pid1240965
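
    The checkpoint files this configuration should leave on disk can be worked out by hand. The sketch below assumes the MNIST training set has 60000 images and that ModelCheckpoint names files <prefix>-<epoch>_<step>.ckpt, which matches the checkpoint_lenet-1_1875.ckpt file loaded later in this post. It only does the arithmetic and does not call MindSpore.

```python
train_images = 60000        # assumed size of the MNIST training split
batch_size = 32             # create_dataset default, with drop_remainder=True
save_every = 125            # save_checkpoint_steps
keep_max = 15               # keep_checkpoint_max
epochs = 1                  # train_epoch
prefix = "checkpoint_lenet"

steps_per_epoch = train_images // batch_size

# One file per save point, named after the epoch and the step within it.
names = [
    f"{prefix}-{epoch}_{step}.ckpt"
    for epoch in range(1, epochs + 1)
    for step in range(save_every, steps_per_epoch + 1, save_every)
]
kept = names[-keep_max:]    # only the newest keep_max files are retained

print(steps_per_epoch)
print(len(kept))
print(kept[0], kept[-1])
```

    Under these assumptions one epoch is 1875 steps, so saving every 125 steps gives exactly 15 files and none are discarded by the keep_checkpoint_max limit.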

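    After the network and optimizer are initialized, with the saved network parameters loaded but the saved optimizer parameters not loaded, print the optimizer's last parameter, moments.fc3.bias: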

    import os
    import numpy as np
    
    # Network construction
    import mindspore.nn as nn
    from mindspore.common.initializer import Normal
    from mindspore import Tensor
    
    # Checkpoint loading
    from mindspore.train.serialization import load_checkpoint, load_param_into_net
    
    # Dataset preprocessing
    import mindspore.dataset as ds
    import mindspore.dataset.transforms.c_transforms as C
    import mindspore.dataset.vision.c_transforms as CV
    from mindspore.dataset.vision import Inter
    from mindspore import dtype as mstype
    
    # Libraries needed for building and evaluating the model
    from mindspore.nn import Accuracy
    from mindspore import Model
    from mindspore import context
    
    context.set_context(mode=context.PYNATIVE_MODE, device_target='GPU')
    
    
    def create_dataset(data_path, batch_size=32, repeat_size=1,
                       num_parallel_workers=1):
        # Define the dataset
        mnist_ds = ds.MnistDataset(data_path)
        resize_height, resize_width = 32, 32
        rescale = 1.0 / 255.0
        shift = 0.0
        rescale_nml = 1 / 0.3081
        shift_nml = -1 * 0.1307 / 0.3081
    
        # Define the operations to be applied with map
        resize_op = CV.Resize((resize_height, resize_width), interpolation=Inter.LINEAR)
        rescale_nml_op = CV.Rescale(rescale_nml, shift_nml)
        rescale_op = CV.Rescale(rescale, shift)
        hwc2chw_op = CV.HWC2CHW()
        type_cast_op = C.TypeCast(mstype.int32)
    
        # Apply the operations to the dataset with map
        mnist_ds = mnist_ds.map(operations=type_cast_op, input_columns="label", num_parallel_workers=num_parallel_workers)
        mnist_ds = mnist_ds.map(operations=resize_op, input_columns="image", num_parallel_workers=num_parallel_workers)
        mnist_ds = mnist_ds.map(operations=rescale_op, input_columns="image", num_parallel_workers=num_parallel_workers)
        mnist_ds = mnist_ds.map(operations=rescale_nml_op, input_columns="image", num_parallel_workers=num_parallel_workers)
        mnist_ds = mnist_ds.map(operations=hwc2chw_op, input_columns="image", num_parallel_workers=num_parallel_workers)
    
        # Shuffle, batch and repeat
        buffer_size = 10000
        mnist_ds = mnist_ds.shuffle(buffer_size=buffer_size)
        mnist_ds = mnist_ds.batch(batch_size, drop_remainder=True)
        mnist_ds = mnist_ds.repeat(repeat_size)
    
        return mnist_ds
    
    
    class LeNet5(nn.Cell):
        """
        LeNet network structure
        """
        def __init__(self, num_class=10, num_channel=1):
            super(LeNet5, self).__init__()
            # Define the required operations
            self.conv1 = nn.Conv2d(num_channel, 6, 5, pad_mode='valid')
            self.conv2 = nn.Conv2d(6, 16, 5, pad_mode='valid')
            self.fc1 = nn.Dense(16 * 5 * 5, 120, weight_init=Normal(0.02))
            self.fc2 = nn.Dense(120, 84, weight_init=Normal(0.02))
            self.fc3 = nn.Dense(84, num_class, weight_init=Normal(0.02))
            self.relu = nn.ReLU()
            self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
            self.flatten = nn.Flatten()
    
        def construct(self, x):
            # Build the forward pass from the operations defined above
            x = self.conv1(x)
            x = self.relu(x)
            x = self.max_pool2d(x)
            x = self.conv2(x)
            x = self.relu(x)
            x = self.max_pool2d(x)
            x = self.flatten(x)
            x = self.fc1(x)
            x = self.relu(x)
            x = self.fc2(x)
            x = self.relu(x)
            x = self.fc3(x)
            return x
    
    
    # Instantiate the network
    net = LeNet5()
    # Define the loss function
    net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
    # Define the optimizer
    net_opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)
    # Build the model
    model = Model(net, net_loss, net_opt, metrics={"Accuracy": Accuracy()})
    
    
    # Load the saved checkpoint into a parameter dictionary
    param_dict = load_checkpoint("checkpoint_lenet-1_1875.ckpt")
    # Load the parameters into the network
    load_param_into_net(net, param_dict)
    # Load the parameters into the optimizer (disabled in this run)
    #load_param_into_net(net_opt, param_dict)
    
    
    _batch_size = 8
    # Define the test dataset, using the batch size set above
    mnist_path = "./datasets/MNIST_Data"
    ds_test = create_dataset(os.path.join(mnist_path, "test"), batch_size=_batch_size)
    print(model.eval(ds_test))
    
    print(type(net.parameters_and_names()))
    for i, j in net_opt.parameters_and_names():
        print(i)
        if i == "moments.fc3.bias":
            print(Tensor(j))

    Output:

    WARNING: 'ControlDepend' is deprecated from version 1.1 and will be removed in a future version, use 'Depend' instead.
    [WARNING] ME(13133:139644169384064,MainProcess):2021-07-12-03:29:50.183.802 [mindspore/ops/operations/array_ops.py:2302] WARN_DEPRECATED: The usage of Pack is deprecated. Please use Stack.
    {'Accuracy': 0.9594}
    <class 'generator'>
    learning_rate
    conv1.weight
    conv2.weight
    fc1.weight
    fc1.bias
    fc2.weight
    fc2.bias
    fc3.weight
    fc3.bias
    momentum
    moments.conv1.weight
    moments.conv2.weight
    moments.fc1.weight
    moments.fc1.bias
    moments.fc2.weight
    moments.fc2.bias
    moments.fc3.weight
    moments.fc3.bias
    [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

    As can be seen, the optimizer's last parameter is all zeros. What about after loading the saved parameters?

    import os
    import numpy as np
    
    # Network construction
    import mindspore.nn as nn
    from mindspore.common.initializer import Normal
    from mindspore import Tensor
    
    # Checkpoint loading
    from mindspore.train.serialization import load_checkpoint, load_param_into_net
    
    # Dataset preprocessing
    import mindspore.dataset as ds
    import mindspore.dataset.transforms.c_transforms as C
    import mindspore.dataset.vision.c_transforms as CV
    from mindspore.dataset.vision import Inter
    from mindspore import dtype as mstype
    
    # Libraries needed for building and evaluating the model
    from mindspore.nn import Accuracy
    from mindspore import Model
    from mindspore import context
    
    context.set_context(mode=context.PYNATIVE_MODE, device_target='GPU')
    
    
    def create_dataset(data_path, batch_size=32, repeat_size=1,
                       num_parallel_workers=1):
        # Define the dataset
        mnist_ds = ds.MnistDataset(data_path)
        resize_height, resize_width = 32, 32
        rescale = 1.0 / 255.0
        shift = 0.0
        rescale_nml = 1 / 0.3081
        shift_nml = -1 * 0.1307 / 0.3081
    
        # Define the operations to be applied with map
        resize_op = CV.Resize((resize_height, resize_width), interpolation=Inter.LINEAR)
        rescale_nml_op = CV.Rescale(rescale_nml, shift_nml)
        rescale_op = CV.Rescale(rescale, shift)
        hwc2chw_op = CV.HWC2CHW()
        type_cast_op = C.TypeCast(mstype.int32)
    
        # Apply the operations to the dataset with map
        mnist_ds = mnist_ds.map(operations=type_cast_op, input_columns="label", num_parallel_workers=num_parallel_workers)
        mnist_ds = mnist_ds.map(operations=resize_op, input_columns="image", num_parallel_workers=num_parallel_workers)
        mnist_ds = mnist_ds.map(operations=rescale_op, input_columns="image", num_parallel_workers=num_parallel_workers)
        mnist_ds = mnist_ds.map(operations=rescale_nml_op, input_columns="image", num_parallel_workers=num_parallel_workers)
        mnist_ds = mnist_ds.map(operations=hwc2chw_op, input_columns="image", num_parallel_workers=num_parallel_workers)
    
        # Shuffle, batch and repeat
        buffer_size = 10000
        mnist_ds = mnist_ds.shuffle(buffer_size=buffer_size)
        mnist_ds = mnist_ds.batch(batch_size, drop_remainder=True)
        mnist_ds = mnist_ds.repeat(repeat_size)
    
        return mnist_ds
    
    
    class LeNet5(nn.Cell):
        """
        LeNet network structure
        """
        def __init__(self, num_class=10, num_channel=1):
            super(LeNet5, self).__init__()
            # Define the required operations
            self.conv1 = nn.Conv2d(num_channel, 6, 5, pad_mode='valid')
            self.conv2 = nn.Conv2d(6, 16, 5, pad_mode='valid')
            self.fc1 = nn.Dense(16 * 5 * 5, 120, weight_init=Normal(0.02))
            self.fc2 = nn.Dense(120, 84, weight_init=Normal(0.02))
            self.fc3 = nn.Dense(84, num_class, weight_init=Normal(0.02))
            self.relu = nn.ReLU()
            self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
            self.flatten = nn.Flatten()
    
        def construct(self, x):
            # Build the forward pass from the operations defined above
            x = self.conv1(x)
            x = self.relu(x)
            x = self.max_pool2d(x)
            x = self.conv2(x)
            x = self.relu(x)
            x = self.max_pool2d(x)
            x = self.flatten(x)
            x = self.fc1(x)
            x = self.relu(x)
            x = self.fc2(x)
            x = self.relu(x)
            x = self.fc3(x)
            return x
    
    
    # Instantiate the network
    net = LeNet5()
    # Define the loss function
    net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
    # Define the optimizer
    net_opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)
    # Build the model
    model = Model(net, net_loss, net_opt, metrics={"Accuracy": Accuracy()})
    
    
    # Load the saved checkpoint into a parameter dictionary
    param_dict = load_checkpoint("checkpoint_lenet-1_1875.ckpt")
    # Load the parameters into the network
    load_param_into_net(net, param_dict)
    # Load the parameters into the optimizer
    load_param_into_net(net_opt, param_dict)
    
    
    _batch_size = 8
    # Define the test dataset, using the batch size set above
    mnist_path = "./datasets/MNIST_Data"
    ds_test = create_dataset(os.path.join(mnist_path, "test"), batch_size=_batch_size)
    print(model.eval(ds_test))
    
    print(type(net.parameters_and_names()))
    for i, j in net_opt.parameters_and_names():
        print(i)
        if i == "moments.fc3.bias":
            print(Tensor(j))

    Output:

    WARNING: 'ControlDepend' is deprecated from version 1.1 and will be removed in a future version, use 'Depend' instead.
    [WARNING] ME(13292:140444628824192,MainProcess):2021-07-12-03:31:50.471.228 [mindspore/ops/operations/array_ops.py:2302] WARN_DEPRECATED: The usage of Pack is deprecated. Please use Stack.
    {'Accuracy': 0.9594}
    <class 'generator'>
    learning_rate
    conv1.weight
    conv2.weight
    fc1.weight
    fc1.bias
    fc2.weight
    fc2.bias
    fc3.weight
    fc3.bias
    momentum
    moments.conv1.weight
    moments.conv2.weight
    moments.fc1.weight
    moments.fc1.bias
    moments.fc2.weight
    moments.fc2.bias
    moments.fc3.weight
    moments.fc3.bias
    [-0.00917954  0.00276246 -0.01406308  0.01492264 -0.01100682 -0.0692124
      0.02251344  0.00341095  0.03600671  0.02384563]

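    As can be seen, after loading the saved optimizer parameters, moments.fc3.bias is no longer all zeros: the momentum state saved during training has been restored. The evaluation accuracy is the same in both runs (0.9594), because evaluation uses only the network parameters.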
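
    Why the restored state matters once training resumes can be seen from the momentum update itself. The sketch below is plain Python, not MindSpore. It assumes the standard non-Nesterov rule v = momentum * v + grad, w = w - lr * v with the learning rate and momentum used above; the weight, gradient and restored velocity values are made up for illustration.

```python
def momentum_step(w, v, grad, lr=0.01, momentum=0.9):
    """One scalar momentum update; returns the new weight and velocity."""
    v = momentum * v + grad
    w = w - lr * v
    return w, v


w0, grad = 0.5, 0.2   # illustrative weight and gradient

# Resuming with a freshly initialised optimizer: the velocity starts at zero.
w_fresh, v_fresh = momentum_step(w0, 0.0, grad)

# Resuming with the velocity restored from the checkpoint (illustrative value).
w_restored, v_restored = momentum_step(w0, 0.036, grad)

print(w_fresh, w_restored)
```

    Starting from identical weights, the first update already differs between the two cases, so skipping the optimizer load changes how a resumed run continues even though it does not change the evaluation result.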

  • Original post: https://www.cnblogs.com/devilmaycry812839668/p/15001124.html