  • How to load an already-downloaded dataset with Keras

    https://blog.csdn.net/houchaoqun_xmu/article/details/78492718

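    [keras] Fixing the MNIST download failure in the example scripts

    Preface:

     

    The Keras source downloads MNIST with path = get_file(path, origin='https://s3.amazonaws.com/img-datasets/mnist.npz'), that is, the data is fetched from the URL https://s3.amazonaws.com/img-datasets/mnist.npz. That URL is blocked by the Great Firewall, so every MNIST example stalls at the download step. This post offers a workaround so that readers who need it can run the example code and try it out.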

     

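    The main contributions of this post are:

     

    1) It provides the mnist.npz dataset.

    2) It walks through the source files related to MNIST.

    3) It gives one way to run the MNIST examples under the examples directory of the Keras source.

    4) It lists a few other workarounds and links to them.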

    numpy.load(path)

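    The numpy.load() function does most of the work here. It reads .npy and .npz files (among others) and returns the corresponding data type.

    1) If the file is a .npy file, it returns the single array stored in that file.

    2) If the file is a .npz file, it returns a dictionary-like object holding {filename: array} key-value pairs. In this example the entries are read as follows: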

    import numpy as np

    f = np.load(path)  # path points to the local mnist.npz file
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']
    f.close()

    For details, see: https://docs.scipy.org/doc/numpy/reference/generated/numpy.load.html
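
To make the dictionary-like behaviour concrete, here is a minimal sketch that writes a tiny .npz archive and reads it back. The file name demo.npz and the array shapes are made up for illustration and are not the real MNIST data; only np.savez and np.load are actual NumPy calls.

```python
import os
import tempfile

import numpy as np

# Build a tiny stand-in archive; the shapes are illustrative, not real MNIST.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.npz")  # hypothetical file name
np.savez(
    demo_path,
    x_train=np.zeros((6, 28, 28), dtype=np.uint8),
    y_train=np.arange(6, dtype=np.uint8),
)

# np.load on an .npz returns a dict-like NpzFile; using it as a context
# manager closes the underlying file without an explicit f.close().
with np.load(demo_path) as f:
    keys = sorted(f.files)
    x_train, y_train = f["x_train"], f["y_train"]

print(keys)
print(x_train.shape, y_train.shape)
```

The with form is an alternative to the explicit f.close() used in the Keras source; both should leave the arrays usable after the file is closed, because indexing the archive reads each array into memory.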

    Original .\keras\examples\mnist_mlp.py

    # -*- coding: utf-8 -*-
    '''Trains a simple deep NN on the MNIST dataset.

    Gets to 98.40% test accuracy after 20 epochs
    (there is *a lot* of margin for parameter tuning).
    2 seconds per epoch on a K520 GPU.
    '''

    from __future__ import print_function

    import keras
    from keras.datasets import mnist
    from keras.models import Sequential
    from keras.layers import Dense, Dropout
    from keras.optimizers import RMSprop


    batch_size = 128
    num_classes = 10
    epochs = 20

    # the data, shuffled and split between train and test sets
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    x_train = x_train.reshape(60000, 784)
    x_test = x_test.reshape(10000, 784)
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train /= 255
    x_test /= 255
    print(x_train.shape[0], 'train samples')
    print(x_test.shape[0], 'test samples')

    # convert class vectors to binary class matrices
    y_train = keras.utils.to_categorical(y_train, num_classes)
    y_test = keras.utils.to_categorical(y_test, num_classes)

    model = Sequential()
    model.add(Dense(512, activation='relu', input_shape=(784,)))
    model.add(Dropout(0.2))
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(10, activation='softmax'))

    model.summary()

    ###
    # 1) categorical_crossentropy(output, target, from_logits=False):
    # Computes the categorical cross-entropy between the output tensor and the
    # target tensor; the two tensors must have the same shape.
    # It is the log loss for multi-class classification and pairs with a softmax classifier.
    #
    # 2) RMSprop()
    # An improvement on AdaGrad. Neural network objectives are non-convex, and
    # RMSProp behaves better in that setting: it replaces the accumulated gradient
    # sum with an exponentially decaying moving average, discarding distant history.
    # reference: http://blog.csdn.net/bvl10101111/article/details/72616378
    #
    model.compile(loss='categorical_crossentropy',
                  optimizer=RMSprop(),
                  metrics=['accuracy'])

    history = model.fit(x_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        verbose=1,
                        validation_data=(x_test, y_test))
    score = model.evaluate(x_test, y_test, verbose=0)
    print('Test loss:', score[0])
    print('Test accuracy:', score[1])
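
The preprocessing steps in the script (flatten each 28x28 image to 784 values, scale to [0, 1], one-hot encode the labels) can be tried without Keras or the real dataset. The sketch below is an illustration only: the preprocess helper and the random stand-in images are hypothetical, and np.eye indexing is used here in place of keras.utils.to_categorical, which should yield the same kind of one-hot matrix.

```python
import numpy as np


def preprocess(images, labels, num_classes=10):
    """Flatten images, scale pixels to [0, 1], and one-hot encode labels."""
    x = images.reshape(images.shape[0], -1).astype("float32") / 255
    # Row i of the identity matrix is the one-hot vector for class i.
    y = np.eye(num_classes, dtype="float32")[labels]
    return x, y


# Random stand-in data with MNIST-like shapes (not the real dataset).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
fake_labels = np.array([3, 0, 9, 3])

x, y = preprocess(fake_images, fake_labels)
print(x.shape, x.dtype)
print(y.shape)
```

Using reshape(n, -1) instead of the hard-coded 60000 and 10000 lets the same helper handle the training and test splits.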

    .\keras\keras\datasets\mnist.py - load_data()

    # -*- coding: utf-8 -*-
    from ..utils.data_utils import get_file
    import numpy as np

    def load_data(path='mnist.npz'):
        """Loads the MNIST dataset.

        # Arguments
            path: path where to cache the dataset locally
                (relative to ~/.keras/datasets).

        # Returns
            Tuple of Numpy arrays: `(x_train, y_train), (x_test, y_test)`.

        # numpy.load()
        # numpy.load(file, mmap_mode=None, allow_pickle=True, fix_imports=True, encoding='ASCII')
        # Loads arrays or pickled objects from .npy, .npz or pickled files.
        # reference: https://docs.scipy.org/doc/numpy/reference/generated/numpy.load.html

        """
        path = get_file(path, origin='https://s3.amazonaws.com/img-datasets/mnist.npz')
        f = np.load(path)
        x_train, y_train = f['x_train'], f['y_train']
        x_test, y_test = f['x_test'], f['y_test']
        f.close()
        return (x_train, y_train), (x_test, y_test)
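
Since the docstring says the dataset is cached relative to ~/.keras/datasets, another possible workaround is to copy the manually downloaded file into that directory so that load_data() finds it there. The sketch below is only an illustration: install_dataset is a hypothetical helper, and it rests on the assumption that get_file skips the download when the file already exists, which may not hold for every Keras version (newer releases may also verify a file hash).

```python
import os
import shutil
import tempfile


def install_dataset(src_path, cache_dir=None, name="mnist.npz"):
    """Copy a locally downloaded archive into the dataset cache directory."""
    if cache_dir is None:
        # Default location taken from the load_data() docstring.
        cache_dir = os.path.join(os.path.expanduser("~"), ".keras", "datasets")
    os.makedirs(cache_dir, exist_ok=True)
    dst_path = os.path.join(cache_dir, name)
    if not os.path.exists(dst_path):
        shutil.copyfile(src_path, dst_path)
    return dst_path


# Demonstration against throwaway directories, so the real ~/.keras is untouched.
work_dir = tempfile.mkdtemp()
src = os.path.join(work_dir, "downloaded.npz")
with open(src, "wb") as fh:
    fh.write(b"placeholder bytes, not a real archive")

dst = install_dataset(src, cache_dir=os.path.join(work_dir, "datasets"))
print(dst)
```

In real use the call would be install_dataset('/path/to/mnist.npz') with no cache_dir argument, after which the unmodified example script can be tried.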

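    Downloading the mnist.npz dataset

    The mnist.npz file used in this post was downloaded through a server in Japan and is shared here for free. If you have trouble downloading it, leave a comment.

    Download link: https://pan.baidu.com/s/1jH6uFFC  Password: dw3d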

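    Modifying mnist_mlp.py

    Method 1:

    The original mnist_mlp.py obtains the dataset with the following call:

    # the data, shuffled and split between train and test sets
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    This calls the load_data(path='mnist.npz') function in .\keras\keras\datasets\mnist.py, which is exactly the step that fails because the URL is blocked. This post downloads mnist.npz in advance and then changes a few lines so the script runs normally. In other words, it reads the dataset from a local file. The steps are:

    1) Download the mnist.npz dataset and place it in the .\keras\examples directory.

    2) Change mnist_mlp.py as follows: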

    # -*- coding: utf-8 -*-
    '''Trains a simple deep NN on the MNIST dataset.

    Gets to 98.40% test accuracy after 20 epochs
    (there is *a lot* of margin for parameter tuning).
    2 seconds per epoch on a K520 GPU.
    '''

    from __future__ import print_function

    import keras
    from keras.datasets import mnist
    from keras.models import Sequential
    from keras.layers import Dense, Dropout
    from keras.optimizers import RMSprop

    batch_size = 128
    num_classes = 10
    epochs = 20

    # the data, shuffled and split between train and test sets
    # (x_train, y_train), (x_test, y_test) = mnist.load_data()

    import numpy as np
    path = './mnist.npz'
    f = np.load(path)
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']
    f.close()

    x_train = x_train.reshape(60000, 784).astype('float32')
    x_test = x_test.reshape(10000, 784).astype('float32')
    x_train /= 255
    x_test /= 255
    print(x_train.shape[0], 'train samples')
    print(x_test.shape[0], 'test samples')

    # convert class vectors to binary class matrices
    # the labels are 0-9 (10 classes); Keras expects them as binary class matrices

    y_train = keras.utils.to_categorical(y_train, num_classes)
    y_test = keras.utils.to_categorical(y_test, num_classes)

    # added by hcq-20171106
    # A Keras Dense layer is a fully connected layer.
    model = Sequential()
    model.add(Dense(512, activation='relu', input_shape=(784,)))
    model.add(Dropout(0.2))
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(num_classes, activation='softmax'))

    model.summary()

    model.compile(loss='categorical_crossentropy',
                  optimizer=RMSprop(),
                  metrics=['accuracy'])

    history = model.fit(x_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        verbose=1,
                        validation_data=(x_test, y_test))
    score = model.evaluate(x_test, y_test, verbose=0)
    print('Test loss:', score[0])
    print('Test accuracy:', score[1])

    The output looks like this:

    60000 train samples
    10000 test samples
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    dense_1 (Dense)              (None, 512)               401920
    _________________________________________________________________
    dropout_1 (Dropout)          (None, 512)               0
    _________________________________________________________________
    dense_2 (Dense)              (None, 512)               262656
    _________________________________________________________________
    dropout_2 (Dropout)          (None, 512)               0
    _________________________________________________________________
    dense_3 (Dense)              (None, 10)                5130
    =================================================================
    Total params: 669,706
    Trainable params: 669,706
    Non-trainable params: 0
    _________________________________________________________________
    Train on 60000 samples, validate on 10000 samples
    Epoch 1/20
    2017-11-09 23:06:16.881800: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0)

    ... ...

    60000/60000 [==============================] - 1s 23us/step - loss: 0.0387 - acc: 0.9888 - val_loss: 0.0706 - val_acc: 0.9814
    Epoch 8/20
    60000/60000 [==============================] - 1s 23us/step - loss: 0.0341 - acc: 0.9899 - val_loss: 0.0789 - val_acc: 0.9827
    Epoch 9/20
    60000/60000 [==============================] - 1s 23us/step - loss: 0.0304 - acc: 0.9911 - val_loss: 0.0851 - val_acc: 0.9833
    Epoch 10/20
    60000/60000 [==============================] - 1s 23us/step - loss: 0.0290 - acc: 0.9918 - val_loss: 0.0867 - val_acc: 0.9818
    Epoch 11/20
    60000/60000 [==============================] - 1s 23us/step - loss: 0.0264 - acc: 0.9924 - val_loss: 0.0881 - val_acc: 0.9833
    Epoch 12/20
    60000/60000 [==============================] - 1s 23us/step - loss: 0.0261 - acc: 0.9928 - val_loss: 0.1095 - val_acc: 0.9801
    Epoch 13/20
    60000/60000 [==============================] - 1s 23us/step - loss: 0.0246 - acc: 0.9931 - val_loss: 0.1012 - val_acc: 0.9830
    Epoch 14/20
    60000/60000 [==============================] - 1s 23us/step - loss: 0.0233 - acc: 0.9935 - val_loss: 0.1116 - val_acc: 0.9812
    Epoch 15/20
    60000/60000 [==============================] - 1s 23us/step - loss: 0.0223 - acc: 0.9942 - val_loss: 0.1016 - val_acc: 0.9832
    Epoch 16/20
    60000/60000 [==============================] - 1s 23us/step - loss: 0.0214 - acc: 0.9943 - val_loss: 0.1053 - val_acc: 0.9832
    Epoch 17/20
    60000/60000 [==============================] - 1s 23us/step - loss: 0.0178 - acc: 0.9950 - val_loss: 0.1095 - val_acc: 0.9838
    Epoch 18/20
    60000/60000 [==============================] - 1s 23us/step - loss: 0.0212 - acc: 0.9949 - val_loss: 0.1158 - val_acc: 0.9822
    Epoch 19/20
    60000/60000 [==============================] - 1s 23us/step - loss: 0.0197 - acc: 0.9951 - val_loss: 0.1112 - val_acc: 0.9831
    Epoch 20/20
    60000/60000 [==============================] - 1s 23us/step - loss: 0.0203 - acc: 0.9951 - val_loss: 0.1097 - val_acc: 0.9833
    Test loss: 0.109655842465
    Test accuracy: 0.9833
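
The Param # column in the model summary can be checked by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit, and Dropout layers add no parameters. A short sketch of that arithmetic, where dense_params is a hypothetical helper name:

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units


# Input size followed by the three Dense layer widths from the script.
layer_sizes = [784, 512, 512, 10]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

print(counts)  # [401920, 262656, 5130]
print(total)   # 669706
```

These values match dense_1, dense_2, dense_3, and the "Total params" line in the output above.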

    Method 2: as described in another blog post, pass the path of the local file to mnist.load_data():

     

     (x_train, y_train), (x_test, y_test) = mnist.load_data(path='/home/duchao/下载/mnist.npz')
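
A plausible reason Method 2 works is the way Python joins paths: when the second argument is absolute, the join discards the first one. The sketch below shows only that path behaviour. That get_file builds its target by joining the cache directory with the given name is an assumption about the Keras version of that time, not something this post verifies, and the cache_dir value is hypothetical.

```python
import posixpath

# Hypothetical default cache location for the user in the example above.
cache_dir = "/home/duchao/.keras/datasets"

# An absolute path replaces the cache directory entirely.
resolved_abs = posixpath.join(cache_dir, "/home/duchao/下载/mnist.npz")

# A bare file name stays inside the cache directory.
resolved_rel = posixpath.join(cache_dir, "mnist.npz")

print(resolved_abs)
print(resolved_rel)
```

If the assumption holds, the absolute path points at a file that already exists, so no download is attempted.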

    Reference:

     

    Keras Chinese documentation: http://keras-cn.readthedocs.io/en/latest/

    TF and Keras functions and issues encountered while reading the source code: http://blog.csdn.net/jsliuqun/article/details/64444302

    Reading the MNIST dataset with Python: https://blog.mythsman.com/2016/01/25/1/

  • Original article: https://www.cnblogs.com/shuimuqingyang/p/10399312.html