  • Reading and Writing TFRecords Files with TensorFlow

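    When you use one of TensorFlow's bundled frameworks such as slim, the default data format is usually TFRecords. During training, the data in a TFRecords file is consumed as follows: an input pipeline reads the tfrecords files (or another supported format), shuffles them, builds a file queue, reads and decodes the data, and feeds it to the model for training.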

    Suppose we have a list of JPEG image paths and a list of the corresponding labels: images and labels.

    1. Generating TFRecords

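    To store data in a TFRecords file, the data must first be placed in a protocol buffer called Example, which is then serialized to a string before it can be written. An Example contains features, each of which holds one of three data types: bytes, float, or int64.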

    import tensorflow as tf
    import cv2
    
    def _int64_feature(value):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
    
    def _bytes_feature(value):
        # BytesList expects a list, so wrap the single byte string
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
    
    train_filename = 'train.tfrecords'
    with tf.python_io.TFRecordWriter(train_filename) as tfrecord_writer:  
        for i in range(len(images)):
            # read in image data by tf
            with tf.gfile.FastGFile(images[i], 'rb') as f:
                img_data = f.read()  # raw bytes of the encoded JPEG file
            label = labels[i]
            # get width and height of image
            image_shape = cv2.imread(images[i]).shape
            width = image_shape[1]
            height = image_shape[0]
            # create features
            feature = {'train/image': _bytes_feature(img_data),
                               'train/label': _int64_feature(label),  # label: integer from 0-N
                               'train/height': _int64_feature(height), 
                               'train/width': _int64_feature(width)}
            # create example protocol buffer
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            # serialize protocol buffer to string
            tfrecord_writer.write(example.SerializeToString())
    # the with block closes the writer on exit, so no explicit close() is needed
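
The three feature types mentioned above (bytes, float, int64) can be checked in memory before anything is written to disk. The following is a minimal sketch, not part of the original post: the helper names, the `demo/...` keys, and the sample values are hypothetical and chosen only for illustration. It builds one Example, serializes it, and parses the string back with the protocol buffer method `FromString`.

```python
import tensorflow as tf

# Hypothetical helpers, one per feature type supported by tf.train.Feature.
def bytes_feature(value):
    # BytesList expects a list of byte strings
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def float_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))

def int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def make_example(raw_bytes, score, label):
    # The 'demo/...' keys are illustrative, not keys used by the post.
    feature = {
        'demo/raw': bytes_feature(raw_bytes),
        'demo/score': float_feature(score),
        'demo/label': int64_feature(label),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# Serialize to a byte string, which is what a TFRecordWriter would store.
serialized = make_example(b'\x00\x01\x02', 0.5, 3).SerializeToString()

# Parse the string back into an Example and look at each feature.
restored = tf.train.Example.FromString(serialized)
restored_feature = restored.features.feature
print(restored_feature['demo/raw'].bytes_list.value[0])
print(restored_feature['demo/score'].float_list.value[0])
print(restored_feature['demo/label'].int64_list.value[0])
```

A round trip like this is a cheap way to catch a feature that was wrapped with the wrong list type before a large file is generated.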
    

    2. Reading TFRecords files

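    First, tf.train.string_input_producer takes the list of tfrecords files and builds a FIFO queue; its num_epochs and shuffle parameters set how many times the data is read and whether the order of the tfrecords files is shuffled. Next, a TFRecordReader reads from that queue and returns the next record, and tf.parse_single_example decodes it: given the serialized Example and the feature dictionary, it returns the value stored under each feature key. At this point the image is still an encoded byte string and has to be decoded into the required data type. Once the image tensor has been restored to its original shape, preprocessing can be applied. Finally, tf.train.batch or tf.train.shuffle_batch can group the images into batches.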

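    The tf.train functions above add tf.train.QueueRunner objects to the graph, and each of these holds a list of enqueue ops that run in threads to fill a queue. To fill the queues, call tf.train.start_queue_runners, which starts threads for every queue runner in the graph. To manage those threads, use a tf.train.Coordinator, which stops them at the appropriate time.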
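
The stop-and-join protocol of the coordinator can be looked at in isolation. The sketch below is an illustration only: it uses a plain Python thread and a standard-library queue in place of a real QueueRunner, and the `producer` function and `work_queue` name are hypothetical. The only TensorFlow calls are `tf.train.Coordinator` and its `should_stop`, `request_stop`, and `join` methods.

```python
import queue
import threading

import tensorflow as tf

def producer(coord, q, items):
    # Stand-in for the enqueue loop of a queue runner (hypothetical helper).
    for item in items:
        if coord.should_stop():
            return
        q.put(item)
    # Nothing left to produce: ask every cooperating thread to stop.
    coord.request_stop()

coord = tf.train.Coordinator()
work_queue = queue.Queue()
threads = [threading.Thread(target=producer, args=(coord, work_queue, range(5)))]
for t in threads:
    t.start()

# Block until the threads have finished after a stop was requested.
coord.join(threads)

results = []
while not work_queue.empty():
    results.append(work_queue.get())
print(results)
```

In the queue-based input pipeline, tf.train.start_queue_runners plays the role of the thread-starting loop here, and the training loop calls request_stop once it is done.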

    import tensorflow as tf
    import matplotlib.pyplot as plt
    
    data_path = 'train.tfrecords'
    
    with tf.Session() as sess:
        # feature keys and the data types of the values stored in the tfrecords file
        feature = {'train/image': tf.FixedLenFeature([], tf.string),
                         'train/label': tf.FixedLenFeature([], tf.int64),
                         'train/height': tf.FixedLenFeature([], tf.int64),
                         'train/width': tf.FixedLenFeature([], tf.int64)}
        # define a queue based on the input filenames
        filename_queue = tf.train.string_input_producer([data_path], num_epochs=1)
        # define a tfrecords file reader
        reader = tf.TFRecordReader()
        # read in serialized example data
        _, serialized_example = reader.read(filename_queue)
        # decode example by feature
        features = tf.parse_single_example(serialized_example, features=feature)
        image = tf.image.decode_jpeg(features['train/image'], channels=3)  # force 3 channels so the reshape below also works for grayscale JPEGs
        image = tf.image.convert_image_dtype(image, dtype=tf.float32)  # convert dtype from uint8 to float32 for the later resize
        label = tf.cast(features['train/label'], tf.int64)
        height = tf.cast(features['train/height'], tf.int32)
        width = tf.cast(features['train/width'], tf.int32)
        # restore image to [height, width, 3]
        image = tf.reshape(image, [height, width, 3])
        # resize
        image = tf.image.resize_images(image, [224, 224])
        # create batch
        # capacity: maximum size of the queue; min_after_dequeue: minimum number of elements left in the queue after a dequeue; num_threads: number of threads running the queue ops
        images, labels = tf.train.shuffle_batch([image, label], batch_size=10, capacity=30, num_threads=1, min_after_dequeue=10)
    
        # initialize global & local variables
        init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
        sess.run(init_op)
        # create a coordinator and start the queue runner threads
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)
        for batch_index in range(3):
            batch_images, batch_labels = sess.run([images, labels])
            for i in range(10):
                plt.imshow(batch_images[i, ...])
                plt.show()
                print("Current image label is:", batch_labels[i])
        # close threads
        coord.request_stop()
        coord.join(threads)
        # the with block closes the session on exit, so no explicit close() is needed
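
For a quick look at what a tfrecords file contains, the records can also be read one by one without queues or a session, which is the approach discussed in the first reference. The sketch below is an assumption-laden illustration rather than the post's method: it assumes TensorFlow 1.13 or later (where `tf.io.TFRecordWriter` and `tf.compat.v1.io.tf_record_iterator` are available; the latter is deprecated in 2.x), and the `write_labels` and `read_labels` helpers and the temporary `demo.tfrecords` file are hypothetical.

```python
import os
import tempfile

import tensorflow as tf

def write_labels(path, labels):
    # Write one Example per label, using the post's 'train/label' key.
    with tf.io.TFRecordWriter(path) as writer:
        for label in labels:
            feature = {
                'train/label': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[label])),
            }
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())

def read_labels(path):
    # Iterate over the serialized records and parse each one as an Example.
    labels = []
    for record in tf.compat.v1.io.tf_record_iterator(path):
        example = tf.train.Example.FromString(record)
        labels.append(example.features.feature['train/label'].int64_list.value[0])
    return labels

# Hypothetical demo file in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.tfrecords')
write_labels(demo_path, [0, 1, 2])
read_back = read_labels(demo_path)
print(read_back)
```

This is suited to inspection and small files only; it reads records in order and does no shuffling or batching.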
    

    References

    1. https://stackoverflow.com/questions/37151895/tensorflow-read-all-examples-from-a-tfrecords-at-once
    2. http://www.machinelearninguru.com/deep_learning/tensorflow/basics/tfrecord/tfrecord.html
  • Original article: https://www.cnblogs.com/arkenstone/p/7507261.html