  • TensorFlow: reading data from a file in batches and shuffling the order [Python]

    Use TensorFlow to read preprocessed text data in batches, so that it can be consumed as an iterator over batches:

    Suppose I have a preprocessed data file, data.csv, with shape = (8, 10). The first five columns are features and the last five columns are labels:

    1,2,3,4,5,6,7,8,9,10
    11,12,13,14,15,16,17,18,19,20
    21,22,23,24,25,26,27,28,29,30
    31,32,33,34,35,36,37,38,39,40
    41,42,43,44,45,46,47,48,49,50
    51,52,53,54,55,56,57,58,59,60
    1,1,1,1,1,2,2,2,2,2
    3,3,3,3,3,4,4,4,4,4

    Now I need to split it into 4 batches, i.e. each batch has size 2 (8 rows / 2 rows per batch).
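    As a framework-independent illustration of the split described above, here is a minimal NumPy sketch. It inlines the same 8 x 10 table instead of reading data.csv, and the helper names `load_and_split` and `sequential_batches` are hypothetical, chosen only for this example.

```python
import io
import numpy as np

# The same 8 x 10 table as data.csv, inlined so the sketch is self-contained.
CSV_TEXT = """1,2,3,4,5,6,7,8,9,10
11,12,13,14,15,16,17,18,19,20
21,22,23,24,25,26,27,28,29,30
31,32,33,34,35,36,37,38,39,40
41,42,43,44,45,46,47,48,49,50
51,52,53,54,55,56,57,58,59,60
1,1,1,1,1,2,2,2,2,2
3,3,3,3,3,4,4,4,4,4
"""


def load_and_split(text, n_features=5):
    """Parse CSV text and split each row into (features, labels)."""
    rows = np.loadtxt(io.StringIO(text), delimiter=",", dtype=np.int32)
    return rows[:, :n_features], rows[:, n_features:]


def sequential_batches(features, labels, batch_size):
    """Yield (features, labels) slices of batch_size rows, in file order."""
    for start in range(0, len(features), batch_size):
        stop = start + batch_size
        yield features[start:stop], labels[start:stop]


features, labels = load_and_split(CSV_TEXT)
batches = list(sequential_batches(features, labels, batch_size=2))
print(features.shape, labels.shape, len(batches))
```

    With 8 rows and a batch size of 2 this yields 4 batches, each holding a (2, 5) feature array and a (2, 5) label array.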

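    I may also want to shuffle the row order, so two approaches are shown here: sequential and random. The code uses the queue-based input API of TensorFlow 1.x.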

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    __author__ = 'xijun1'
    import tensorflow as tf
    import numpy as np
    
    # data = np.arange(1, 100 + 1)
    # print ",".join( [str(i) for i in data])
    # data_input = tf.constant(data)
    filename_queue = tf.train.string_input_producer(["data.csv"])
    reader = tf.TextLineReader(skip_header_lines=0)
    key, value = reader.read(filename_queue)
    # decode_csv converts a string Tensor (one text line) into a tuple of
    # column tensors; record_defaults supplies the default value for each
    # column and thereby also sets its data type
    words_size = 5  # number of feature columns per row (the label columns number the same)
    decoded = tf.decode_csv(
        value,
        field_delim=',',
        record_defaults=[[0] for i in range(words_size * 2)])
    
    batch_size = 2  # number of rows in each batch
    # random order
    batch_shuffle = tf.train.shuffle_batch(decoded, batch_size=batch_size,
                                           capacity=batch_size * words_size,
                                           min_after_dequeue=batch_size)
    # sequential order
    batch_no_shuffle = tf.train.batch(decoded, batch_size=batch_size, capacity=batch_size * words_size,
                                      allow_smaller_final_batch=True)  # boolean flag, not a size
    shuffle_features = tf.transpose(tf.stack(batch_shuffle[0:words_size]))
    shuffle_label = tf.transpose(tf.stack(batch_shuffle[words_size:]))
    features = tf.transpose(tf.stack(batch_no_shuffle[0:words_size]))
    label = tf.transpose(tf.stack(batch_no_shuffle[words_size:]))
    
    with tf.Session() as sess:
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)
        for i in range(8 // batch_size):  # integer division so this also runs on Python 3
            print (i+10, sess.run([shuffle_features, shuffle_label]))
            print (i, sess.run([features, label]))
        coord.request_stop()
        coord.join(threads)

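    Running it gives output like the following. Entries numbered 10 to 13 are the shuffled batches and entries numbered 0 to 3 are the sequential ones. Both batch queues pull rows from the same reader, and string_input_producer cycles through the file indefinitely by default, so the rows are divided between the two pipelines and some rows appear more than once. The exact output can vary from run to run.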

    (10, [array([[ 1,  2,  3,  4,  5],
           [31, 32, 33, 34, 35]], dtype=int32), array([[ 6,  7,  8,  9, 10],
           [36, 37, 38, 39, 40]], dtype=int32)])
    (0, [array([[11, 12, 13, 14, 15],
           [21, 22, 23, 24, 25]], dtype=int32), array([[16, 17, 18, 19, 20],
           [26, 27, 28, 29, 30]], dtype=int32)])
    (11, [array([[51, 52, 53, 54, 55],
           [ 3,  3,  3,  3,  3]], dtype=int32), array([[56, 57, 58, 59, 60],
           [ 4,  4,  4,  4,  4]], dtype=int32)])
    (1, [array([[41, 42, 43, 44, 45],
           [ 1,  1,  1,  1,  1]], dtype=int32), array([[46, 47, 48, 49, 50],
           [ 2,  2,  2,  2,  2]], dtype=int32)])
    (12, [array([[ 3,  3,  3,  3,  3],
           [11, 12, 13, 14, 15]], dtype=int32), array([[ 4,  4,  4,  4,  4],
           [16, 17, 18, 19, 20]], dtype=int32)])
    (2, [array([[ 1,  2,  3,  4,  5],
           [21, 22, 23, 24, 25]], dtype=int32), array([[ 6,  7,  8,  9, 10],
           [26, 27, 28, 29, 30]], dtype=int32)])
    (13, [array([[31, 32, 33, 34, 35],
           [ 1,  1,  1,  1,  1]], dtype=int32), array([[36, 37, 38, 39, 40],
           [ 2,  2,  2,  2,  2]], dtype=int32)])
    (3, [array([[41, 42, 43, 44, 45],
           [ 1,  1,  1,  1,  1]], dtype=int32), array([[46, 47, 48, 49, 50],
           [ 2,  2,  2,  2,  2]], dtype=int32)])
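    The `capacity` and `min_after_dequeue` arguments of `tf.train.shuffle_batch` describe a buffer from which rows are dequeued at random. The pure-Python sketch below imitates that idea for a single pass over the data. It is a simplified simulation, not TensorFlow's implementation, and the function name `shuffle_buffer_batches` and its `seed` parameter are assumptions made for this example.

```python
import random


def shuffle_buffer_batches(rows, batch_size, min_after_dequeue, seed=None):
    """Yield batches drawn at random from a small buffer (single pass).

    While input remains, the buffer is refilled so that at least
    min_after_dequeue rows are left after each dequeue. Once the input
    is exhausted the buffer simply drains (a simplification).
    """
    rng = random.Random(seed)
    source = iter(rows)
    exhausted = False
    buffer = []
    batch = []
    while True:
        # Refill until the buffer holds more than min_after_dequeue rows.
        while not exhausted and len(buffer) <= min_after_dequeue:
            try:
                buffer.append(next(source))
            except StopIteration:
                exhausted = True
        if not buffer:
            break
        # Dequeue one row chosen uniformly at random (swap with last, pop).
        idx = rng.randrange(len(buffer))
        buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
        batch.append(buffer.pop())
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # smaller final batch, if rows do not divide evenly


rows = list(range(8))  # stand-ins for the 8 rows of data.csv
shuffled = list(shuffle_buffer_batches(rows, batch_size=2,
                                       min_after_dequeue=2, seed=0))
print(shuffled)
```

    In this sketch a small `min_after_dequeue` (2, as in the code above) means each pick is made from only a few candidate rows, so the shuffle is fairly local. A larger value mixes the rows more thoroughly at the cost of a larger buffer.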

  • Original article: https://www.cnblogs.com/gongxijun/p/10045796.html