  • Speeding up Caffe image reading with pickle

    64*64*3 small images (12 KB each), batchSize=128, 1,000,000 training samples.

    Loading them all into memory is not feasible; a full load takes more than half a day.

    Reading one batch per iteration during training: on the Ali cloud server each batch takes 1.9~3.2 s to read, so one iteration takes over 2 s.

    Because each sample has multiple labels, Caffe's built-in lmdb conversion cannot be used; the input is a hand-written Python layer, so let's try pickle:

    import os, sys
    import cv2
    import numpy as np
    import numpy.random as npr
    import cPickle as pickle
    wk_dir = "/Users/xxx/wkspace/caffe_space/detection/caffe/data/1103reg64/"
    InputSize = int(sys.argv[1])
    BatchSize = int(sys.argv[2])
    trainfile = "train.txt"
    testfile = "test.txt"
    print "gen imdb for net input:", InputSize, "batchSize:", BatchSize
    
    with open(wk_dir+trainfile, 'r') as f:
        trainlines = f.readlines()
    with open(wk_dir+testfile, 'r') as f:
        testlines = f.readlines()
    #######################################
    # we separate train data by batchsize #
    #######################################
    to_dir = wk_dir + "trainIMDB/"
    if not os.path.isdir(to_dir):
        os.makedirs(to_dir)
    
    train_list = []
    cur_ = 0
    sum_ = len(trainlines)
    for line in trainlines:
        cur_ += 1
        words = line.split()
        image_file_name = words[0]
        im = cv2.imread(wk_dir + image_file_name)
        h,w,ch = im.shape
        if h!=InputSize or w!=InputSize:
            im = cv2.resize(im,(InputSize,InputSize))
        roi = [float(words[2]),float(words[3]),float(words[4]),float(words[5])]
        train_list.append([im, roi])
        if (cur_ % BatchSize == 0):
            print "write batch:", cur_/BatchSize
            fid = open(to_dir + 'train' + str(BatchSize) + '_' + str(cur_/BatchSize), 'wb')
            pickle.dump(train_list, fid)
            fid.close()
            train_list[:] = []
    
    # anything left here is a final partial batch that was not written out
    print len(train_list), "train samples left over"
    
    ###########################
    # test data               #
    ###########################
    to_dir = wk_dir + "testIMDB/"
    if not os.path.isdir(to_dir):
        os.makedirs(to_dir)
    test_list = []
    cur_ = 0
    sum_ = len(testlines)
    for line in testlines:
        cur_ += 1
        words = line.split()
        image_file_name = words[0]
        im = cv2.imread(wk_dir + image_file_name)
        h,w,ch = im.shape
        if h!=InputSize or w!=InputSize:
            im = cv2.resize(im,(InputSize,InputSize))
        roi = [float(words[2]),float(words[3]),float(words[4]),float(words[5])]
        test_list.append([im, roi])
        if (cur_ % BatchSize == 0):
            print "write batch:", cur_ / BatchSize
            fid = open(to_dir + 'test' + str(BatchSize) + '_' + str(cur_/BatchSize), 'wb')
            pickle.dump(test_list, fid)
            fid.close()
            test_list[:] = []
    print len(test_list), "test samples left over"
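
    On the training side, the custom Python layer only needs to unpickle one file per iteration. A minimal loader sketch, written in Python 3 syntax with a synthetic batch (the real files would be the trainIMDB/train128_N chunks written above; the NCHW transpose assumes a Caffe-style blob layout):

    ```python
    import os
    import pickle
    import tempfile

    import numpy as np

    def load_batch(path):
        """Read one pickled batch (a list of [image, roi] pairs)
        and return two stacked arrays ready to feed a net."""
        with open(path, 'rb') as fid:
            batch = pickle.load(fid)
        # HWC uint8 images -> NCHW float blob, as a Caffe python layer would feed it
        data = np.stack([im for im, _ in batch]).transpose(0, 3, 1, 2).astype(np.float32)
        rois = np.array([roi for _, roi in batch], dtype=np.float32)
        return data, rois

    # round-trip demo with a synthetic 4-sample "batch"
    tmp_dir = tempfile.mkdtemp()
    fake_batch = [[np.zeros((64, 64, 3), dtype=np.uint8), [0.1, 0.2, 0.3, 0.4]]
                  for _ in range(4)]
    path = os.path.join(tmp_dir, 'train128_1')
    with open(path, 'wb') as fid:
        pickle.dump(fake_batch, fid)

    data, rois = load_batch(path)
    print(data.shape, rois.shape)  # (4, 3, 64, 64) (4, 4)
    ```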

    Each batch yields a 4.8 MB pickle file (about 3x the disk space of the 128 original images).
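
    Part of that size inflation comes from pickle's default protocol 0, a verbose text format. Passing a binary protocol to pickle.dump shrinks the chunks considerably; a minimal self-contained comparison (Python 3 syntax, with a plain list of ints standing in for a batch):

    ```python
    import pickle

    # stand-in for a batch payload: serialize the same object with the
    # text-based protocol 0 versus the binary protocol 2
    payload = list(range(10000))

    text_blob = pickle.dumps(payload, protocol=0)    # ASCII-based, verbose
    binary_blob = pickle.dumps(payload, protocol=2)  # compact binary framing

    print(len(text_blob), len(binary_blob))
    ```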

    Reading these at training time brings the per-batch load time on the Ali cloud server down to 0.2 s, roughly a 10x speedup.

    The Mac has an SSD, so raw image reads were already fast (0.05 s per batch); after switching to pickle it actually got slower, taking 0.2 s to load a batch.
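
    So whether pickling helps depends on the storage medium: it replaces 128 small random reads plus JPEG decodes with one sequential read of a pre-decoded blob, a win on the cloud server's slow disk but a loss against an already-fast SSD. A throwaway harness along these lines (synthetic data, not the author's benchmark) can time a single batch load:

    ```python
    import pickle
    import tempfile
    import time

    # write one synthetic "batch" file, then time how long a single load takes
    batch = [list(range(64 * 64 * 3)) for _ in range(8)]  # stand-in for 8 images
    with tempfile.NamedTemporaryFile(delete=False) as f:
        pickle.dump(batch, f, protocol=2)
        path = f.name

    t0 = time.time()
    with open(path, 'rb') as f:
        loaded = pickle.load(f)
    elapsed = time.time() - t0
    print("one batch loaded in %.4f s" % elapsed)
    ```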

  • Original post: https://www.cnblogs.com/zhengmeisong/p/9903539.html