  • How to convert non-image data to LMDB format (Caffe)

    LMDB is the database of choice when using Caffe with large datasets. This tutorial shows how to create an LMDB database from Python. First, let’s look at the pros and cons of using LMDB over HDF5.

    Reasons to use HDF5:

    • Simple format to read/write.

    Reasons to use LMDB:

    • LMDB uses memory-mapped files, giving much better I/O performance.
    • Works well with really large datasets. Caffe always reads an HDF5 file entirely into memory, so no single HDF5 file can exceed your memory capacity. You can work around this by splitting your data into several HDF5 files (just put the paths to all of the h5 files in your text file), but the I/O performance still won’t be nearly as good as with LMDB’s page caching.
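
    The HDF5 workaround mentioned above can be sketched as follows. This is only an illustration of the bookkeeping: the helper names, the shard size and the file names are hypothetical, and actually writing the h5 files (for example with h5py) is left out.

```python
def plan_shards(n_samples, max_per_file, prefix="train"):
    """Split sample indices [0, n_samples) into consecutive shards.

    Returns a list of (file_name, start, stop) tuples. The file names are
    hypothetical placeholders, not something Caffe requires.
    """
    shards = []
    for start in range(0, n_samples, max_per_file):
        stop = min(start + max_per_file, n_samples)
        name = "{}_{:03d}.h5".format(prefix, len(shards))
        shards.append((name, start, stop))
    return shards


def write_list_file(list_path, shards):
    """Write one HDF5 path per line into the text file."""
    with open(list_path, "w") as f:
        for name, _start, _stop in shards:
            f.write(name + "\n")


# 1000 samples, at most 400 per file (both numbers are assumptions)
shards = plan_shards(n_samples=1000, max_per_file=400)
```

    Each tuple says which slice of the data would go into which file; write_list_file then produces the text file that lists the h5 paths.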

    LMDB from Python

    You will need the Python package lmdb as well as Caffe’s Python package (make pycaffe in Caffe). LMDB provides key-value storage, where each <key, value> pair is one sample in our dataset. The key is simply a string version of an ID value, and the value is a serialized instance of Caffe’s Datum class (which is built using protobuf).
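
    A note on the key format: by default LMDB keeps keys in byte-wise sorted order, so zero-padding the ID makes the stored order match the numeric order. The small sketch below only uses plain Python sorting to show the effect; the example IDs are arbitrary.

```python
ids = [2, 10, 1]

# Zero-padded keys, formatted the same way as in the tutorial code
padded = ["{:08}".format(i).encode("ascii") for i in ids]
# Unpadded keys, for comparison
unpadded = [str(i).encode("ascii") for i in ids]

# Byte-wise sorting, which is how a default LMDB database orders keys
padded_sorted = sorted(padded)
unpadded_sorted = sorted(unpadded)

print(padded_sorted)    # [b'00000001', b'00000002', b'00000010']
print(unpadded_sorted)  # [b'1', b'10', b'2']
```

    With unpadded keys, sample 10 would come before sample 2 when iterating over the database.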

    import numpy as np
    import lmdb
    import caffe
    
    N = 1000
    
    # Let's pretend this is interesting data
    X = np.zeros((N, 3, 32, 32), dtype=np.uint8)
    y = np.zeros(N, dtype=np.int64)
    
    # LMDB needs to know the maximum size of the database up front. We'll
    # set it 10 times greater than what we theoretically need. There is
    # little drawback to setting this too big. If you still run into
    # problems after raising this, you might want to try saving fewer
    # entries in a single transaction.
    map_size = X.nbytes * 10
    
    env = lmdb.open('mylmdb', map_size=map_size)
    
    with env.begin(write=True) as txn:
        # txn is a Transaction object
        for i in range(N):
            datum = caffe.proto.caffe_pb2.Datum()
            datum.channels = X.shape[1]
            datum.height = X.shape[2]
            datum.width = X.shape[3]
            datum.data = X[i].tobytes()  # or .tostring() if numpy < 1.9
            datum.label = int(y[i])
            str_id = '{:08}'.format(i)
    
            # The encode is only essential in Python 3
            txn.put(str_id.encode('ascii'), datum.SerializeToString())
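
    The comment above suggests saving fewer entries per transaction if problems persist. One possible way to do that is sketched below. The helper write_in_batches and the batch size are assumptions for illustration, not part of lmdb; env is expected to behave like the object returned by lmdb.open().

```python
from itertools import islice


def write_in_batches(env, items, batch_size=1000):
    """Write (key, value) byte pairs using one write transaction per batch.

    `env` is assumed to offer env.begin(write=True) as a context manager
    that yields a transaction with a put(key, value) method.
    Returns the number of pairs written.
    """
    iterator = iter(items)
    written = 0
    while True:
        # Take up to batch_size pairs from the input
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        # The transaction is committed when the with-block exits normally
        with env.begin(write=True) as txn:
            for key, value in batch:
                txn.put(key, value)
        written += len(batch)
    return written
```

    With the variables from the listing above, items could be a generator that yields (str_id.encode('ascii'), datum.SerializeToString()) for each sample.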
    

    You can also open up and inspect an existing LMDB database from Python:

    import numpy as np
    import lmdb
    import caffe
    
    env = lmdb.open('mylmdb', readonly=True)
    with env.begin() as txn:
        raw_datum = txn.get(b'00000000')
    
    datum = caffe.proto.caffe_pb2.Datum()
    datum.ParseFromString(raw_datum)
    
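    # np.frombuffer replaces the deprecated np.fromstring for binary data;
    # the result is a read-only view, so call .copy() if you need to modify it
    flat_x = np.frombuffer(datum.data, dtype=np.uint8)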
    x = flat_x.reshape(datum.channels, datum.height, datum.width)
    y = datum.label
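
    To see what the reshape above does with the flat bytes: with NumPy's default row-major order, element (c, h, w) of the reshaped array sits at offset (c * height + h) * width + w in the flat buffer. The sketch below checks this with plain Python; the tiny 3x2x2 shape and the helper name chw_offset are made up for illustration.

```python
def chw_offset(c, h, w, height, width):
    """Offset of element (c, h, w) in a flat, row-major (C, H, W) buffer."""
    return (c * height + h) * width + w


channels, height, width = 3, 2, 2

# A stand-in for datum.data: 12 bytes holding the values 0..11
flat = bytes(range(channels * height * width))

# Channel 1, row 0, column 1 sits at offset (1 * 2 + 0) * 2 + 1 = 5
value = flat[chw_offset(1, 0, 1, height, width)]
print(value)  # 5
```

    This is why the data has to be written channel by channel, which is the order X[i].tobytes() produces for an array of shape (channels, height, width).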
    

    Iterating <key, value> pairs is also easy:

    with env.begin() as txn:
        cursor = txn.cursor()
        for key, value in cursor:
            print(key, value)
    
  • Original article: https://www.cnblogs.com/guohaoyu110/p/7448795.html