  • [Notes] Timing numpy's file-saving methods

    A quick benchmark of how long each of numpy's several ways of saving data to a file takes.

    The test uses two arrays, a large 10000x10000 matrix and a small 640x480 matrix, to compare save and load performance on big versus small data.

    There are three ways to save:

    • np.save(): dumps a single array straight to a binary file; no compression, large file
    • np.savez(): saves several arrays at once, read back on load through a dict-like object; no compression, large file
    • np.savez_compressed(): like np.savez() but compresses the archive; small file
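    A minimal round-trip through all three calls (the filenames here are placeholders for the demo). Note that positional arguments to np.savez() are stored under the keys 'arr_0', 'arr_1', ...; keyword arguments keep their names:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# np.save: one array, uncompressed .npy file
np.save('demo.npy', arr)
a = np.load('demo.npy')

# np.savez: several arrays in one uncompressed .npz archive;
# keyword names become the lookup keys on load
np.savez('demo.npz', x=arr, y=arr * 2)
with np.load('demo.npz') as data:
    x, y = data['x'], data['y']

# np.savez_compressed: same archive format, but zlib-compressed
np.savez_compressed('demo_c.npz', x=arr)
with np.load('demo_c.npz') as data:
    xc = data['x']
```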

    Environment: Win10 64-bit, Python 3.7

    The measurements show:

    • Saving the large matrix:
      np.save(): 550.91 ms, file size 390625.12 KB
      np.savez(): 970.71 ms, file size 390625.24 KB
      np.savez_compressed(): 36123.80 ms (65.6x np.save()), file size 63423.66 KB, compression ratio 6.16

    • Loading the large matrix, always with np.load(): 488.87 ms, 1162.00 ms and 2158.33 ms respectively. Loading the compressed data takes 4.4x as long as the uncompressed data.

    • Saving the small matrix:
      np.save(): 1.70 ms, file size 1200.12 KB
      np.savez(): 7.98 ms, file size 1200.24 KB
      np.savez_compressed(): 146.19 ms (86x np.save()), file size 195.20 KB, compression ratio 6.15

    • Loading the small matrix, always with np.load(): 1.80 ms, 6.28 ms and 12.97 ms respectively. Loading the compressed data takes 7.2x as long as the uncompressed data.

    From this we can see:

    • np.savez() adds dict-style bookkeeping, so it is somewhat slower than np.save()
    • np.savez_compressed() has to compress the data, so it is 60-90x slower than np.save(); for random data the compression ratio is around 6, and for a sparse matrix both the compression time and the ratio will certainly differ
    • When loading, compressed data takes 4-8x as long as raw data; again, a sparse matrix will decompress differently
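    The sparse-matrix caveat is easy to check with a small sketch (the array size and filenames are arbitrary): an array that is mostly zeros compresses far better than random data of the same shape:

```python
import os
import numpy as np

# random values: hard to compress (ratio around 6 in the measurements above)
dense = np.random.randint(0, 10, size=(1000, 1000))

# mostly zeros: highly compressible
sparse = np.zeros((1000, 1000), dtype=dense.dtype)
sparse[::100, ::100] = 1

np.savez_compressed('dense_c.npz', a=dense)
np.savez_compressed('sparse_c.npz', a=sparse)

dense_kb = os.path.getsize('dense_c.npz') / 1024
sparse_kb = os.path.getsize('sparse_c.npz') / 1024
print('dense: %.1f KB, sparse: %.1f KB' % (dense_kb, sparse_kb))
```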

    Conclusion:

    • For largish sparse matrices, when storage space matters, compressed storage is worth a try
    import os
    import os.path as osp
    import numpy as np
    import time
    
    # - measure the average time per call of func() over run_times runs
    
    def check_time(desc, func, run_times=10):
        t = time.time()
        for i in range(run_times):
            func()
        t = (time.time()-t)*1000/run_times
        print('%s cost avg time = %.2f ms' % (desc, t))
        return t
    
    # - big and small ndarray
    big = np.random.randint(0, 10, size=(10000,10000))
    small = np.random.randint(0, 10, size=(640,480))
    
    print('big =', big)
    print('small =', small)
    
    big = [[0 3 9 ... 7 3 2]
     [9 5 9 ... 5 8 7]
     [2 5 6 ... 3 6 9]
     ...
     [3 6 0 ... 6 0 1]
     [8 0 6 ... 5 1 1]
     [7 0 1 ... 7 7 7]]
    small = [[3 7 8 ... 1 4 1]
     [6 1 0 ... 2 1 1]
     [0 7 5 ... 4 3 9]
     ...
     [3 5 4 ... 7 2 2]
     [6 3 1 ... 4 5 9]
     [3 1 9 ... 5 2 5]]
    
    # - npy and npz filename
    big_npy_filename = 'big_npy.npy'
    big_npz_filename = 'big_npz.npz'
    big_compressed_npz_filename = 'big_compressed.npz'
    
    small_npy_filename = 'small_npy.npy'
    small_npz_filename = 'small_npz.npz'
    small_compressed_npz_filename = 'small_compressed.npz'
    
    # - save functions
    
    def test_save_big_npy():
        np.save(big_npy_filename, big)
    
    def test_save_big_npz():
        np.savez(big_npz_filename, big)
    
    def test_save_big_compressed_npz():
        np.savez_compressed(big_compressed_npz_filename, big)
    
    def test_save_small_npy():
        np.save(small_npy_filename, small)
    
    def test_save_small_npz():
        np.savez(small_npz_filename, small)
        
    def test_save_small_compressed_npz():
        np.savez_compressed(small_compressed_npz_filename, small)
    
    # - load functions
    
    def test_load_big_npy():
        return np.load(big_npy_filename)
    
    def test_load_big_npz():
        return np.load(big_npz_filename)['arr_0']
    
    def test_load_big_compressed_npz():
        return np.load(big_compressed_npz_filename)['arr_0']
    
    def test_load_small_npy():
        return np.load(small_npy_filename)
    
    def test_load_small_npz():
        return np.load(small_npz_filename)['arr_0']
        
    def test_load_small_compressed_npz():
        return np.load(small_compressed_npz_filename)['arr_0']
    
    # - check save time for big
    
    check_time('save big npy', test_save_big_npy)
    check_time('save big npz', test_save_big_npz)
    check_time('save big compressed npz', test_save_big_compressed_npz)
    
    for f in [
        big_npy_filename, 
        big_npz_filename, 
        big_compressed_npz_filename
    ]:
        print('file %s size = %.2f KB' % (f, osp.getsize(f)/1024))
    
    save big npy cost avg time = 550.91 ms
    save big npz cost avg time = 970.71 ms
    save big compressed npz cost avg time = 36123.80 ms
    file big_npy.npy size = 390625.12 KB
    file big_npz.npz size = 390625.24 KB
    file big_compressed.npz size = 63423.66 KB
    
    # - check load time for big
    
    check_time('load big npy', test_load_big_npy)
    check_time('load big npz', test_load_big_npz)
    check_time('load big compressed npz', test_load_big_compressed_npz)
    
    load big npy cost avg time = 488.87 ms
    load big npz cost avg time = 1162.00 ms
    load big compressed npz cost avg time = 2158.33 ms
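    One option not benchmarked here: for a plain .npy file, np.load() accepts mmap_mode, which memory-maps the file instead of reading all of it up front, so only the slices you touch are actually paged in. This does not work for .npz archives, compressed or not. A small sketch (filename is arbitrary):

```python
import numpy as np

arr = np.arange(1_000_000, dtype=np.int64).reshape(1000, 1000)
np.save('mmap_demo.npy', arr)

# map the file read-only; no bulk read happens here
m = np.load('mmap_demo.npy', mmap_mode='r')

# indexing pulls just the touched pages from disk
row = m[500]
```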
    
    
    # - check save time for small
    
    check_time('save small npy', test_save_small_npy)
    check_time('save small npz', test_save_small_npz)
    check_time('save small compressed npz', test_save_small_compressed_npz)
    
    for f in [
        small_npy_filename, 
        small_npz_filename, 
        small_compressed_npz_filename
    ]:
        print('file %s size = %.2f KB' % (f, osp.getsize(f)/1024))
    
    save small npy cost avg time = 1.70 ms
    save small npz cost avg time = 7.98 ms
    save small compressed npz cost avg time = 146.19 ms
    file small_npy.npy size = 1200.12 KB
    file small_npz.npz size = 1200.24 KB
    file small_compressed.npz size = 195.20 KB
    
    # - check load time for small
    
    check_time('load small npy', test_load_small_npy)
    check_time('load small npz', test_load_small_npz)
    check_time('load small compressed npz', test_load_small_compressed_npz)
    
    load small npy cost avg time = 1.80 ms
    load small npz cost avg time = 6.28 ms
    load small compressed npz cost avg time = 12.97 ms
    
  • Original post: https://www.cnblogs.com/journeyonmyway/p/12524425.html