  • [Notes] Timing numpy's file-saving methods

    A quick benchmark of how long each of numpy's several ways of saving data to a file takes.

    The test uses two arrays, a large 10000x10000 matrix and a small 640x480 matrix, to compare save and load performance on big versus small data.

    There are three ways to save:

    • np.save(): dumps a single array straight to a binary file; no compression, large file
    • np.savez(): saves several arrays at once, read back on load through a dict-like object; no compression, large file
    • np.savez_compressed(): like np.savez() but compresses the archive; small file
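    A minimal round-trip through all three calls (the filenames here are placeholders for the demo). Note that positional arguments to np.savez() are stored under the keys 'arr_0', 'arr_1', ...; keyword arguments keep their names:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# np.save: one array, uncompressed .npy file
np.save('demo.npy', arr)
a = np.load('demo.npy')

# np.savez: several arrays in one uncompressed .npz archive;
# keyword names become the lookup keys on load
np.savez('demo.npz', x=arr, y=arr * 2)
with np.load('demo.npz') as data:
    x, y = data['x'], data['y']

# np.savez_compressed: same archive format, but zlib-compressed
np.savez_compressed('demo_c.npz', x=arr)
with np.load('demo_c.npz') as data:
    xc = data['x']
```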

    Environment: Win10 64-bit, Python 3.7

    The measurements show:

    • Saving the large matrix:
      np.save(): 550.91 ms, file size 390625.12 KB
      np.savez(): 970.71 ms, file size 390625.24 KB
      np.savez_compressed(): 36123.80 ms (65.6x np.save()), file size 63423.66 KB, compression ratio 6.16

    • Loading the large matrix, always with np.load(): 488.87 ms, 1162.00 ms and 2158.33 ms respectively. Loading the compressed data takes 4.4x as long as the uncompressed data.

    • Saving the small matrix:
      np.save(): 1.70 ms, file size 1200.12 KB
      np.savez(): 7.98 ms, file size 1200.24 KB
      np.savez_compressed(): 146.19 ms (86x np.save()), file size 195.20 KB, compression ratio 6.15

    • Loading the small matrix, always with np.load(): 1.80 ms, 6.28 ms and 12.97 ms respectively. Loading the compressed data takes 7.2x as long as the uncompressed data.

    From this we can see:

    • np.savez() adds dict-style bookkeeping, so it is somewhat slower than np.save()
    • np.savez_compressed() has to compress the data, so it is 60-90x slower than np.save(); for random data the compression ratio is around 6, and for a sparse matrix both the compression time and the ratio will certainly differ
    • When loading, compressed data takes 4-8x as long as raw data; again, a sparse matrix will decompress differently
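    The sparse-matrix caveat is easy to check with a small sketch (the array size and filenames are arbitrary): an array that is mostly zeros compresses far better than random data of the same shape:

```python
import os
import numpy as np

# random values: hard to compress (ratio around 6 in the measurements above)
dense = np.random.randint(0, 10, size=(1000, 1000))

# mostly zeros: highly compressible
sparse = np.zeros((1000, 1000), dtype=dense.dtype)
sparse[::100, ::100] = 1

np.savez_compressed('dense_c.npz', a=dense)
np.savez_compressed('sparse_c.npz', a=sparse)

dense_kb = os.path.getsize('dense_c.npz') / 1024
sparse_kb = os.path.getsize('sparse_c.npz') / 1024
print('dense: %.1f KB, sparse: %.1f KB' % (dense_kb, sparse_kb))
```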

    Conclusion:

    • For largish sparse matrices, when storage space matters, compressed storage is worth a try
    import os
    import os.path as osp
    import numpy as np
    import time
    
    # - measure the average time per call of func() over run_times runs
    
    def check_time(desc, func, run_times=10):
        t = time.time()
        for i in range(run_times):
            func()
        t = (time.time()-t)*1000/run_times
        print('%s cost avg time = %.2f ms' % (desc, t))
        return t
    
    # - big and small ndarray
    big = np.random.randint(0, 10, size=(10000,10000))
    small = np.random.randint(0, 10, size=(640,480))
    
    print('big =', big)
    print('small =', small)
    
    big = [[0 3 9 ... 7 3 2]
     [9 5 9 ... 5 8 7]
     [2 5 6 ... 3 6 9]
     ...
     [3 6 0 ... 6 0 1]
     [8 0 6 ... 5 1 1]
     [7 0 1 ... 7 7 7]]
    small = [[3 7 8 ... 1 4 1]
     [6 1 0 ... 2 1 1]
     [0 7 5 ... 4 3 9]
     ...
     [3 5 4 ... 7 2 2]
     [6 3 1 ... 4 5 9]
     [3 1 9 ... 5 2 5]]
    
    # - npy and npz filename
    big_npy_filename = 'big_npy.npy'
    big_npz_filename = 'big_npz.npz'
    big_compressed_npz_filename = 'big_compressed.npz'
    
    small_npy_filename = 'small_npy.npy'
    small_npz_filename = 'small_npz.npz'
    small_compressed_npz_filename = 'small_compressed.npz'
    
    # - save functions
    
    def test_save_big_npy():
        np.save(big_npy_filename, big)
    
    def test_save_big_npz():
        np.savez(big_npz_filename, big)
    
    def test_save_big_compressed_npz():
        np.savez_compressed(big_compressed_npz_filename, big)
    
    def test_save_small_npy():
        np.save(small_npy_filename, small)
    
    def test_save_small_npz():
        np.savez(small_npz_filename, small)
        
    def test_save_small_compressed_npz():
        np.savez_compressed(small_compressed_npz_filename, small)
    
    # - load functions
    
    def test_load_big_npy():
        return np.load(big_npy_filename)
    
    def test_load_big_npz():
        return np.load(big_npz_filename)['arr_0']
    
    def test_load_big_compressed_npz():
        return np.load(big_compressed_npz_filename)['arr_0']
    
    def test_load_small_npy():
        return np.load(small_npy_filename)
    
    def test_load_small_npz():
        return np.load(small_npz_filename)['arr_0']
        
    def test_load_small_compressed_npz():
        return np.load(small_compressed_npz_filename)['arr_0']
    
    # - check save time for big
    
    check_time('save big npy', test_save_big_npy)
    check_time('save big npz', test_save_big_npz)
    check_time('save big compressed npz', test_save_big_compressed_npz)
    
    for f in [
        big_npy_filename, 
        big_npz_filename, 
        big_compressed_npz_filename
    ]:
        print('file %s size = %.2f KB' % (f, osp.getsize(f)/1024))
    
    save big npy cost avg time = 550.91 ms
    save big npz cost avg time = 970.71 ms
    save big compressed npz cost avg time = 36123.80 ms
    file big_npy.npy size = 390625.12 KB
    file big_npz.npz size = 390625.24 KB
    file big_compressed.npz size = 63423.66 KB
    
    # - check load time for big
    
    check_time('load big npy', test_load_big_npy)
    check_time('load big npz', test_load_big_npz)
    check_time('load big compressed npz', test_load_big_compressed_npz)
    
    load big npy cost avg time = 488.87 ms
    load big npz cost avg time = 1162.00 ms
    load big compressed npz cost avg time = 2158.33 ms
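    One option not benchmarked here: for a plain .npy file, np.load() accepts mmap_mode, which memory-maps the file instead of reading all of it up front, so only the slices you touch are actually paged in. This does not work for .npz archives, compressed or not. A small sketch (filename is arbitrary):

```python
import numpy as np

arr = np.arange(1_000_000, dtype=np.int64).reshape(1000, 1000)
np.save('mmap_demo.npy', arr)

# map the file read-only; no bulk read happens here
m = np.load('mmap_demo.npy', mmap_mode='r')

# indexing pulls just the touched pages from disk
row = m[500]
```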
    
    
    # - check save time for small
    
    check_time('save small npy', test_save_small_npy)
    check_time('save small npz', test_save_small_npz)
    check_time('save small compressed npz', test_save_small_compressed_npz)
    
    for f in [
        small_npy_filename, 
        small_npz_filename, 
        small_compressed_npz_filename
    ]:
        print('file %s size = %.2f KB' % (f, osp.getsize(f)/1024))
    
    save small npy cost avg time = 1.70 ms
    save small npz cost avg time = 7.98 ms
    save small compressed npz cost avg time = 146.19 ms
    file small_npy.npy size = 1200.12 KB
    file small_npz.npz size = 1200.24 KB
    file small_compressed.npz size = 195.20 KB
    
    # - check load time for small
    
    check_time('load small npy', test_load_small_npy)
    check_time('load small npz', test_load_small_npz)
    check_time('load small compressed npz', test_load_small_compressed_npz)
    
    load small npy cost avg time = 1.80 ms
    load small npz cost avg time = 6.28 ms
    load small compressed npz cost avg time = 12.97 ms
    
  • Original post: https://www.cnblogs.com/journeyonmyway/p/12524425.html