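Symptoms:
Recently I was analyzing data with pandas and storing the results in HDF5. I was watching several different files, and when multiple processes wrote to the same HDF5 file at the same time (each saving to a different group), HDF5 raised AttributeError: 'UnImplemented' object has no attribute 'read'. When I then inspected the data in IPython, I got the following error: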
HDF5ExtError: HDF5 error back trace
File "H5Dio.c", line 173, in H5Dread
can't read data
File "H5Dio.c", line 554, in H5D__read
can't read data
File "H5Dchunk.c", line 1875, in H5D__chunk_read
unable to read raw data chunk
File "H5Dchunk.c", line 2905, in H5D__chunk_lock
data pipeline read failed
File "H5Z.c", line 1372, in H5Z_pipeline
filter returned failure during read
End of HDF5 error back trace
Problems reading records.
Solution:
Use a file lock (the fcntl module) to synchronize access, so that only one process writes to the HDF5 file at a time.
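A minimal sketch of the file-lock approach, not the exact code I used. It assumes a Unix-like system (fcntl is not available on Windows) and that every writer process goes through the same helper. The names locked and save_group, and the separate ".lock" file next to the HDF5 file, are made up for illustration.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def locked(lock_path):
    """Hold an exclusive advisory lock on lock_path while the block runs."""
    # Assumption of this sketch: lock a separate file, so the HDF5 file
    # itself is never opened just for the purpose of locking.
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Blocks until no other process holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def save_group(df, h5_path, key):
    """Write a pandas DataFrame under `key` while holding the lock (hypothetical helper)."""
    with locked(h5_path + ".lock"):
        # Open, write, and close inside the lock, so the file is
        # flushed to disk before the next process is allowed in.
        df.to_hdf(h5_path, key=key, mode="a")
```

Each process would call something like save_group(df, "result.h5", "group_a") instead of calling to_hdf directly. The lock is only advisory: a process that writes to the file without going through the lock is not stopped.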
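Later I bought the book 《Python和HDF5大数据应用》, which explains that HDF5's mpio driver (Parallel HDF5 on top of MPI) can be used to handle the multi-process case.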