python - Creating a dask dataframe from a list of HDF5 files
What is the correct way to go about creating a dask.dataframe from a list of HDF5 files? I basically want to do the following, but with a dataframe:
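    from glob import glob
    import h5py
    import dask.array as da

    # Open the '/data' dataset in each file, in sorted filename order
    dsets = [h5py.File(fn)['/data'] for fn in sorted(glob('myfiles.*.hdf5'))]
    # Wrap each on-disk dataset as a lazy dask array
    arrays = [da.from_array(dset, chunks=(1000, 1000)) for dset in dsets]
    # Stack the per-file arrays along a new leading axis
    x = da.stack(arrays, axis=0)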
Briefly, if your individual files can be read with pd.read_hdf, then you can do this with dd.read_hdf and dd.concat.
    from glob import glob
    import dask.dataframe as dd

    # One lazy dataframe per file, then concatenate them row-wise
    dfs = [dd.read_hdf(fn, '/data') for fn in sorted(glob('myfiles.*.hdf5'))]
    df = dd.concat(dfs)
But it would be useful (and easy) to support this idiom within dd.read_hdf directly. I've created an issue for this and will try to get to it in the next couple of days.