python - Creating a dask dataframe from a list of HDF5 files -


What is the correct way to go about creating a dask.dataframe from a list of HDF5 files? I basically want something like this, but with a dataframe:

dsets = [h5py.File(fn)['/data'] for fn in sorted(glob('myfiles.*.hdf5'))]
arrays = [da.from_array(dset, chunks=(1000, 1000)) for dset in dsets]
x = da.stack(arrays, axis=0)

Briefly, if your individual files can be read with pd.read_hdf, then you can do this with dd.read_hdf and dd.concat.

import dask.dataframe as dd
dfs = [dd.read_hdf(fn, '/data') for fn in sorted(glob('myfiles.*.hdf5'))]
df = dd.concat(dfs)

But it would be useful (and easy) to support this idiom within dd.read_hdf directly. I've created an issue for this and will try to get to it in the next couple of days.

