Hello, I'm not sure whether this is intended behavior, and I did not find any mention of it in the documentation or in the GitHub issue tracker. I'm filing it in case it was not planned to work this way.
Problem description
When a pandas.DataFrame is saved to an HDF5 file, its RangeIndex is converted to an Int64Index (which could add noticeably to the stored size for long tables).
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(1000, 2))
df.index
results in RangeIndex(start=0, stop=1000, step=1)
Then
df.to_hdf('tmp.h5', 'df')
df = pd.read_hdf('tmp.h5', 'df')
df.index
results in Int64Index([ 0, 1, ..., 999], dtype='int64', length=1000)
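As a possible workaround on the reading side, the index can be converted back after loading. The following is only a minimal sketch, not something pandas provides: `restore_range_index` is a hypothetical helper name, and it assumes that any evenly spaced integer index may safely be replaced by an equivalent RangeIndex.

```python
import numpy as np
import pandas as pd


def restore_range_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return df with a RangeIndex when its integer
    index is an evenly spaced sequence; otherwise return df unchanged."""
    idx = df.index
    # Nothing to do for an existing RangeIndex or a too-short index.
    if isinstance(idx, pd.RangeIndex) or len(idx) < 2:
        return df
    # Only plain integer indexes are candidates.
    if not pd.api.types.is_integer_dtype(idx.dtype):
        return df
    values = idx.to_numpy()
    step = int(values[1] - values[0])
    # Reject constant or irregularly spaced indexes.
    if step == 0 or not np.all(np.diff(values) == step):
        return df
    out = df.copy(deep=False)  # shallow copy, the data is not duplicated
    out.index = pd.RangeIndex(
        int(values[0]), int(values[-1]) + step, step, name=idx.name
    )
    return out
```

With the example above, `restore_range_index(pd.read_hdf('tmp.h5', 'df'))` would be the intended usage. This does not change what is written to the file; it only affects the in-memory index after reading.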
Output of pd.show_versions()
Comment From: max-sixty
Is there a more efficient way of representing a range in HDF5?
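One conceivable compact form is to keep only the three integers that define the range and rebuild the index on read. The sketch below only illustrates that idea; `encode_range` and `decode_range` are hypothetical names, this is not how pandas writes HDF5 files, and how the triple would be attached to the stored table is deliberately left open.

```python
import pandas as pd


def encode_range(index):
    """Hypothetical: describe a RangeIndex by three integers.
    Returns None for any other index type."""
    if isinstance(index, pd.RangeIndex):
        return {"start": index.start, "stop": index.stop, "step": index.step}
    return None


def decode_range(meta):
    """Hypothetical: rebuild the RangeIndex from the stored triple."""
    return pd.RangeIndex(meta["start"], meta["stop"], meta["step"])


meta = encode_range(pd.RangeIndex(0, 1000, 1))
rebuilt = decode_range(meta)
```

The stored description stays the same size regardless of the table length, whereas a materialized int64 index needs 8 bytes per row.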
Comment From: jreback
duplicate of https://github.com/pandas-dev/pandas/issues/8319
It's not worth trying to finesse this; rather, just have an option to turn it off.
Comment From: jreback
PRs to fix this are welcome!