Hello, I'm not sure whether this is intended behavior, and I did not find any mention of it in the documentation or in the GitHub issue tracker. I'm filing it just in case it was not planned to work this way.

Problem description

On saving to an HDF5 file, the RangeIndex of a pandas.DataFrame is converted to an Int64Index (which could add quite a lot to the storage space for long tables).
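To give a rough sense of the size difference, here is a small sketch comparing the two index types in memory. The row count `n` is an arbitrary value chosen for illustration, and the actual on-disk cost in HDF5 will depend on the storage format and compression, so treat this only as an order-of-magnitude estimate.

```python
import numpy as np
import pandas as pd

n = 1_000_000  # hypothetical row count, chosen only for illustration

range_index = pd.RangeIndex(n)
int_index = pd.Index(np.arange(n, dtype="int64"))

# A materialized int64 index holds one 8-byte label per row.
int_bytes = np.asarray(int_index).nbytes

# A RangeIndex only needs start, stop and step, regardless of length.
range_bytes = range_index.memory_usage()

print("int64 index bytes:", int_bytes)
print("RangeIndex bytes: ", range_bytes)
```
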

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 2))
df.index

results in RangeIndex(start=0, stop=1000, step=1)

Then

df.to_hdf('tmp.h5', 'df')
df = pd.read_hdf('tmp.h5', 'df')
df.index

results in Int64Index([ 0, 1, ..., 999], dtype='int64', length=1000)
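A possible workaround on the reading side, as a minimal sketch: after loading, check whether the integer index is an evenly spaced range and, if so, swap it back to a RangeIndex. The helper name `restore_range_index` is hypothetical and not part of pandas. This only affects the in-memory frame; it does not change what is written to the file.

```python
import numpy as np
import pandas as pd


def restore_range_index(df):
    """Return df with a RangeIndex if its integer index is an evenly spaced range.

    Hypothetical helper for illustration; not a pandas API.
    """
    idx = df.index
    # Nothing to do for an existing RangeIndex or an empty frame.
    if isinstance(idx, pd.RangeIndex) or len(idx) == 0:
        return df
    # Only plain integer indexes are candidates.
    if not pd.api.types.is_integer_dtype(idx.dtype):
        return df

    values = np.asarray(idx)
    start = int(values[0])
    step = int(values[1] - values[0]) if len(values) > 1 else 1
    if step == 0:
        return df

    candidate = pd.RangeIndex(start, start + step * len(values), step, name=idx.name)
    # Swap only when the range reproduces the stored labels exactly.
    if np.array_equal(values, np.asarray(candidate)):
        out = df.copy()
        out.index = candidate
        return out
    return df
```

With the example above, this would be used as df = restore_range_index(pd.read_hdf('tmp.h5', 'df')).
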

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.1
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Comment From: max-sixty

Is there a more efficient way of representing a range in HDF5?

Comment From: jreback

Duplicate of https://github.com/pandas-dev/pandas/issues/8319

It's not worth trying to finesse this; better to just have an option to turn it off.

Comment From: jreback

PRs to fix this are welcome!