Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, high=2**16 -1, size=(20,5), dtype=np.uint16),
                  columns=['a', 'b', 'c', 'd', 'e'])
df = df.set_index(['a', 'b'])
store = pd.HDFStore('test.h5')
store.append('test', df, append=True)
>>>NotImplementedError: indexing 64-bit unsigned integer columns is not supported yet, sorry

Problem description

When using a MultiIndex, pandas coerces the uint16 columns to uint64. When I then try to write the frame to an H5 file (in table format), pytables raises the NotImplementedError.

Expected Output

I expect the H5 file to be created, with pandas coercing types as required (in this case uint16 to int64 is safe). With a UInt64Index (without a MultiIndex), pandas does coerce correctly:

df = pd.DataFrame(np.random.randint(0, high=2**16 -1, size=(20,5), dtype=np.uint16),
                  columns=['a', 'b', 'c', 'd', 'e'])
df = df.set_index(['a'])
df.index.dtype
>>>dtype('uint64')
store = pd.HDFStore('test.h5')
store.append('test2', df, append=True)
store.test2.index.dtype
>>>dtype('int64')

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.3
pytest: 3.0.5
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: 0.9.6
IPython: 5.1.0
sphinx: 1.4.8
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.2
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None

Comment From: gfyoung

@kylekeppler: Thanks for reporting this! I think you should file an issue with pytables, as we've been spending a lot of time trying to bulk up support for uint64. However, it does seem like this is an issue across many libraries, which seem to stop at int64.

As a workaround, I think you should cast to float, though that is going to destroy precision.

Comment From: kylekeppler

@gfyoung: I agree it would be nice to have this fixed in pytables, but this NotImplementedError has been thrown since at least version 2.3 from 2011, so it doesn't look like they are in any hurry to fix that.

In my case at least it made sense to cast to int64 manually, as is done without a MultiIndex. I'd say pandas should do that for the levels of a MultiIndex as well.
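A minimal sketch of the manual cast described above, assuming the index levels are named and that every value fits in int64. The helper `index_levels_to_int64` is a hypothetical name for illustration, not a pandas API:

```python
import numpy as np
import pandas as pd

INT64_MAX = np.iinfo(np.int64).max


def index_levels_to_int64(df):
    """Return a copy of df whose unsigned-integer index levels are int64.

    Hypothetical helper: assumes all index levels have names.
    """
    names = list(df.index.names)
    flat = df.reset_index()
    for name in names:
        if flat[name].dtype.kind == "u":  # any unsigned integer dtype
            # Refuse to cast values that would wrap around in int64.
            if len(flat) and int(flat[name].max()) > INT64_MAX:
                raise OverflowError("level %r does not fit in int64" % name)
            flat[name] = flat[name].astype(np.int64)
    return flat.set_index(names)


df = pd.DataFrame(np.random.randint(0, high=2**16 - 1, size=(20, 5), dtype=np.uint16),
                  columns=['a', 'b', 'c', 'd', 'e'])
df = df.set_index(['a', 'b'])
fixed = index_levels_to_int64(df)
```

After this, `fixed` could be passed to `store.append` in place of `df`; the data columns `c`, `d` and `e` keep their uint16 dtype.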

Comment From: gfyoung

> this NotImplementedError has been thrown since at least version 2.3 from 2011, so it doesn't look like they are in any hurry to fix that.

I'm not sure I follow you here. The issue may have existed for some time because no one has asked about it. We have had the same issue as well in pandas and only began patching it when people started asking about it. I would suggest that you file an issue in pytables and see how they respond.

On our end, I would be hesitant to cast to int64 (or perform any downcasting acrobatics) just for the sake of accommodating another library. We don't like to destroy dtype if possible. That being said, if we were to do that, I guess we could always cast to float64 only in cases when there are elements with values greater than 2**63 - 1 and cast to an int* dtype otherwise.
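To make that rule concrete, here is a rough sketch of the value-dependent cast. It is not something pandas does; `downcast_uint64` is a hypothetical name, and the sketch only handles the int64/float64 choice:

```python
import numpy as np

INT64_MAX = np.iinfo(np.int64).max  # 2**63 - 1


def downcast_uint64(values):
    """Cast uint64 data to int64 when it fits, otherwise to float64.

    Hypothetical helper illustrating the rule above.
    """
    values = np.asarray(values, dtype=np.uint64)
    if values.size and int(values.max()) > INT64_MAX:
        # Lossy: float64 cannot represent every integer above 2**53.
        return values.astype(np.float64)
    return values.astype(np.int64)


small = downcast_uint64([1, 2, 3])
large = downcast_uint64([2**63 + 1])
```

Here `small` comes back as int64 with its values intact, while `large` comes back as float64 and `2**63 + 1` is rounded to `2**63`, which is the precision loss mentioned earlier.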

However, I'm not sure. @jreback ? @jorisvandenbossche ?

Comment From: kylekeppler

@gfyoung, I agree with your comments. I am only proposing that the UInt64Index case and the case of a MultiIndex with a uint64 level behave the same.

Comment From: jreback

So this works fine for storing your data. The issue is that you are trying to actually index on the columns (which is what happens when you store them as a MultiIndex: these columns get indexed).

This works:

In [7]: df.reset_index().to_hdf('test.h5', 'df', mode='w', format='table')

In [8]: pd.read_hdf('test.h5', 'df').dtypes
Out[8]: 
a    uint64
b    uint64
c    uint16
d    uint16
e    uint16
dtype: object

But if you try to index the columns, it fails:

In [9]: df.reset_index().to_hdf('test2.h5', 'df', mode='w', format='table', data_columns=True)
NotImplementedError: indexing 64-bit unsigned integer columns is not supported yet, sorry

So not really sure what pandas can do about this. I would simply not try to index using uint64 columns until the support is there.
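One way to follow that advice, as a sketch only: pick the `data_columns` explicitly so that uint64 columns are stored but not indexed. `indexable_columns` is a hypothetical helper, and the `to_hdf` call is left commented out because it needs PyTables and its behaviour may depend on the installed version:

```python
import numpy as np
import pandas as pd


def indexable_columns(df):
    """Names of the columns whose dtype is not uint64 (hypothetical helper)."""
    return [col for col in df.columns if df[col].dtype != np.uint64]


flat = pd.DataFrame({
    'a': np.array([1, 2, 3], dtype=np.uint64),
    'b': np.array([4, 5, 6], dtype=np.uint64),
    'c': np.array([7, 8, 9], dtype=np.uint16),
})
cols = indexable_columns(flat)

# Assumption: with PyTables installed, only the columns in `cols` get indexed.
# flat.to_hdf('test2.h5', 'df', mode='w', format='table', data_columns=cols)
```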

Closing as won't fix. You should open an issue on the pytables tracker.