Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame([[1,2,3,4]], columns=[5,6,7,8])
df.to_json('test.json', orient='table')
pd.read_json('test.json', orient='table')

KeyError: '[5 6 7 8] not in index'

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: 16d02621212cd7d83f89533265fc039df1d7be10
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.0.dev0+79.g16d026212
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.0
IPython: 6.2.1
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.1.1
openpyxl: 2.5.0b1
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.1.13
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.10
s3fs: 0.1.2
fastparquet: 0.1.3
pandas_gbq: None
pandas_datareader: None

Comment From: TomAugspurger

I thought that https://frictionlessdata.io/specs/table-schema/ required the field names to be strings, but that may not be the case.

Comment From: robmarkcole

There is an error with string column/index names too:

import pandas as pd

df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                  index=['row 1', 'row 2'],
                  columns=['col 1', 'col 2'])

df.to_json(orient='table')
pd.read_json(_, orient='table')
...
ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.

This issue should be renamed to "read_json with orient='table' fails".

Comment From: WillAyd

@robmarkcole what version are you using? Your example ran fine for me on master.

INSTALLED VERSIONS
------------------
commit: 28e7a9498457e8cccc105a6261958197889325fa
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.0.dev0+487.g28e7a9498
pytest: 3.4.1
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.27.3
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.0
IPython: 6.2.1
sphinx: 1.7.0
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.1.2
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.3
fastparquet: 0.1.4
pandas_gbq: None
pandas_datareader: None

Comment From: robmarkcole

@WillAyd I get the error on 0.22.0, but not on 0.23.0.dev0.

Comment From: WillAyd

General support for read_json with orient='table' was only just added for the v0.23 release, so it makes sense that it doesn't work on 0.22. See #19039.

Comment From: MichaMucha

Hi! Captain Obvious here, just wanted to say I'm getting this in 0.23.0 as well:

import pandas as pd
breaking_case = pd.DataFrame({
    1: [1, 2],
    2: [3, 4],
})
pd.read_json(breaking_case.to_json(orient='table'), orient='table')

KeyError: '[1 2] not in index'

Comment From: WillAyd

@MichaMucha that's right, this was never implemented, though I'm not sure it's valid JSON either. If you're interested, an investigation into the schema linked above to confirm or deny that would be welcome!

Comment From: MichaMucha

Thanks for the quick reply!

I read through the page and it seems that:
- a descriptor specifying columns is the equivalent of an "ordered dict"
- the order of column descriptions in that dict implies the column order
- each column is described by a few properties, one of which is name
- technically speaking, { "name" : 1 } is valid JSON

I would reason that you can have numerical column names.

Direct quotes from the spec (a "field" is a column in our case):

"A Table Schema is represented by a descriptor. The descriptor MUST be a JSON object (JSON is defined in RFC 4627). It MUST contain a property fields. fields MUST be an array where each entry in the array is a field descriptor (as defined below). The order of elements in fields array MUST be the order of fields in the CSV file. A field descriptor MUST be a JSON object that describes a single field."
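To make the quoted requirements concrete, here is a minimal Python sketch that checks only the rules quoted above: the descriptor is an object, `fields` is an array, and each entry is an object. The function name `check_descriptor` is hypothetical, and this is not a full Table Schema validator; in particular it does not check anything the rest of the spec may say about the type of `name`.

```python
import json


def check_descriptor(text: str) -> list:
    """Hypothetical checker for the quoted Table Schema rules only.

    Returns the field names in order, since the order of `fields`
    MUST match the column order.
    """
    descriptor = json.loads(text)
    # "The descriptor MUST be a JSON object"
    if not isinstance(descriptor, dict):
        raise ValueError("descriptor must be a JSON object")
    # "It MUST contain a property fields ... fields MUST be an array"
    fields = descriptor.get("fields")
    if not isinstance(fields, list):
        raise ValueError("descriptor must contain an array property 'fields'")
    # "A field descriptor MUST be a JSON object"
    for field in fields:
        if not isinstance(field, dict):
            raise ValueError("each field descriptor must be a JSON object")
    return [field.get("name") for field in fields]


# Numeric names pass these particular checks.
print(check_descriptor('{"fields": [{"name": 1}, {"name": 2}]}'))  # [1, 2]
```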

Let me know if there is anything more I can do to help.

Comment From: WillAyd

Thanks for the review! The table implementation is located in pandas/pandas/io/json/table_schema.py so if you poke around there you should see where this could be implemented. Assuming this is your first time, also be sure to read through the contributing guide:

https://pandas.pydata.org/pandas-docs/stable/contributing.html

Hope that helps, but let me know if you have any questions.

Comment From: MichaMucha

Thank you, and sorry it took me a while to find time. Thanks for the links! I looked at the table implementation, and it turns out table_schema.py is not to blame: the assignment happens at line 96 and leaves the type intact.

I dug a little deeper, and it seems that JSONTableWriter._write (pandas/pandas/io/json/json.py:214) writes the data in a row-oriented fashion. Every row becomes a dict, which makes every column name a key, and JSON requires keys to be strings. So when you read the data back, you get strings.
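The key-to-string conversion is easy to reproduce with Python's standard `json` module, independent of pandas. pandas uses its own JSON encoder, so this only illustrates the JSON constraint itself, not the exact pandas code path.

```python
import json

# A row stored as a dict keyed by integer column names, as a row-oriented writer would build it
row = {5: 1, 6: 2, 7: 3, 8: 4}

text = json.dumps(row)
print(text)  # {"5": 1, "6": 2, "7": 3, "8": 4}

# Reading it back yields string keys: the original int type is lost
restored = json.loads(text)
print(list(restored))  # ['5', '6', '7', '8']
print(5 in restored)   # False
```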

One workaround I can think of is to match each field['name'] to the stringified column names found in the data here, and cast them back to the type of field['name']. This would cause ambiguity if you have a DataFrame with both a column "1" and a column 1, for example.
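The matching workaround can be sketched on the parsed JSON alone. This is a minimal illustration, not pandas code: `restore_column_names` is a hypothetical helper, the payload is a simplified stand-in for a table-oriented document (a `schema.fields` list plus row dicts under `data`), and it assumes integer or string names, for which `str(name)` matches how `json.dumps` writes the key. It raises on exactly the "1" versus 1 ambiguity described above.

```python
import json


def restore_column_names(payload: str) -> list:
    """Hypothetical helper: map stringified record keys back to the
    original field names stored in schema.fields."""
    table = json.loads(payload)
    lookup = {}
    for field in table["schema"]["fields"]:
        name = field["name"]
        key = str(name)  # how json.dumps renders an int key, e.g. 1 -> "1"
        if key in lookup:
            # Both 1 and "1" stringify to "1", so they cannot be told apart
            raise ValueError(f"ambiguous names {lookup[key]!r} and {name!r}")
        lookup[key] = name
    # Rebuild every record with the original (possibly non-string) names
    return [{lookup[k]: v for k, v in record.items()} for record in table["data"]]


payload = json.dumps({
    "schema": {"fields": [{"name": 1}, {"name": 2}]},
    "data": [{1: 1, 2: 3}, {1: 2, 2: 4}],  # int keys are written as "1", "2"
})
print(restore_column_names(payload))  # [{1: 1, 2: 3}, {1: 2, 2: 4}]
```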

Another solution I can imagine is an error asking the user to provide string column names.

Let me know what you think. Thanks again for the contributing guide!

Comment From: WillAyd

@MichaMucha IIUC that supports the argument that numeric column names should not be allowed, given the column names are the keys in the JSON table schema and JSON keys need to be strings.

If that's the case and you are looking to contribute, I'd suggest perhaps raising a more descriptive ValueError when trying to write to the JSON table schema with non-string column names.
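A descriptive error of that kind could look roughly like the sketch below. The name `validate_table_columns` and the message wording are hypothetical, and pandas' actual check, if one is added, may differ. The function accepts any iterable of labels, so a DataFrame's `df.columns` can be passed directly.

```python
def validate_table_columns(columns) -> None:
    """Hypothetical pre-write check for orient='table'.

    `columns` can be any iterable of labels, e.g. a DataFrame's `df.columns`.
    """
    non_string = [label for label in columns if not isinstance(label, str)]
    if non_string:
        # Name the offending labels and suggest a fix, instead of failing later on read
        raise ValueError(
            "orient='table' requires string column names, but got "
            f"{non_string!r}; consider df.columns = df.columns.astype(str)"
        )


validate_table_columns(["col 1", "col 2"])  # passes silently
try:
    validate_table_columns([5, 6, 7, 8])
except ValueError as err:
    print(err)
```

With the frames from this thread, the `columns=[5,6,7,8]` example would be rejected at write time, while the one with `'col 1'` and `'col 2'` would pass.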

Comment From: albertvillanova

@WillAyd I think the problem is with the JSON serialization of the DataFrame (the 'data' for orient='table') and not with 'schema' (TableSchema spec).

IIUC, the aim of orient='table' is to enable round-trip JSON serialization and deserialization of pandas objects. Since a pandas DataFrame can have non-string column names (indeed, that is the default if column names are not passed explicitly at instantiation), the column names SHOULD NOT be used as keys in the JSON object (the JSON spec requires keys to be strings).

The same applies to index names: they can be non-strings, so they should not be used as keys in the JSON object either.
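One way to read this suggestion is to store each row as a JSON array ordered like `schema.fields`, so names appear only as values inside the schema, where JSON allows numbers as well as strings. This is a hypothetical layout sketched with the standard `json` module (`dump_table` and `load_table` are illustrative names), not what pandas writes. Whether numeric `name` values conform to the Table Schema spec is the open question raised earlier in this thread.

```python
import json


def dump_table(columns, rows) -> str:
    """Hypothetical layout: names live only in schema.fields, rows are arrays."""
    return json.dumps({
        "schema": {"fields": [{"name": name} for name in columns]},
        # Values are positional, so no column name is ever used as a JSON key
        "data": [list(row) for row in rows],
    })


def load_table(payload: str):
    """Inverse of dump_table: recover the names and rows in schema order."""
    table = json.loads(payload)
    columns = [field["name"] for field in table["schema"]["fields"]]
    rows = [tuple(row) for row in table["data"]]
    return columns, rows


columns, rows = load_table(dump_table([5, 6, 7, 8], [(1, 2, 3, 4)]))
print(columns, rows)  # [5, 6, 7, 8] [(1, 2, 3, 4)]
```

Integer and string names survive this round trip unchanged, and 1 and "1" stay distinct. Names that JSON cannot represent natively (tuples, timestamps) would still need extra type information in the schema.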

Comment From: WillAyd

#19129 should already cover that; it just needs a community PR.

Comment From: albertvillanova

Yes, but I was suggesting not just "raising a more descriptive ValueError" (sic), but changing the implementation of the JSON serialization for orient='table'.

Pandas Raise ValueError for read_json and orient='table' With Numeric Column Names
