Loading a JSON file with large integers (greater than 2^63, i.e. beyond int64) results in "Value is too big". I have tried changing the orient to "records" and also passing dtype={'id': numpy.dtype('uint64')}. The error is the same.
import pandas
data = pandas.read_json('''{"id": 10254939386542155531}''')
print(data.describe())
Expected Output
id
count 1
unique 1
top 10254939386542155531
freq 1
Actual Output (even with dtype passed in)
File "./parse_dispatch_table.py", line 34, in <module>
print(pandas.read_json('''{"id": 10254939386542155531}''', dtype=dtype_conversions).describe())
File "/users/XXX/.local/lib/python3.4/site-packages/pandas/io/json.py", line 234, in read_json
date_unit).parse()
File "/users/XXX/.local/lib/python3.4/site-packages/pandas/io/json.py", line 302, in parse
self._parse_no_numpy()
File "/users/XXX/.local/lib/python3.4/site-packages/pandas/io/json.py", line 519, in _parse_no_numpy
loads(json, precise_float=self.precise_float), dtype=None)
ValueError: Value is too big
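A workaround sketch (my own suggestion, not a pandas API): let the stdlib json module do the decoding, since Python ints have arbitrary precision, and only hand the already-parsed data to pandas.

```python
import json

import pandas

# Workaround sketch (assumption: bypass pandas' C JSON parser entirely).
# The stdlib json module decodes integers with arbitrary precision, so the
# oversized id survives exactly; pandas then stores it without re-parsing.
record = json.loads('{"id": 10254939386542155531}')
data = pandas.DataFrame([record])
print(int(data['id'].iloc[0]))  # value preserved exactly
```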
No problem using read_csv:
import pandas
import io
print(pandas.read_csv(io.StringIO('''id\n10254939386542155531''')).describe())
Output using read_csv
id
count 1
unique 1
top 10254939386542155531
freq 1
Output of pd.show_versions()
Comment From: jreback
That's not JSON pandas can parse: your numbers should be quoted if they are in fact identifiers rather than numbers. That value is out of int64 range and should raise.
In [34]: pd.read_json('''{"id": "10254939386542155531"}''', dtype=object, orient='record', typ='series')
Out[34]:
id 10254939386542155531
dtype: object
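Building on that answer, a hedged sketch of getting real integers back from the quoted ids (the io.StringIO wrapper and the apply(int) step are my additions, not part of the original reply):

```python
import io

import pandas as pd

# Sketch (my addition): read the quoted id as an object-dtype string, then
# convert back to a Python int, which has no 64-bit limit.
s = pd.read_json(io.StringIO('{"id": "10254939386542155531"}'),
                 typ='series', dtype=object)
ids = s.apply(int)  # object dtype holding exact Python ints
print(int(ids['id']))
```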
Comment From: jxramos
I'm bumping into this same issue, where a 64-bit integer is being used as an id. Is there any workaround for overriding the inference? It would have been nice if the dtype specification drove an override, but type coercion must occur after the default inferred-type loading.
This comes up when collecting a system log archive on macOS High Sierra from a bash shell, rendered to text with JSON styling:
log collect
log show --style json > ~/syslogarchive.json
python
>>> import pandas
>>> dfSysLog = pandas.read_json('~/syslogarchive.json')
...
ValueError: Value is too big
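For what it's worth, the stdlib-first workaround also applies to record-style data like this. A minimal sketch (the traceID/eventMessage column names and two-record sample are invented stand-ins, not the real `log show --style json` schema):

```python
import json

import pandas as pd

# Invented stand-in for ~/syslogarchive.json; the point is only the
# oversized integer id in one of the records.
sample = ('[{"traceID": 10254939386542155531, "eventMessage": "boot"},'
          ' {"traceID": 1, "eventMessage": "shutdown"}]')

# Decode with the stdlib json module (arbitrary-precision ints), then let
# pandas flatten the already-decoded records.
records = json.loads(sample)
dfSysLog = pd.json_normalize(records)  # pandas >= 1.0
print(dfSysLog['traceID'].tolist())   # ids preserved exactly
```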