Code Sample
import pandas as pd
import numpy as np
import StringIO
tmp = StringIO.StringIO(buf="id\n" + "9"*19 + "\n")
pd.read_table(tmp, dtype={'id': np.uint64})
Problem description
The integer written as 19 nines (9999999999999999999) is about 2^{63.12}, so it exceeds the int64 maximum but still fits in a np.uint64; e.g., np.array([9999999999999999999]) has dtype uint64.
Without the dtype option, pd.read_table simply coerces this column to object dtype, which is reasonable.
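The range claim can be checked directly. This is a minimal sketch that assumes only NumPy is installed; the variable names are illustrative and not part of the original report.

```python
import math

import numpy as np

value = int("9" * 19)  # 9999999999999999999

# Too large for a signed 64-bit integer, but within the unsigned 64-bit range.
fits_int64 = value <= np.iinfo(np.int64).max
fits_uint64 = value <= np.iinfo(np.uint64).max

# NumPy infers uint64 for a Python int that only fits in the unsigned range.
inferred_dtype = np.array([value]).dtype

# Base-2 logarithm of the value, rounded to two decimals.
log2_value = round(math.log2(value), 2)

print(fits_int64, fits_uint64, inferred_dtype, log2_value)
```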
Expected Output
A DataFrame whose id column has dtype np.uint64, or perhaps a warning that read_table doesn't support uint64.
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 16.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 32.1.0
Cython: None
numpy: 1.12.0
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.2.2
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.5
boto: None
pandas_datareader: None
Comment From: jreback
This is already fixed in master (but not released yet).
In [4]: import pandas as pd
...: import numpy as np
...: tmp = StringIO("id\n" + "9"*19 + "\n")
...: df = pd.read_table(tmp, dtype={'id': np.uint64})
...:
In [5]: df
Out[5]:
                    id
0  9999999999999999999
In [6]: df.dtypes
Out[6]:
id    uint64
dtype: object
@gfyoung assume we have enough tests for this?
Comment From: gfyoung
@jreback : Yes, there are.
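For anyone on Python 3 who wants to confirm the fixed behavior locally, a check along these lines may help. It is only a sketch and not the test from the pandas suite: it assumes a pandas release that includes the fix, and it uses io.StringIO in place of the Python 2 StringIO module from the original report.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Same input as the report: a header line followed by 19 nines.
data = StringIO("id\n" + "9" * 19 + "\n")
df = pd.read_table(data, dtype={"id": np.uint64})

# The column should keep the requested dtype and the exact value.
assert df["id"].dtype == np.uint64
assert int(df["id"].iloc[0]) == int("9" * 19)
```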