Code Sample
import pandas as pd
df = pd.DataFrame({'test': ['34343_43434']})
json = df.to_json(orient='records')
result = pd.read_json(json, orient='records')
print(result)
output: test 0 3434343434
Problem description
Pandas appears to be converting the initial string ("34343_43434") to an integer (3434343434) and removes the underscore to do it.
This only occurs when all characters in the string (besides the underscore) are integers. For example, if the initial value were "34343_43434X" then the output would correctly "34343_43434X". This issue does not occur when dtypes=False.
Expected Output
test
0 34343_43434
Output of pd.show_versions()
[paste the output of ``pd.show_versions()`` here below this line]
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.3
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
Comment From: bobhaffner
I was able to replicate this on python 3.6.2 but not on 3.5.3. Not sure why though
Comment From: bobhaffner
MIght be related to PEP 515
I'm able to do things like this in 3.6, but not in 3.5
In [5]: num = 34343_43434
In [6]: type(num)
Out[6]: int
In [7]: num
Out[7]: 3434343434
Comment From: jreback
this is an unfortunate side effect of the pep, but this looks valid to me.