Code Sample, a copy-pastable example if possible
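import pandas as pd
import numpy as np
# Years stored as floats, with one missing value
df = pd.DataFrame({'year': [1990.0, 1975.0, np.nan, 1990.0]})
# Display the column after casting it to object dtype
print(df['year'].astype(object))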
0    1990
1    1975
2     NaN
3    1990
Name: year, dtype: object
# Store the object-dtype column back and write the frame to CSV
df['year'] = df['year'].astype(object)
df.to_csv('test.csv')
The resulting CSV file then looks like this:
,year
0,1990.0
1,1975.0
2,
3,1990.0
Problem description
I convert a DataFrame column of years (stored as floats such as 1990.0, with some NaN values) to object dtype. The printed representation of the column then drops the decimal point and shows the years as integers, with NaN where a value is missing. However, the CSV output still contains the decimal point followed by 0.
Expected Output
A CSV file similar to the first output shown above:
,year
0,1990
1,1975
2,
3,1990
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.19.0
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.42.0
pandas_datareader: None
Comment From: jreback
This is just how object dtype displays; it is not very intelligent about anything other than strings. Holding floats in an object dtype is non-idiomatic and non-performant.
The column in fact still holds float objects, so very few options apply when calling to_csv:
In [20]: df.iloc[0,0]
Out[20]: 1990.0
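For readers who land here looking for a workaround, the following is a minimal sketch and not something proposed in the thread. It assumes a pandas version with the nullable integer dtype 'Int64' (added in pandas 0.24, so not available in the 0.19.0 reported above) and assumes every non-missing year is a whole number. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [1990.0, 1975.0, np.nan, 1990.0]})

# Casting to object changes only the container: each element is still a
# Python float, which is why to_csv writes "1990.0".
as_object = df['year'].astype(object)
first_value_type = type(as_object.iloc[0])

# Assumption: pandas >= 0.24. The nullable 'Int64' dtype stores real
# integers and keeps the missing value as <NA> instead of forcing floats.
as_nullable_int = df.assign(year=df['year'].astype('Int64'))

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = as_nullable_int.to_csv()
print(csv_text)
```

The missing value is written as an empty field because to_csv uses an empty string for missing data by default (the na_rep parameter).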