If a float dtype
is explicitly specified for a column, none of the values in the column should be converted to integers. Currently, the user must specify convert_float=False
to prevent potential conversion to int. This may lead to unexpected behaviour if the user is unaware of the default True
for convert_float
.
Comment From: chris-b1
Can you please provide a reproducible example? I just tried the below and seems to work?
In [11]: df = pd.DataFrame({'a': [1, 2, 3]})
In [12]: df.to_excel('tmp.xlsx', index=False)
In [13]: pd.read_excel('tmp.xlsx').dtypes
Out[13]:
a int64
dtype: object
In [14]: pd.read_excel('tmp.xlsx', dtype={'a': float}).dtypes
Out[14]:
a float64
dtype: object
Comment From: pzakielarz
After playing around trying to construct a MWE, I agree, it usually works. The following code recreates the issue, but I suspect something is causing the dtype
kwarg to be ignored.
In [2]: df = pd.DataFrame([['bookmark','A'],
...: ['data',1],
...: ['data',2.3]], columns=['record_type','record'])
...:
In [3]: df
Out[3]:
record_type record
0 bookmark A
1 data 1
2 data 2.3
In [4]: df.to_excel('tmp.xlsx', index=False)
In [5]: df = pd.read_excel('tmp.xlsx', names=['record_type','record'],
...: dtype={'record':float}, skiprows=1, header=None)
...:
In [6]: df
Out[6]:
record_type record
0 bookmark A
1 data 1
2 data 2.3
In [7]: df.record.apply(type)
Out[7]:
0 <class 'str'>
1 <class 'int'>
2 <class 'float'>
Name: record, dtype: object
In [8]: df = pd.read_excel('tmp.xlsx', names=['record_type','record'],
...: dtype={'record':float}, skiprows=1, header=None, convert_float=False)
...:
...:
In [9]: df.record.apply(type)
Out[9]:
0 <class 'str'>
1 <class 'float'>
2 <class 'float'>
Name: record, dtype: object
If you don't specify names
, skiprows
, and header
when reading, you get a ValueError when it attempts to convert "A" to a float. However, the above example runs without raising an exception.
Comment From: jreback
this is as expected. dtype
doesn't force a type on something that cannot be converted. you should generally use pd.to_numeric
if you want to control this (after reading).
Comment From: pzakielarz
@jreback Thanks for the info on to_numeric
. Can you clarify if you're saying dtype
inconsistently raising ValueError
when the column cannot be converted is expected?
This causes a ValueError
pd.read_excel('tmp.xlsx', names=['record_type', 'records'], dtype={1:float},
skiprows=1, header=None)
but this does not
pd.read_excel('tmp.xlsx', names=['record_type', 'records'], dtype={'records':float},
skiprows=1, header=None)
Obviously this isn't the original issue, but is what lead to my confusion. I can file a new bug report if you agree that the inconsistency is not intended.