Pandas read_excel dtype should supersede convert_float

If a float dtype is explicitly specified for a column, none of the values in the column should be converted to integers. Currently, the user must specify convert_float=False to prevent potential conversion to int. This may lead to unexpected behaviour if the user is unaware of the default True for convert_float.

Comment From: chris-b1

Can you please provide a reproducible example? I just tried the below and seems to work?

In [11]: df = pd.DataFrame({'a': [1, 2, 3]})

In [12]: df.to_excel('tmp.xlsx', index=False)

In [13]: pd.read_excel('tmp.xlsx').dtypes
Out[13]: 
a    int64
dtype: object

In [14]: pd.read_excel('tmp.xlsx', dtype={'a': float}).dtypes
Out[14]: 
a    float64
dtype: object

Comment From: pzakielarz

After playing around trying to construct a MWE, I agree, it usually works. The following code recreates the issue, but I suspect something is causing the dtype kwarg to be ignored.

In [2]: df = pd.DataFrame([['bookmark','A'],
   ...:                    ['data',1],
   ...:                    ['data',2.3]], columns=['record_type','record'])
   ...:                    

In [3]: df
Out[3]: 
  record_type record
0    bookmark      A
1        data      1
2        data    2.3

In [4]: df.to_excel('tmp.xlsx', index=False)

In [5]: df = pd.read_excel('tmp.xlsx', names=['record_type','record'],
   ...:                    dtype={'record':float}, skiprows=1, header=None)
   ...:                    

In [6]: df
Out[6]: 
  record_type record
0    bookmark      A
1        data      1
2        data    2.3

In [7]: df.record.apply(type)
Out[7]: 
0      <class 'str'>
1      <class 'int'>
2    <class 'float'>
Name: record, dtype: object

In [8]: df = pd.read_excel('tmp.xlsx', names=['record_type','record'],
   ...:                    dtype={'record':float}, skiprows=1, header=None, convert_float=False)
   ...:                    
   ...:                    

In [9]: df.record.apply(type)
Out[9]: 
0      <class 'str'>
1    <class 'float'>
2    <class 'float'>
Name: record, dtype: object

If you don't specify names, skiprows, and header when reading, you get a ValueError when it attempts to convert "A" to a float. However, the above example runs without raising an exception.

Comment From: jreback

this is as expected. dtype doesn't force a type on something that cannot be converted. you should generally use pd.to_numeric if you want to control this (after reading).

Comment From: pzakielarz

@jreback Thanks for the info on to_numeric. Can you clarify if you're saying dtype inconsistently raising ValueError when the column cannot be converted is expected?

This causes a ValueError

pd.read_excel('tmp.xlsx', names=['record_type', 'records'], dtype={1:float},
                       skiprows=1, header=None)

but this does not

pd.read_excel('tmp.xlsx', names=['record_type', 'records'], dtype={'records':float},
                       skiprows=1, header=None)

Obviously this isn't the original issue, but is what lead to my confusion. I can file a new bug report if you agree that the inconsistency is not intended.