- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
In [34]: pd.Series(['1', '2', '3']).median()
Out[34]: 2.0
In [36]: pd.DataFrame({"A": ['1', '2', '3']}).median()
Out[36]:
A 2.0
dtype: float64
Problem description
median should not convert the input types. We seem to explicitly convert all non-float dtypes to float in nanmedian. Do we want to do that?
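A minimal standard-library sketch of the behaviour in question, assuming the cast in nanmedian acts roughly like calling float() on each element before taking the median. median_with_float_cast is a hypothetical name used only for illustration, not a pandas function.

```python
import statistics


def median_with_float_cast(values):
    # Hypothetical helper: cast every element with float() first, then take
    # the median. Numeric strings such as '1' are silently accepted.
    return statistics.median(float(v) for v in values)


print(median_with_float_cast(["1", "2", "3"]))  # 2.0, matching Out[34]
```

Because float("1") succeeds, string input produces a numeric median instead of an error, which is the surprising result shown in the code sample.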
Expected Output
TypeError or ValueError; not sure which.
Comment From: jorisvandenbossche
I think we clearly don't want to do this in the case above. But I suppose this is what enables numeric values stored in object dtype in general? There might be use cases for that (e.g. decimals) which would break if we removed this conversion to float.
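The trade-off can be seen with plain Python: a float() cast is what makes Decimal values usable as numbers, but the same cast accepts numeric strings just as happily. This is a standard-library illustration, not pandas code.

```python
from decimal import Decimal

decimals = [Decimal("1.5"), Decimal("2.5"), Decimal("4.0")]
strings = ["1.5", "2.5", "4.0"]

# A blanket float() cast accepts both inputs...
decimal_floats = [float(v) for v in decimals]
string_floats = [float(v) for v in strings]

# ...and produces identical results, so the cast alone cannot tell the
# intended case (Decimal) from the accidental one (str).
assert decimal_floats == string_floats == [1.5, 2.5, 4.0]
```

Removing the cast outright would break the first case; keeping it unconditionally allows the second.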
Comment From: TomAugspurger
> But I suppose this way it enables numeric values in object dtype in general?
Agreed. At the very least, we'll want to ensure we have a test that something like pd.Series([1, 2], dtype=object).median() continues to work.
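A pytest-style sketch of such a regression test. The test name is hypothetical; the expected value 1.5 is simply the median of 1 and 2.

```python
import pandas as pd


def test_object_dtype_numeric_median():
    # Integers stored with object dtype should still produce a numeric median,
    # even if string input starts raising instead of being cast.
    result = pd.Series([1, 2], dtype=object).median()
    assert result == 1.5
```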
Comment From: jreback
Isn't this the same treatment as for min/max?
median is an ordering lookup that doesn't depend on the type (except for ties).
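The point can be illustrated with Python's statistics module as a stand-in (not pandas's implementation), reading "ties" as the even-length case where the two middle values must be combined; that reading is an assumption. For an odd count the median is pure ordering, averaging is where arithmetic (and so the type) matters, and string ordering is lexicographic rather than numeric.

```python
import statistics

odd = ["1", "2", "3"]
even = ["1", "2", "3", "4"]

# Odd length: the median is the middle element after sorting; no arithmetic.
odd_median = statistics.median(odd)  # '2'

# Even length: median_low picks the lower middle element, still no arithmetic.
even_low = statistics.median_low(even)  # '2'

# Even length with averaging: '2' + '3' is '23', and '23' / 2 raises TypeError.
try:
    statistics.median(even)
    even_average_failed = False
except TypeError:
    even_average_failed = True

# Strings sort lexicographically, so the "middle" string can differ from the
# numeric middle value.
lexicographic = statistics.median(["9", "10", "11"])  # '11'
numeric = statistics.median([9, 10, 11])  # 10
```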
Comment From: TomAugspurger
min/max doesn't convert to float first:
In [14]: pd.Series(['1', '2']).min()
Out[14]: '1'
mean does convert to float:
In [15]: pd.Series(['1', '2']).mean()
Out[15]: 6.0
(how does it get 6.0 there though? 😄)
Comment From: jorisvandenbossche
> (how does it get 6.0 there though? 😄)
Ah boy... so it's not only median.
BTW, I just checked: the reason for 6 is that "1" + "2" is "12", which is then converted to the number 12 and divided by the count (2). :)
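The three steps described above can be reconstructed in plain Python. This is a re-enactment of the arithmetic, not pandas's code; functools.reduce with operator.add stands in for however the object values are actually summed.

```python
import functools
import operator

values = ["1", "2"]

# Step 1: "summing" object values uses Python's +, which concatenates strings.
total = functools.reduce(operator.add, values)  # "12"

# Step 2: the concatenated result is converted to a number.
numeric_total = float(total)  # 12.0

# Step 3: divide by the count of values.
mean = numeric_total / len(values)  # 6.0
```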
Comment From: jreback
I believe we handle object dtype like this so that these ops work when we happen to have numeric values in object dtype.
We should infer a more specific numeric dtype from object dtype before performing these ops.
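One way to read this suggestion, sketched in plain Python: infer a kind from the object values first, and only cast and compute when that kind is numeric. infer_kind and object_median are hypothetical helpers with a deliberately simplified set of kinds, loosely modelled on the labels returned by pandas.api.types.infer_dtype; they are not pandas internals.

```python
import statistics
from decimal import Decimal

NUMERIC_KINDS = {"integer", "floating", "decimal"}


def infer_kind(values):
    # Hypothetical, simplified inference over the Python types present.
    kinds = set()
    for v in values:
        if isinstance(v, bool):  # check bool before int (bool subclasses int)
            kinds.add("boolean")
        elif isinstance(v, int):
            kinds.add("integer")
        elif isinstance(v, float):
            kinds.add("floating")
        elif isinstance(v, Decimal):
            kinds.add("decimal")
        elif isinstance(v, str):
            kinds.add("string")
        else:
            kinds.add("other")
    if len(kinds) == 1:
        return kinds.pop()
    if kinds <= {"integer", "floating"}:
        return "floating"  # ints mixed with floats are still numeric
    return "mixed"


def object_median(values):
    values = list(values)
    kind = infer_kind(values)
    if kind not in NUMERIC_KINDS:
        # Strings and mixed input raise instead of being cast to float.
        raise TypeError(f"cannot compute median of {kind!r} values")
    return statistics.median(float(v) for v in values)
```

With this shape, pd.Series([1, 2], dtype=object)-style input still works, while the string case from the original report raises a TypeError.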