xref #18021, #16832
Code Sample, a copy-pastable example if possible
import pandas as pd
df = pd.DataFrame({'col1': [0, 1, 2, 3], 'col2': ['a', 'b', None, 'd'], 'col3': ['e', 'f', None, 'h']})
df.min(axis=0, skipna=True)
Problem description
If I've read the documentation correctly, min should return a Series with one entry per label on the relevant axis of the DataFrame. With the skipna flag set to True (which is the default), NA values should be ignored in the calculation.
Output
col1 0
dtype: int64
Expected Output
col1 0
col2 a
col3 e
dtype: object
Output of pd.show_versions()
Comment From: jreback
So a couple of things are going on. .min() uses numeric_only=None by default, so non-numeric columns are ignored (if they raise). But these fail anyhow if executed directly on the column.
In [8]: df.col2.min()
TypeError: '<=' not supported between instances of 'str' and 'float'
So we are not skipping NaN here; the values are passed directly to numpy, which raises (nanmax would have to be patched as well). With the patch below applied:
In [1]: df = pd.DataFrame({'col1': [0, 1, 2, 3], 'col2': ['a', 'b',None, 'd'], 'col3': ['e', 'f', None, 'h']})
In [2]: df.col2.min()
Out[2]: 'a'
In [3]: df.min()
Out[3]:
col1 0
col2 a
col3 e
dtype: object
patch
(pandas) bash-3.2$ git diff
diff --git a/pandas/core/nanops.py b/pandas/core/nanops.py
index e1c09947a..33377002e 100644
--- a/pandas/core/nanops.py
+++ b/pandas/core/nanops.py
@@ -478,6 +478,8 @@ def _nanminmax(meth, fill_value_typ):
             except:
                 result = np.nan
         else:
+
+            values = values[~mask]
             result = getattr(values, meth)(axis)
         result = _wrap_results(result, dtype)
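The masking step in the patch above can be tried in isolation. The following is only a minimal sketch, not the pandas implementation: min_skipping_nulls is a hypothetical helper name, and it assumes a 1-D object array in which nulls are None or NaN.

```python
import numpy as np
import pandas as pd


def min_skipping_nulls(values):
    """Hypothetical helper: drop nulls from a 1-D object array, then take the min."""
    values = np.asarray(values, dtype=object)
    # Boolean mask that is True where an entry is None or NaN.
    mask = pd.isna(values)
    # Same idea as `values = values[~mask]` in the patch: keep non-null entries only.
    kept = values[~mask]
    if kept.size == 0:
        # Nothing left to compare; report NaN (an assumption made for this sketch).
        return np.nan
    return kept.min()


col2 = np.array(['a', 'b', None, 'd'], dtype=object)
print(min_skipping_nulls(col2))
```

Boolean indexing with a mask flattens a multi-dimensional array, so this sketch covers only the 1-D case; a reduction along an axis of a 2-D block would need different handling.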
Comment From: jreback
So I'm open to fixing this (PR welcome!), though it might break some existing tests.
Comment From: TomAugspurger