We're inconsistent about excluding nuisance columns: sum silently drops the Interval column, but cumsum raises a TypeError.
In [37]: df = pd.DataFrame({
    ...:     "A": [1, 2, 3, 4],
    ...:     "B": pd.interval_range(0, 4),
    ...: })
In [38]: df.dtypes
Out[38]:
A int64
B object
dtype: object
In [39]: df.sum()
Out[39]:
A 10
dtype: int64
In [40]: df.cumsum()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-40-1c578612e53e> in <module>()
----> 1 df.cumsum()
~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/generic.py in cum_func(self, axis, skipna, *args, **kwargs)
7781 mask = isna(self)
7782 np.putmask(y, mask, mask_a)
-> 7783 result = accum_func(y, axis)
7784 np.putmask(result, mask, mask_b)
7785 else:
~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/generic.py in <lambda>(y, axis)
7400 cls.cumsum = _make_cum_function(
7401 cls, 'cumsum', name, name2, axis_descr, "cumulative sum",
-> 7402 lambda y, axis: y.cumsum(axis), "sum", 0., np.nan)
7403 cls.cumprod = _make_cum_function(
7404 cls, 'cumprod', name, name2, axis_descr, "cumulative product",
TypeError: unsupported operand type(s) for +: 'pandas._libs.interval.Interval' and 'pandas._libs.interval.Interval'
I'll update this with more dtypes (datetime, period, categorical) and more aggregations. Ideally we'd be consistent between the cumulative and non-cumulative versions.
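In the meantime, one way to get the same column set from both kinds of operation is to select the numeric columns explicitly before calling either one. This is only a sketch of a user-side workaround, not a proposal for how pandas should behave internally; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": pd.interval_range(0, 4),
})

# Restrict to numeric columns up front so the reduction and the
# accumulation both operate on exactly the same input.
numeric = df.select_dtypes(include="number")

total = numeric.sum()        # reduction over column A only
running = numeric.cumsum()   # accumulation over column A only

print(total)
print(running)
```

With the selection made explicit, neither call has to decide on its own what counts as a nuisance column.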
Comment From: tdpetrou
Also very closely related is accumulation with missing values.
>>> df = pd.DataFrame({'a': [1, np.nan, 5]})
>>> df.cumsum()
     a
0  1.0
1  NaN
2  6.0
Couldn't this be:
>>> df.cumsum()
     a
0  1.0
1  1.0
2  6.0
Also, would the cumsum of an all-NaN column be 0 for all values?
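For comparison, the output suggested above can already be produced by filling the missing values with 0 before accumulating. The sketch below shows the default behaviour, the skipna=False behaviour, and the fill-first variant side by side; treating NaN as 0 is an assumption about the desired semantics, not something pandas does by default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, np.nan, 5]})

# Default (skipna=True): NaN is skipped in the running total but the
# NaN position itself is kept in the result.
default = df.cumsum()

# skipna=False: once a NaN is hit, it propagates to every later row.
propagated = df.cumsum(skipna=False)

# Treat NaN as 0 before accumulating, which gives the suggested output.
filled = df.fillna(0).cumsum()

# Under the same treatment, an all-NaN column accumulates to all zeros.
all_nan = pd.DataFrame({"a": [np.nan, np.nan]})
all_nan_filled = all_nan.fillna(0).cumsum()

print(default)
print(propagated)
print(filled)
print(all_nan_filled)
```

Note that the fill-first variant loses the information about where the values were missing, which is presumably why the default keeps the NaN in place.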
Comment From: jbrockmendel
Added a GH label for nuisance columns
Comment From: jbrockmendel
Closing as we have removed silent dropping of nuisance columns for reductions.
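For anyone landing here later, a rough sketch of what this means in practice: the reduction is no longer expected to drop the Interval column quietly, so the caller opts in with numeric_only=True. The try/except is there because the exact behaviour of the bare df.sum() call depends on the installed pandas version; the TypeError branch is what I would expect on recent releases, but treat that as an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": pd.interval_range(0, 4),
})

# Without numeric_only, the Interval column is no longer silently dropped.
# Older versions may still drop it (possibly with a warning), so the
# exception is handled rather than assumed.
try:
    df.sum()
except TypeError as exc:
    print(f"df.sum() raised: {exc}")

# Opting in explicitly restricts the reduction to numeric columns.
numeric_total = df.sum(numeric_only=True)
print(numeric_total)
```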