This came up while trying to de-duplicate some DataFrame vs Series arithmetic logic in core.ops.
Comment From: gfyoung
That seems consistent with the docstring?
Comment From: jbrockmendel
Sure, but it's not exactly useful behavior.
Comment From: gfyoung
Why not?
Comment From: jbrockmendel
The relevant context is the ongoing effort to ensure that Index/Series/DataFrame operations are consistent with one another. The Index vs Series part is most of the way there, and I'm just starting in on Series vs DataFrame.
In core.ops, Series and DataFrame arithmetic operations have really similar error catching logic:
Series:
except TypeError:
if isinstance(y, (np.ndarray, ABCSeries, pd.Index)):
dtype = find_common_type([x.dtype, y.dtype])
result = np.empty(x.size, dtype=dtype)
mask = notna(x) & notna(y)
result[mask] = op(x[mask], com._values_from_object(y[mask]))
else:
assert isinstance(x, np.ndarray)
result = np.empty(len(x), dtype=x.dtype)
mask = notna(x)
result[mask] = op(x[mask], y)
DataFrame:
except TypeError:
xrav = x.ravel()
if isinstance(y, (np.ndarray, ABCSeries)):
dtype = np.find_common_type([x.dtype, y.dtype], [])
result = np.empty(x.size, dtype=dtype)
yrav = y.ravel()
mask = notna(xrav) & notna(yrav)
xrav = xrav[mask]
if yrav.shape != mask.shape:
# FIXME: GH#5284, GH#5035, GH#19448
# Without specifically raising here we get mismatched
# errors in Py3 (TypeError) vs Py2 (ValueError)
raise ValueError('Cannot broadcast operands together.')
yrav = yrav[mask]
if xrav.size:
with np.errstate(all='ignore'):
result[mask] = op(xrav, yrav)
elif isinstance(x, np.ndarray):
# mask is only meaningful for x
result = np.empty(x.size, dtype=x.dtype)
mask = notna(xrav)
xrav = xrav[mask]
if xrav.size:
with np.errstate(all='ignore'):
result[mask] = op(xrav, y)
Very nearly the only differences here are a) find_common_type
vs np.find_common_type
and a ravel()
. if it were the case that ravel()
were reliably a no-op for the 1-D case in the Series operation, that would go a long way towards letting us de-duplicate some code.
Comment From: gfyoung
Ah, interesting...got it. Well, this is an API change, so I'll open it up to @jreback @jorisvandenbossche . Is there any reason why we always returned ndarray
?
Comment From: jbrockmendel
@TomAugspurger possibly change this behavior in a follow up to PeriodArray?
Comment From: TomAugspurger
Is the expected behavior an ndarray[object] with period object?
This could be done any time by defining PeriodIndex.ravel as self.values.ravel()
instead of inheriting the base implementation.
Comment From: jbrockmendel
For PeriodArray I’d expect it to return a 1-dim-reshaped view on itself. So a no-op for EA since those are 1-dim to begin with.
Comment From: TomAugspurger
That'll need a larger discussion then, since the other EA-backed indexes all return ndarrays for ravel.
Comment From: jorisvandenbossche
Also for Series, ravel
returns an ndarray.
I think there is also something to be said for ravel
always to be an ndarray.
Or to simply deprecate the method, since if you want a ravelled ndarray, you can also do the much more explicit np.asarray(val).ravel()
.
The question is: what do people use it for? It might be that it is used in cases where afterwards the code expects to now have a numpy array.
Comment From: jbrockmendel
The question is: what do people use it for?
The place that motivated this issue is in core.ops
where we currently call ravel() on most things but specifically avoid calling it on PeriodIndex
. It's a pretty ugly special case.
Comment From: jorisvandenbossche
And what's the reason we call it there? What do we want to end up with there?
Comment From: jbrockmendel
And what's the reason we call it there? What do we want to end up with there?
In that case we are calling right.ravel()
because right
is an array-like and we want to work with a flattened version of that array.
Comment From: jorisvandenbossche
And what do you mean with 'array-like' in this case? I assume we know what types it can be? (The code you posted above is a bit out of context, and the original links you have there don't point anymore to the good location)
Comment From: jbrockmendel
And what do you mean with 'array-like' in this case?
https://github.com/pandas-dev/pandas/blob/master/pandas/core/ops.py#L842
isinstance(y, (np.ndarray, ABCSeries, ABCIndexClass))
(presumably we'll let through PeriodArray etc before long)
# PeriodIndex.ravel() returns int64 dtype, so we have
# to work around that case. See GH#19956
yrav = y if is_period_dtype(y) else y.ravel()
mask = notna(xrav) & notna(yrav)
Comment From: jorisvandenbossche
But if the object is array, Series or Index, why do we want to ravel? Because all those are already 1D? (or the array can be 2D?)
Comment From: TomAugspurger
I think just checking if y.ndim > 1: yrav = y.ravel()
is the way to go. Not so elegant, but it avoids the discussion of what "ravel" means for EA-backed Series, and inconsistencies between ndarray and ea-backed series.
Comment From: jorisvandenbossche
Anyway, mainly what I am trying to ask about: in this specific case that motivated the proposal, there might be a very easy way to also avoid duplicated code, without needing ravel
?
We can simply check the dimension for example?
Comment From: jorisvandenbossche
Well, so exactly what @TomAugspurger said :-)
Comment From: jbrockmendel
But if the object is array, Series or Index, why do we want to ravel? Because all those are already 1D? (or the array can be 2D?)
The array can be 2D.
I think just checking if y.ndim > 1: yrav = y.ravel() is the way to go.
This is fine. The extant code also works fine.
but it avoids the discussion of what "ravel" means for EA-backed Series
That is pretty much what the Issue is about. We can certainly stick a pin in this, but long-term the method's behavior should match the standard usage of the term "ravel"
Comment From: TomAugspurger
Yeah, we'll certainly need to resolve this, but I think it depends a bit on the result of #19954, if you have any thoughts there.
On Fri, Oct 26, 2018 at 9:37 AM jbrockmendel notifications@github.com wrote:
But if the object is array, Series or Index, why do we want to ravel? Because all those are already 1D? (or the array can be 2D?)
The array can be 2D.
I think just checking if y.ndim > 1: yrav = y.ravel() is the way to go.
This is fine. The extant code also works fine.
but it avoids the discussion of what "ravel" means for EA-backed Series
That is pretty much what the Issue is about. We can certainly stick a pin in this, but long-term the method's behavior should match the standard usage of the term "ravel"
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pandas-dev/pandas/issues/19956#issuecomment-433429878, or mute the thread https://github.com/notifications/unsubscribe-auth/ABQHIkZ77ozJxSuwAAPXD2QtdZz1jQHeks5uox4ngaJpZM4SY1EH .
Comment From: jbrockmendel
This deprecation was done in 1.2; closing.
Comment From: jorisvandenbossche
But did we ever follow-up on the discussion in https://github.com/pandas-dev/pandas/pull/36900? The deprecation warning still says that in the future it will return a view on self. Which still creates an inconsistency with Series.
Comment From: jbrockmendel
I see the appeal of being consistent with Series, but dont like the idea of materializing RangeIndex/MultiIndex.