Pandas PeriodIndex.ravel returns ndarray[int64]

This came up while trying to de-duplicate some DataFrame vs Series arithmetic logic in core.ops.

Comment From: gfyoung

That seems consistent with the docstring?

Comment From: jbrockmendel

Sure, but it's not exactly useful behavior.

Comment From: gfyoung

Why not?

Comment From: jbrockmendel

The relevant context is the ongoing effort to ensure that Index/Series/DataFrame operations are consistent with one another. The Index vs Series part is most of the way there, and I'm just starting in on Series vs DataFrame.

In core.ops, Series and DataFrame arithmetic operations have really similar error catching logic:

Series:

        except TypeError:
            if isinstance(y, (np.ndarray, ABCSeries, pd.Index)):
                dtype = find_common_type([x.dtype, y.dtype])
                result = np.empty(x.size, dtype=dtype)
                mask = notna(x) & notna(y)
                result[mask] = op(x[mask], com._values_from_object(y[mask]))
            else:
                assert isinstance(x, np.ndarray)
                result = np.empty(len(x), dtype=x.dtype)
                mask = notna(x)
                result[mask] = op(x[mask], y)

DataFrame:

        except TypeError:
            xrav = x.ravel()
            if isinstance(y, (np.ndarray, ABCSeries)):
                dtype = np.find_common_type([x.dtype, y.dtype], [])
                result = np.empty(x.size, dtype=dtype)
                yrav = y.ravel()
                mask = notna(xrav) & notna(yrav)
                xrav = xrav[mask]

                if yrav.shape != mask.shape:
                    # FIXME: GH#5284, GH#5035, GH#19448
                    # Without specifically raising here we get mismatched
                    # errors in Py3 (TypeError) vs Py2 (ValueError)
                    raise ValueError('Cannot broadcast operands together.')

                yrav = yrav[mask]
                if xrav.size:
                    with np.errstate(all='ignore'):
                        result[mask] = op(xrav, yrav)

            elif isinstance(x, np.ndarray):
                # mask is only meaningful for x
                result = np.empty(x.size, dtype=x.dtype)
                mask = notna(xrav)
                xrav = xrav[mask]
                if xrav.size:
                    with np.errstate(all='ignore'):
                        result[mask] = op(xrav, y)

Very nearly the only differences here are a) find_common_type vs np.find_common_type and a ravel(). if it were the case that ravel() were reliably a no-op for the 1-D case in the Series operation, that would go a long way towards letting us de-duplicate some code.

Comment From: gfyoung

Ah, interesting...got it. Well, this is an API change, so I'll open it up to @jreback @jorisvandenbossche . Is there any reason why we always returned ndarray ?

Comment From: jbrockmendel

@TomAugspurger possibly change this behavior in a follow up to PeriodArray?

Comment From: TomAugspurger

Is the expected behavior an ndarray[object] with period object?

This could be done any time by defining PeriodIndex.ravel as self.values.ravel() instead of inheriting the base implementation.

Comment From: jbrockmendel

For PeriodArray I’d expect it to return a 1-dim-reshaped view on itself. So a no-op for EA since those are 1-dim to begin with.

Comment From: TomAugspurger

That'll need a larger discussion then, since the other EA-backed indexes all return ndarrays for ravel.

Comment From: jorisvandenbossche

Also for Series, ravel returns an ndarray.

I think there is also something to be said for ravel always to be an ndarray. Or to simply deprecate the method, since if you want a ravelled ndarray, you can also do the much more explicit np.asarray(val).ravel().

The question is: what do people use it for? It might be that it is used in cases where afterwards the code expects to now have a numpy array.

Comment From: jbrockmendel

The question is: what do people use it for?

The place that motivated this issue is in core.ops where we currently call ravel() on most things but specifically avoid calling it on PeriodIndex. It's a pretty ugly special case.

Comment From: jorisvandenbossche

And what's the reason we call it there? What do we want to end up with there?

Comment From: jbrockmendel

And what's the reason we call it there? What do we want to end up with there?

In that case we are calling right.ravel() because right is an array-like and we want to work with a flattened version of that array.

Comment From: jorisvandenbossche

And what do you mean with 'array-like' in this case? I assume we know what types it can be? (The code you posted above is a bit out of context, and the original links you have there don't point anymore to the good location)

Comment From: jbrockmendel

And what do you mean with 'array-like' in this case?

https://github.com/pandas-dev/pandas/blob/master/pandas/core/ops.py#L842

isinstance(y, (np.ndarray, ABCSeries, ABCIndexClass)) (presumably we'll let through PeriodArray etc before long)

        # PeriodIndex.ravel() returns int64 dtype, so we have
        # to work around that case.  See GH#19956
        yrav = y if is_period_dtype(y) else y.ravel()
        mask = notna(xrav) & notna(yrav)

Comment From: jorisvandenbossche

But if the object is array, Series or Index, why do we want to ravel? Because all those are already 1D? (or the array can be 2D?)

Comment From: TomAugspurger

I think just checking if y.ndim > 1: yrav = y.ravel() is the way to go. Not so elegant, but it avoids the discussion of what "ravel" means for EA-backed Series, and inconsistencies between ndarray and ea-backed series.

Comment From: jorisvandenbossche

Anyway, mainly what I am trying to ask about: in this specific case that motivated the proposal, there might be a very easy way to also avoid duplicated code, without needing ravel ? We can simply check the dimension for example?

Comment From: jorisvandenbossche

Well, so exactly what @TomAugspurger said :-)

Comment From: jbrockmendel

But if the object is array, Series or Index, why do we want to ravel? Because all those are already 1D? (or the array can be 2D?)

The array can be 2D.

I think just checking if y.ndim > 1: yrav = y.ravel() is the way to go.

This is fine. The extant code also works fine.

but it avoids the discussion of what "ravel" means for EA-backed Series

That is pretty much what the Issue is about. We can certainly stick a pin in this, but long-term the method's behavior should match the standard usage of the term "ravel"

Comment From: TomAugspurger

Yeah, we'll certainly need to resolve this, but I think it depends a bit on the result of #19954, if you have any thoughts there.

On Fri, Oct 26, 2018 at 9:37 AM jbrockmendel notifications@github.com wrote:

But if the object is array, Series or Index, why do we want to ravel? Because all those are already 1D? (or the array can be 2D?)

The array can be 2D.

I think just checking if y.ndim > 1: yrav = y.ravel() is the way to go.

This is fine. The extant code also works fine.

but it avoids the discussion of what "ravel" means for EA-backed Series

That is pretty much what the Issue is about. We can certainly stick a pin in this, but long-term the method's behavior should match the standard usage of the term "ravel"

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pandas-dev/pandas/issues/19956#issuecomment-433429878, or mute the thread https://github.com/notifications/unsubscribe-auth/ABQHIkZ77ozJxSuwAAPXD2QtdZz1jQHeks5uox4ngaJpZM4SY1EH .

Comment From: jbrockmendel

This deprecation was done in 1.2; closing.

Comment From: jorisvandenbossche

But did we ever follow-up on the discussion in https://github.com/pandas-dev/pandas/pull/36900? The deprecation warning still says that in the future it will return a view on self. Which still creates an inconsistency with Series.

Comment From: jbrockmendel

I see the appeal of being consistent with Series, but dont like the idea of materializing RangeIndex/MultiIndex.