Pandas ExtensionArray.map

Both Categorical and SparseArray benefit from implementing a .map method. It lets them efficiently apply a function / mapping to the categories / sp_values, rather than to every element of the array. We dispatch to it internally in https://github.com/pandas-dev/pandas/blob/master/pandas/core/series.py#L3379-L3380
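To make the efficiency argument concrete, here is a minimal sketch in plain Python. ToyCategorical and shout are hypothetical names made up for illustration; this is a toy model of the categories-plus-codes idea, not pandas' actual Categorical implementation.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, List


@dataclass
class ToyCategorical:
    """Toy stand-in for a categorical array: unique categories plus integer codes."""

    categories: List[Hashable]
    codes: List[int]  # codes[i] indexes into categories; -1 marks a missing value

    def to_list(self):
        # Expand the compact representation back into one value per element.
        return [self.categories[c] if c >= 0 else None for c in self.codes]

    def map(self, func: Callable) -> "ToyCategorical":
        # Call func once per category and reuse the existing codes.
        # Assumption: func is one-to-one on the categories; otherwise the
        # new categories would contain duplicates.
        return ToyCategorical([func(c) for c in self.categories], list(self.codes))


calls = 0


def shout(value):
    """Example mapper that counts how often it is invoked."""
    global calls
    calls += 1
    return value.upper()


arr = ToyCategorical(categories=["a", "b"], codes=[0, 1, 0, 0, 1, -1])
mapped = arr.map(shout)
print(mapped.to_list(), "calls:", calls)
```

The mapper runs once per category (twice here) instead of once per element (six times), which is the gain the issue describes.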

So, we need to either

1. add it to the interface, or
2. hard-code checks for categorical or sparse dtype there.

Do people have a preference? Right now I'm leaning toward 2. Or are there other array types that would have a similar efficiency gain to Categorical or Sparse?
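The trade-off between the two options can be sketched roughly as follows. All class and function names here are hypothetical and the dispatch logic is simplified; it is not the code in series.py.

```python
class PlainArray:
    """Hypothetical array type with no map method of its own."""

    def __init__(self, values):
        self.values = list(values)


class CategoricalLike(PlainArray):
    """Hypothetical array with an efficient map (one call per unique value)."""

    def map(self, func):
        cache = {v: func(v) for v in set(self.values)}
        return [cache[v] for v in self.values]


class ThirdPartyArray(PlainArray):
    """Hypothetical third-party array that also provides its own map."""

    def __init__(self, values):
        super().__init__(values)
        self.fast_path_used = False

    def map(self, func):
        self.fast_path_used = True
        return [func(v) for v in self.values]


def map_via_interface(arr, func):
    # Option 1: map is part of the interface, so any array that provides it is used.
    if hasattr(arr, "map"):
        return arr.map(func)
    return [func(v) for v in arr.values]


def map_via_hardcoded_checks(arr, func):
    # Option 2: only explicitly listed types get the fast path.
    if isinstance(arr, CategoricalLike):
        return arr.map(func)
    return [func(v) for v in arr.values]


third = ThirdPartyArray([1, 2, 3])
result_interface = map_via_interface(third, lambda x: x * 2)
used_by_interface = third.fast_path_used

third.fast_path_used = False
result_hardcoded = map_via_hardcoded_checks(third, lambda x: x * 2)
used_by_hardcoded = third.fast_path_used
```

Both paths return the same values, but only the interface-based dispatch reaches the third-party array's own map. Whether such arrays exist in practice is exactly the question raised above.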

Comment From: jreback

-1 on hard coding things

expanding the interface is the way forward here

Comment From: TomAugspurger

-1 on expanding things needlessly though. I’d rather wait for a compelling use case to come along.

Comment From: jbrockmendel

Trying to de-duplicate is_extension_array_dtype and is_extension_type, I'm finding that the lack of EA.map is a blocker for using is_extension_array_dtype in all cases.

Comment From: TomAugspurger

FWIW, I don't think deduplicating is_extension_array_dtype and is_extension_type is important enough to warrant adding a new method to the API.

Comment From: rhshadrach

Ran into this in #39941, where map is used for categorical and sparse in apply. There, it results in different dtype behavior than for other EAs. But it seems to me that map only makes sense when the result of any UDF can remain in the same dtype (which I think is true for categorical and sparse). How would one implement map for e.g. Int64 when the mapper is lambda x: 3.2 or lambda x: "a"?

Edit: I just found that datetime64 also implements map, which does not have the property I mentioned.
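One conceivable answer to the Int64 question is to let the return type depend on the mapped values. The sketch below uses a hypothetical ToyIntArray in plain Python and is only meant to illustrate the dtype problem; it does not claim to show how pandas handles it.

```python
class ToyIntArray:
    """Toy integer-only container; not pandas' Int64 array."""

    def __init__(self, values):
        values = list(values)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not all(isinstance(v, int) and not isinstance(v, bool) for v in values):
            raise TypeError("ToyIntArray only holds ints")
        self.values = values

    def map(self, func):
        mapped = [func(v) for v in self.values]
        try:
            # Keep the same type when every result still fits it ...
            return ToyIntArray(mapped)
        except TypeError:
            # ... otherwise fall back to a plain list (think: object dtype).
            return mapped


same = ToyIntArray([1, 2, 3]).map(lambda x: x + 1)
other = ToyIntArray([1, 2, 3]).map(lambda x: 3.2)
print(type(same).__name__, type(other).__name__)
```

With this design the result type of map is value-dependent: the first call stays a ToyIntArray, while the second cannot.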

Comment From: jbrockmendel

"any UDF can remain in the same dtype (which I think is true for categorical and sparse)"

Not exactly. For sparse you can make the return dtype always be sparse, but we can come up with UDFs whose results must have a different sparse dtype. For Categorical you could pass your result to type(self)._from_sequence(result, dtype=self.dtype) and that will usually work, but that's because it will just set any non-fitting element to NaN.
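The Categorical caveat can be illustrated with a small plain-Python sketch. coerce_to_categories is a hypothetical helper that only mimics the "non-fitting elements become NaN" behavior described above; it is not pandas' _from_sequence.

```python
import math


def coerce_to_categories(values, categories):
    """Keep values that are valid categories; replace everything else with NaN."""
    allowed = set(categories)
    return [v if v in allowed else math.nan for v in values]


categories = ["a", "b"]
original = ["a", "b", "a"]

# A UDF whose results partly fall outside the original categories.
mapped = [v.upper() if v == "a" else v for v in original]

# Forcing the result back into the original dtype silently drops "A".
coerced = coerce_to_categories(mapped, categories)
print(mapped, coerced)
```

The coercion "works" in the sense that it returns a valid result for the original categories, but the mapped values that do not fit are lost rather than reported.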

Comment From: topper-123

Closed. ExtensionArray.map was added in #51809.