Both Categorical and SparseArray found implementing a .map method useful. This allows them to efficiently apply a function / mapping to the categories / sp_values, rather than to every element of the array. We dispatch to it internally in https://github.com/pandas-dev/pandas/blob/master/pandas/core/series.py#L3379-L3380
So, we need to either
1. Add it to the interface
2. Hard-code checks for categorical or sparse dtype there.
Do people have a preference? Right now I'm leaning toward 2. Or are there other array types that would have a similar efficiency gain to Categorical or Sparse?
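To illustrate the efficiency argument, here is a minimal pure-Python sketch of mapping over categories instead of elements. ToyCategorical and shout are hypothetical names made up for this example; this is not the pandas implementation, and it ignores missing values and mappers that collapse two categories into one.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyCategorical:
    """Hypothetical stand-in for a categorical array: unique categories plus integer codes."""

    categories: List[str]
    codes: List[int]

    def map(self, func: Callable[[str], str]) -> "ToyCategorical":
        # Call func once per category; the codes are reused unchanged.
        return ToyCategorical([func(c) for c in self.categories], list(self.codes))

    def to_list(self) -> List[str]:
        # Expand the codes back into one value per element.
        return [self.categories[code] for code in self.codes]


calls: List[str] = []


def shout(value: str) -> str:
    # Record each call so the number of invocations can be inspected.
    calls.append(value)
    return value.upper()


toy = ToyCategorical(categories=["a", "b"], codes=[0, 1, 0, 0, 1, 0])
mapped = toy.map(shout)

print(mapped.to_list())  # six elements
print(len(calls))        # number of times shout was called
```

In this sketch the mapper runs once per category (2 calls) even though the array has 6 elements, which is the gain a dedicated .map can offer over element-wise application.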
Comment From: jreback
-1 on hard coding things
expanding the interface is the way forward here
Comment From: TomAugspurger
-1 on expanding things needlessly though. I’d rather wait for a compelling use case to come along.
Comment From: jbrockmendel
Trying to de-duplicate is_extension_array_dtype and is_extension_type, I'm finding that the lack of EA.map is a blocker for using is_extension_array_dtype in all cases.
Comment From: TomAugspurger
FWIW, I don't think deduplicating is_extension_array_dtype and is_extension_type is important enough to warrant adding a new method to the API.
Comment From: rhshadrach
Ran into this in #39941, where map is used for categorical and sparse in apply. Here, it results in different dtype behavior than for other EAs. But it seems to me that map only makes sense when the output of any UDF can remain in the same dtype (which I think is true for categorical and sparse). But how would one implement map for e.g. Int64 where the mapper is lambda x: 3.2 or lambda x: "a"?
Edit: I just found datetime64 also implements map which does not have the property I mentioned.
Comment From: jbrockmendel
"the output of any UDF can remain in the same dtype (which I think is true for categorical and sparse)"
Not exactly. For sparse you can make the return dtype always be sparse, but we can come up with UDFs that must have a different sparse dtype. For Categorical you could pass your result to type(self)._from_sequence(result, dtype=self.dtype) and that will usually work, but that's because it will just set any non-fitting element to NaN.
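A small sketch of the Categorical point, assuming the public pd.Categorical constructor as a stand-in for the private _from_sequence call mentioned above. The data and the upper-casing UDF are made up for illustration.

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "a"])

# Hypothetical UDF output: "b" becomes "B", which is not one of the original categories.
result = [x.upper() if x == "b" else x for x in cat]

# Forcing the result back into the original dtype does not raise;
# the value that no longer fits is silently replaced by a missing value.
coerced = pd.Categorical(result, dtype=cat.dtype)

print(coerced.codes.tolist())
print(coerced.isna().tolist())
```

The coercion succeeds, but the mapped value "B" is lost, so keeping the original dtype only looks like it works.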
Comment From: topper-123
Closed. ExtensionArray.map was added in #51809.