Is this required? Right now https://github.com/pandas-dev/pandas/blob/a03d9535b16a6d5441334ef2e698d72778cf8115/pandas/core/algorithms.py#L708-L711 assumes that it's implemented, but we don't document / implement it by default, so we end up with an AttributeError at runtime.
Comment From: jorisvandenbossche
Personally I would remove it from the EA interface. value_counts
is a very useful method, but IMO does not really fit on an array (returns a Series, while I think the arrays should be somewhat independent from pandas Series concepts), and is also rather easily to do based on _factorize
/_values_for_factorize
for EAs as a default implementation?
Comment From: TomAugspurger
Yes, the main sticking point though is the index. Once we have official support for extension indexes, we should be able to update algorithms.value_counts
to do the right thing.
Comment From: jreback
nog against removing value_counts
for the time being either until we have ExtensionIndex
, which really comes up a lot when you groupby things :.
Comment From: jbrockmendel
agreed on removing from EAs. It introduces a nasty dependency of the EA code on index/series code.
Comment From: jbrockmendel
and is also rather easily to do based on _factorize/_values_for_factorize for EAs as a default implementation?
I tried this the other day and found I had to do a lot of special-casing for Categorical and SparseArray. Has anyone else tried this?
Comment From: jbrockmendel
AFAICT, based on a) trying to implement this and b) previous discussions which I've lost track of:
1) It has been suggested that this should be doable using factorize
, but I have been unable to get this to work.
2) AFAICT SparseArray and possibly Categorical are going to have to override whatever general-case implementation we come up with.
- This means that we will need something on the EA to allow subclasses to override the default behavior
3) The EA should not implement value_counts -> Series
, but instead _value_counts->Tuple[EA, ndarray[int]]
(or Tuple[EA, IntegerArray]
)
Comment From: jorisvandenbossche
Do you remember mote details on the issues you ran into for Categorical?
Comment From: jbrockmendel
Not off the top of my head, no
Comment From: samgd
I'm implementing an ExtensionArray interface and am hitting a failing test due to the lack of value_counts
. What is the status on this? Required? Can I mark the test as skip?
Comment From: jbrockmendel
AFAICT value_counts is de-facto required but not documented as such. As above I think it'd be great if we could implement it in terms of lower-level methods, but until then I think it needs to be documented as something authors are expected to implement.