We have quite a lot of code to support .mode (internally we use pretty much the same algos as .value_counts()),
but unless I am misunderstanding something, isn't
x.mode() == x.value_counts(sort=True).index[0]
?
If there are ties, then return a slice of the result,
e.g.
In [17]: Series([1, 1, 1, 2, 2, 2, 3]).value_counts()
Out[17]:
2 3
1 3
3 1
dtype: int64
In [18]: Series([1, 1, 1, 2, 2, 2, 3]).mode()
Out[18]:
0 1
1 2
dtype: int64
cc @TomAugspurger
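A minimal sketch of the idea above, not the actual pandas internals: take the value counts, then slice out every entry that shares the top count. The names `counts`, `top` and `candidates` are illustrative only, and the comparison uses sets because the order of tied entries is an assumption this sketch does not rely on.

```python
import pandas as pd

s = pd.Series([1, 1, 1, 2, 2, 2, 3])

# Counts sorted in descending order, so the first entry has the top count.
counts = s.value_counts(sort=True)

# Slice out every value whose count equals the top count (covers ties).
top = counts[counts == counts.iloc[0]]

# Compare as sets: the order of tied values is not assumed here.
candidates = set(top.index)
print(candidates == set(s.mode()))
```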
Comment From: jreback
cc @buyology
Comment From: TomAugspurger
Yeah, this should work I think. The only difference I see is the sort-order on ties:
In [22]: data = [1, 2, 1, 2]
In [23]: mode(data)
Out[23]:
0 1
1 2
dtype: int64
In [24]: pd.value_counts(data) # Notice the 2 is first.
Out[24]:
2 2
1 2
dtype: int64
but can probably just reverse it.
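One way the ordering difference could be handled, sketched here under the assumption that the order of tied entries in the value_counts output should not be relied on: sort the tied values instead of reversing them. `mode_via_value_counts` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def mode_via_value_counts(values) -> pd.Series:
    """Hypothetical helper: mode(s) derived from value_counts()."""
    counts = pd.Series(values).value_counts()
    # Keep every value tied for the highest count.
    tied = counts[counts == counts.max()]
    # Sort the tied values ascending so the result does not depend on
    # how value_counts() happens to order equal counts.
    return pd.Series(tied.sort_index().index.to_numpy())


data = [1, 2, 1, 2]
print(mode_via_value_counts(data).tolist())
print(pd.Series(data).mode().tolist())
```

Sorting is swapped in for the suggested reversal because a reversal only matches when the tied values come out in exactly descending order.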
Comment From: jreback
Yeah, the sorting is odd. OK, this is worth doing just to remove some duplicated code in hashtable that supports mode.
Comment From: buyology
sounds very reasonable. I guess this can subsume #15714 then
Comment From: jreback
@buyology these are a bit orthogonal, and I think we can merge your PR that covers #15714 (I had left some comments). That is more of a user-facing API issue, whereas this is an implementation change that will not be visible.
So they can be done in either order (this one is a bit more involved in any event).
You are welcome to tackle this as well!
Comment From: jbrockmendel
mode is defined in hashtable_func_helper, so presumably deriving it from value_counts will entail a perf hit. no strong opinion.
Comment From: jbrockmendel
mode in hashtable_func_helper.pxi.in is defined in terms of value_counts. Closing as complete.