Code Sample, a copy-pastable example if possible
import pandas as pd
df = pd.DataFrame(["a", "b", "c"], columns=["test"])
print(df["test"].value_counts())
Problem description
Using value_counts in a test suite can be a problem when several values have the same count, because their order changes from one call to the next, e.g.:
$ python pandas_value_counts.py
a 1
b 1
c 1
Name: test, dtype: int64
$ python pandas_value_counts.py
c 1
a 1
b 1
Name: test, dtype: int64
Expected Output
Some stable/deterministic output, or optionally an additional sort by key when values have the same count.
Output of pd.show_versions()
Comment From: jreback
see discussion in these related issues.
xref #12679 xref #11227 xref #14860
This is not guaranteed in any way, nor would it be performant to do so. Further, why should this be anything but an arbitrary ordering? This is a mapping of value -> count.
Comment From: jreback
If you need this for testing, the easiest / best is simply to .sort_index() and compare.
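A minimal sketch of that suggestion, reusing the DataFrame from the report. The `expected` Series and the use of `check_names=False` are assumptions made for illustration (the name attached to the result can differ between pandas versions); they are not part of the original discussion.

```python
import pandas as pd
import pandas.testing as tm

df = pd.DataFrame(["a", "b", "c"], columns=["test"])

# Sorting by index makes the order depend only on the keys,
# not on how ties between equal counts happen to be resolved.
counts = df["test"].value_counts().sort_index()

# Hypothetical expected result for this test, also ordered by key.
expected = pd.Series([1, 1, 1], index=["a", "b", "c"])

# Ignore Series/Index names so that only keys and counts are compared.
tm.assert_series_equal(counts, expected, check_names=False)
```

Because both sides are ordered by key, the comparison no longer depends on the order in which tied values come back.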
Comment From: jreback
actually this is a duplicate of #12679
There is no ordering guarantee with sort=False (not the default). This could be done as a post-processing step: .unique() guarantees that values are returned in the order in which they are first seen.
In [23]: s.value_counts(sort=False).reindex(s.unique())
Out[23]:
a 1
b 1
c 1
dtype: int64
In [24]: s = Series(list('bac'))
In [25]: s.value_counts(sort=False).reindex(s.unique())
Out[25]:
b 1
a 1
c 1
dtype: int64
If you'd like to do a PR for #12679, that would be great.
Comment From: tomspur
Yes, I need this for testing, and I would like to get the value with the highest count. Taking just the first item of value_counts currently gives an arbitrary one of the values that share the maximum count, and it would be great to always get the same value.
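Until something like that exists, a tie among the top counts can be broken explicitly in user code. The sketch below shows two possible rules; the sample data and the variable names are made up for illustration, and neither rule is something value_counts itself promises.

```python
import pandas as pd

# Hypothetical data: "b" and "a" both occur twice, "c" once.
s = pd.Series(list("bacab"))

counts = s.value_counts()

# Rule 1: among the values tied for the highest count, take the smallest key.
top_by_key = counts[counts == counts.max()].index.min()

# Rule 2: among the tied values, take the one that appears first in the data.
# reindex(s.unique()) puts the counts in order of first appearance, and
# idxmax() returns the first label that holds the maximum.
top_first_seen = counts.reindex(s.unique()).idxmax()

print(top_by_key, top_first_seen)
```

Either rule gives the same answer on every run for the same input; which one fits depends on what the test is meant to assert.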
Comment From: jreback
love to have a PR as above!