Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this issue exists on the latest version of pandas.
- [X] I have confirmed this issue exists on the main branch of pandas.
Reproducible Example
>>> import pandas as pd
>>> cat = pd.Categorical(list("bac")*1_000_000)
>>> df = pd.DataFrame({"cat": cat, "codes": cat.codes})
>>> %timeit df.groupby("cat").sum()
120 ms ± 246 µs per loop  # v1.5.2
404 ms ± 2.24 ms per loop  # main
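The example above relies on IPython's %timeit magic. The following is a minimal sketch of the same benchmark as a plain script using the standard-library timeit module, so it can be run outside IPython when comparing builds. It reports the best of 5 runs, which is usually a little lower than the mean that %timeit prints. observed=False is passed explicitly (an addition, not in the original example) to keep the grouping behavior of the versions discussed here and to avoid the deprecation warning that newer pandas versions emit for the implicit default.

```python
import timeit

import pandas as pd

# Same data as the IPython example: 3 million rows, 3 categories ("a", "b", "c").
cat = pd.Categorical(list("bac") * 1_000_000)
df = pd.DataFrame({"cat": cat, "codes": cat.codes})

# Compute the result once so it can be sanity-checked.
result = df.groupby("cat", observed=False).sum()

# Time the groupby-sum 5 times, one call per run, and keep the fastest run.
timer = timeit.Timer(lambda: df.groupby("cat", observed=False).sum())
best = min(timer.repeat(repeat=5, number=1))
print(f"pandas {pd.__version__}: {best * 1000:.1f} ms per loop (best of 5)")
```

Running this once per build (and printing pd.__version__) makes it easier to tell whether a slowdown comes from the pandas version or from how the local build was compiled.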
This is likely the same underlying performance bug as in #51680.
This is probably a blocker for v2.0.
Installed Versions
Prior Performance
No response
Comment From: jbrockmendel
I get about 22ms on both 1.5.3 and main, within the margin of error.
Comment From: rhshadrach
I'm not seeing anything jump out in the categorical groupby ASVs either:
https://asv-runner.github.io/asv-collection/pandas/
Comment From: rhshadrach
Could this be an issue that only shows up on Mac?
Comment From: jbrockmendel
I'm on a Mac (pre-M1).
Comment From: phofl
No, I have an M2 and am not seeing it either.
Comment From: phofl
I guess something on @topper-123's machine is off? Not sure, but mine is more than 10 times faster, which seems a bit odd: I'm getting around 10ms for both.
Comment From: topper-123
Yeah, I had a build difference...
The fast version was built with: python setup.py build_ext --inplace --force -j4
while the slow version was built with: python setup.py build_ext --inplace --force -j4 --with-debugging-symbols
I can close this, sorry.