https://pandas.pydata.org/speed/pandas/index.html#stat_ops.FrameMultiIndexOps.time_op?p-level=1&p-op=%27std%27&commits=c9144ca54dcc924995acae3d9dcb890a5802d7c0
import pandas as pd
import numpy as np
levels = [np.arange(10), np.arange(100), np.arange(100)]
codes = [
np.arange(10).repeat(10000),
np.tile(np.arange(100).repeat(100), 10),
np.tile(np.tile(np.arange(100), 100), 10),
]
index = pd.MultiIndex(levels=levels, codes=codes)
df = pd.DataFrame(np.random.randn(len(index), 4), index=index)
%timeit df.groupby(level=1).std()
Points to https://github.com/pandas-dev/pandas/pull/34372 (cc @rhshadrach), but there was an earlier slowdown.
Comment From: TomAugspurger
Added code to reproduce @rhshadrach are you able to look into why this is slower now?
# 1.0.4
%timeit df.std(level=1)
## -- End pasted text --
3.68 ms ± 51.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# master
%timeit df.std(level=1)
## -- End pasted text --
5.47 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Comment From: rhshadrach
Thanks - I'll be able to look into this on Sunday.
Comment From: rhshadrach
Confirmed the timings on 1.0.4, master, as well as PR #34372 directly. This agrees with the slowdown that was reported in https://github.com/pandas-dev/pandas/pull/34372#issue-422910411. I also confirmed that single column performance on master beats 1.0.4 by a similar ratio as reported in the comment linked above:
df = pd.DataFrame(np.random.randn(len(index), 1), index=index)
# 1.0.4
%timeit df.std(level=1)
1.93 ms ± 2.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# master
%timeit df.std(level=1)
1.6 ms ± 17.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
This makes this a duplicate of #34771, although previously that issue only mentioned var -- I've updated it to include std.
Comment From: CloseChoice
I profiled the call with both versions. Here is my script in case s.b. wants to reproduce:
import cProfile, pstats, io
from pstats import SortKey
import timeit
import pandas as pd
import numpy as np
levels = [np.arange(10), np.arange(100), np.arange(100)]
codes = [
np.arange(10).repeat(10000),
np.tile(np.arange(100).repeat(100), 10),
np.tile(np.tile(np.arange(100), 100), 10),
]
index = pd.MultiIndex(levels=levels, codes=codes)
df = pd.DataFrame(np.random.randn(len(index), 4), index=index)
pr = cProfile.Profile(timeunit=0.000000001)
pr.enable()
df.std(level=1)
pr.disable()
s = io.StringIO()
sortby = SortKey.CUMULATIVE
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print(s.getvalue())
Here are the results (I changed the paths at some lines).
v1.0.4 (only first 50 lines)
4214 function calls (3938 primitive calls) in 0.005 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.005 0.005 pandas/core/generic.py:11222(stat_func)
1 0.000 0.000 0.005 0.005 pandas/core/generic.py:10094(_agg_by_level)
1 0.000 0.000 0.003 0.003 pandas/core/groupby/groupby.py:1248(std)
1 0.000 0.000 0.003 0.003 pandas/core/groupby/groupby.py:1271(var)
1 0.000 0.000 0.003 0.003 pandas/core/groupby/generic.py:990(_cython_agg_general)
1 0.000 0.000 0.003 0.003 pandas/core/groupby/generic.py:998(_cython_agg_blocks)
1 0.000 0.000 0.002 0.002 pandas/core/groupby/ops.py:583(aggregate)
1 0.000 0.000 0.002 0.002 pandas/core/groupby/ops.py:427(_cython_operation)
1 0.000 0.000 0.002 0.002 pandas/core/frame.py:5736(groupby)
1 0.000 0.000 0.002 0.002 pandas/core/groupby/groupby.py:362(__init__)
1 0.000 0.000 0.002 0.002 pandas/core/groupby/grouper.py:426(get_grouper)
1 0.000 0.000 0.001 0.001 pandas/core/groupby/ops.py:288(ngroups)
1 0.000 0.000 0.001 0.001 pandas/core/groupby/ops.py:298(result_index)
1 0.000 0.000 0.001 0.001 pandas/core/groupby/grouper.py:396(result_index)
2 0.000 0.000 0.001 0.001 pandas/core/groupby/grouper.py:402(group_index)
1 0.000 0.000 0.001 0.001 pandas/core/groupby/grouper.py:409(_make_codes)
1 0.000 0.000 0.001 0.001 pandas/core/algorithms.py:585(factorize)
1 0.001 0.001 0.001 0.001 pandas/core/groupby/ops.py:593(_aggregate)
2 0.000 0.000 0.001 0.000 pandas/core/indexes/range.py:345(get_loc)
2 0.000 0.000 0.001 0.000 pandas/core/indexes/base.py:2637(get_loc)
1 0.000 0.000 0.001 0.001 pandas/core/indexes/multi.py:1561(get_level_values)
1 0.000 0.000 0.001 0.001 pandas/core/indexes/multi.py:1535(_get_level_values)
3 0.000 0.000 0.001 0.000 {method 'get_loc' of 'pandas._libs.index.IndexEngine' objects}
1 0.000 0.000 0.001 0.001 /home/tobias/programming/python/pandas_tests/pandas_1_0_3/pandas/core/algorithms.py:456(_factorize_array)
1 0.000 0.000 0.001 0.001 /home/tobias/programming/python/pandas_tests/pandas_1_0_3/pandas/core/groupby/grouper.py:563(is_in_axis)
1 0.000 0.000 0.001 0.001 /home/tobias/programming/python/pandas_tests/pandas_1_0_3/pandas/core/indexes/base.py:888(__repr__)
1 0.001 0.001 0.001 0.001 {method 'factorize' of 'pandas._libs.hashtable.Int64HashTable' objects}
1 0.000 0.000 0.001 0.001 /home/tobias/programming/python/pandas_tests/pandas_1_0_3/pandas/core/indexes/base.py:924(_format_data)
1 0.000 0.000 0.001 0.001 /home/tobias/programming/python/pandas_tests/pandas_1_0_3/pandas/io/formats/printing.py:280(format_object_summary)
2 0.000 0.000 0.001 0.000 /home/tobias/programming/python/pandas_tests/pandas_1_0_3/pandas/core/algorithms.py:1565(take_nd)
5 0.000 0.000 0.000 0.000 {pandas._libs.algos.ensure_int64}
7 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas_1_0_3/pandas/core/indexes/base.py:276(__new__)
7 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas_1_0_3/pandas/core/indexes/numeric.py:107(_shallow_copy)
6 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas_1_0_3/pandas/core/indexes/base.py:544(_shallow_copy_with_infer)
1 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas_1_0_3/pandas/core/internals/managers.py:708(get_numeric_data)
1 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas_1_0_3/pandas/core/internals/managers.py:718(combine)
17 0.000 0.000 0.000 0.000 /home/tobias/anaconda3/envs/pandas_1_0_3/lib/python3.8/abc.py:96(__instancecheck__)
17 0.000 0.000 0.000 0.000 {built-in method _abc._abc_instancecheck}
2 0.000 0.000 0.000 0.000 {pandas._libs.algos.take_1d_int64_int64}
9 0.000 0.000 0.000 0.000 {pandas._libs.lib.is_list_like}
95/9 0.000 0.000 0.000 0.000 /home/tobias/anaconda3/envs/pandas_1_0_3/lib/python3.8/abc.py:100(__subclasscheck__)
95/9 0.000 0.000 0.000 0.000 {built-in method _abc._abc_subclasscheck}
20 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas_1_0_3/pandas/io/formats/printing.py:275(<lambda>)
20 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas_1_0_3/pandas/io/formats/printing.py:162(pprint_thing)
4 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas_1_0_3/pandas/core/internals/managers.py:122(__init__)
2 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas_1_0_3/pandas/core/algorithms.py:64(_ensure_data)
and master:
3048 function calls (2862 primitive calls) in 0.006 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.006 0.006 pandas/core/generic.py:11413(stat_func)
1 0.000 0.000 0.006 0.006 pandas/core/generic.py:10196(_agg_by_level)
1 0.000 0.000 0.005 0.005 pandas/core/groupby/groupby.py:1419(std)
1 0.003 0.003 0.005 0.005 pandas/core/groupby/groupby.py:2366(_get_cythonized_result)
1 0.000 0.000 0.001 0.001 pandas/core/frame.py:6342(groupby)
1 0.000 0.000 0.001 0.001 pandas/core/groupby/groupby.py:480(__init__)
1 0.000 0.000 0.001 0.001 pandas/core/groupby/grouper.py:601(get_grouper)
1 0.000 0.000 0.001 0.001 pandas/core/groupby/generic.py:1717(_wrap_aggregated_output)
1 0.000 0.000 0.001 0.001 pandas/core/groupby/ops.py:268(group_info)
1 0.000 0.000 0.001 0.001 pandas/core/groupby/ops.py:285(_get_compressed_codes)
1 0.000 0.000 0.001 0.001 pandas/core/groupby/ops.py:229(codes)
1 0.000 0.000 0.001 0.001 pandas/core/groupby/ops.py:231(<listcomp>)
2 0.000 0.000 0.001 0.000 pandas/core/groupby/grouper.py:563(codes)
1 0.000 0.000 0.001 0.001 pandas/core/groupby/grouper.py:582(_make_codes)
1 0.000 0.000 0.001 0.001 pandas/core/algorithms.py:503(factorize)
1 0.000 0.000 0.001 0.001 pandas/core/indexes/multi.py:1536(get_level_values)
1 0.000 0.000 0.001 0.001 pandas/core/indexes/multi.py:1511(_get_level_values)
1 0.000 0.000 0.001 0.001 pandas/core/algorithms.py:463(_factorize_array)
1 0.000 0.000 0.001 0.001 pandas/core/algorithms.py:1586(take_nd)
1 0.000 0.000 0.001 0.001 pandas/core/frame.py:436(__init__)
1 0.001 0.001 0.001 0.001 {method 'factorize' of 'pandas._libs.hashtable.Int64HashTable' objects}
1 0.000 0.000 0.001 0.001 /home/tobias/programming/python/pandas_tests/pandas/pandas/core/internals/construction.py:237(init_dict)
7/3 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas/pandas/core/indexes/base.py:280(__new__)
1 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas/pandas/core/internals/construction.py:60(arrays_to_mgr)
13 0.000 0.000 0.000 0.000 {pandas._libs.lib.is_list_like}
21 0.000 0.000 0.000 0.000 /home/tobias/anaconda3/envs/pandas-dev/lib/python3.8/abc.py:96(__instancecheck__)
5 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas/pandas/core/groupby/generic.py:995(_iterate_slices)
1 0.000 0.000 0.000 0.000 {pandas._libs.algos.take_1d_int64_int64}
21 0.000 0.000 0.000 0.000 {built-in method _abc._abc_instancecheck}
5 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas/pandas/core/frame.py:930(items)
82/6 0.000 0.000 0.000 0.000 /home/tobias/anaconda3/envs/pandas-dev/lib/python3.8/abc.py:100(__subclasscheck__)
82/6 0.000 0.000 0.000 0.000 {built-in method _abc._abc_subclasscheck}
1 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas/pandas/core/internals/managers.py:1638(create_block_manager_from_arrays)
4 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas/pandas/core/generic.py:3523(_get_item_cache)
608 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
1 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas/pandas/core/internals/managers.py:1678(form_blocks)
4 0.000 0.000 0.000 0.000 {pandas._libs.algos.ensure_int64}
1 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas/pandas/core/groupby/grouper.py:752(is_in_obj)
1 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas/pandas/core/frame.py:2827(__getitem__)
49 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas/pandas/core/dtypes/base.py:256(is_dtype)
4 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas/pandas/core/internals/managers.py:951(iget)
5 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas/pandas/core/internals/blocks.py:2652(get_block_type)
1 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas/pandas/core/internals/construction.py:329(_homogenize)
14/6 0.000 0.000 0.000 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
4 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas/pandas/core/construction.py:389(sanitize_array)
2 0.000 0.000 0.000 0.000 /home/tobias/programming/python/pandas_tests/pandas/pandas/core/indexes/base.py:5705(_maybe_cast_data_without_dtype)
Seems like we spend a lot more time in get_cythonized_result
Comment From: simonjayhawkins
moved off 1.1.1 milestone (scheduled for this week) as no PRs to fix in the pipeline
Comment From: simonjayhawkins
moved off 1.1.2 milestone (scheduled for this week) as no PRs to fix in the pipeline
Comment From: simonjayhawkins
moved off 1.1.3 milestone (overdue) as no PRs to fix in the pipeline
Comment From: simonjayhawkins
moved off 1.1.4 milestone (scheduled for release tomorrow) as no PRs to fix in the pipeline
Comment From: jbrockmendel
In [3]: %timeit df.groupby(level=1).std()
2.06 ms ± 65.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) # <- main
2.25 ms ± 56.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) # <- 1.5.3
2.3 ms ± 124 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) # <- 1.4.0
That's as far back as I can install easily
Comment From: rhshadrach
Thanks @jbrockmendel - closing.