pandas sort_values significantly slower on Python 3.5.2 vs. Python 2.7.12

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
from time import time
import sys

df_data = pd.DataFrame(np.random.randint(0,int(1e6),int(20e6)), columns=['pop_id'])
df_data['PL_dB'] = 50 + np.random.random(df_data.shape[0]) * 100
df_data['Rx_dBm'] = 23 - df_data.PL_dB
df_data['noise_mW'] = (10.**(df_data.Rx_dBm / 10.)).astype('float32')

start = time()
df_data.sort_values(by=['pop_id', 'Rx_dBm'], ascending=[True, False], inplace=True)
df_data.reset_index(drop=True, inplace=True)

print("Sort took {:0.2f} seconds".format(time() - start))
print('Python version ' + sys.version)
print('pandas version ' + pd.__version__)

output of `pd.show_versions()`

For Python 2.7

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 25.1.6
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.0
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None

For Python 3.5

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 25.1.6
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.0
statsmodels: None
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.4.1
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

Results with Python 2.7

Sort took 40.91 seconds
Python version 2.7.12 |Anaconda custom (64-bit)| (default, Jun 29 2016, 11:07:13) [MSC v.1500 64 bit (AMD64)]
pandas version 0.18.1

Results with Python 3.5

Sort took 81.30 seconds
Python version 3.5.2 |Continuum Analytics, Inc.| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
pandas version 0.18.1

Comment From: jreback

This looks like the same issue that was fixed by https://github.com/pydata/pandas/pull/13436. Could someone confirm?

Comment From: jreback

Note that using inplace is pretty non-idiomatic, as it promotes less readable and more error-prone code.

2.7

In [2]: import pandas as pd
   ...: import numpy as np
   ...: from time import time
   ...: import sys
   ...: 
   ...: df_data = pd.DataFrame(np.random.randint(0,int(1e6),int(20e5)), columns=['pop_id'])
   ...: df_data['PL_dB'] = 50 + np.random.random(df_data.shape[0]) * 100
   ...: df_data['Rx_dBm'] = 23 - df_data.PL_dB
   ...: df_data['noise_mW'] = (10.**(df_data.Rx_dBm / 10.)).astype('float32')

In [3]: %timeit df_data.sort_values(by=['pop_id', 'Rx_dBm'], ascending=[True, False])
1 loop, best of 3: 1.86 s per loop

In [4]: pd.__version__
Out[4]: '0.18.1+403.ga0151a7'

In [5]: sys.version
Out[5]: '2.7.11 |Continuum Analytics, Inc.| (default, Dec  6 2015, 18:57:58) \n[GCC 4.2.1 (Apple Inc. build 5577)]'

3.5

In [2]: %timeit df_data.sort_values(by=['pop_id', 'Rx_dBm'], ascending=[True, False])
1 loop, best of 3: 1.76 s per loop

In [3]:  pd.__version__
   ...: 
Out[3]: '0.18.1+403.ga0151a7'

In [4]:  sys.version
   ...: 
Out[4]: '3.5.1 |Continuum Analytics, Inc.| (default, Dec  7 2015, 11:24:55) \n[GCC 4.2.1 (Apple Inc. build 5577)]'
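
For reference, the non-inplace style suggested in the note on inplace might look roughly like the sketch below. The row count, the seed, and the name `df_sorted` are assumptions chosen for illustration (the original report used 20e6 rows); only `sort_values` and `reset_index` with the arguments from the report are taken from the thread.

```python
import numpy as np
import pandas as pd

# Small, seeded stand-in for the large frame in the report.
# n_rows and the seed are assumptions made to keep the example quick.
rng = np.random.RandomState(0)
n_rows = 1000
df_data = pd.DataFrame({"pop_id": rng.randint(0, 50, n_rows)})
df_data["PL_dB"] = 50 + rng.random_sample(n_rows) * 100
df_data["Rx_dBm"] = 23 - df_data["PL_dB"]

# Chain the two calls and bind the result to a new name
# instead of passing inplace=True to each call.
df_sorted = (
    df_data
    .sort_values(by=["pop_id", "Rx_dBm"], ascending=[True, False])
    .reset_index(drop=True)
)
```

Here `df_data` is left untouched and `df_sorted` holds the sorted frame with a fresh 0..n-1 index, so each step returns a value that can be inspected or chained further.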
