I've noticed performance differences between column-wise and row-wise function application under Windows 8.1. The problem most likely traces back to the memory layout of the NumPy arrays, which are 'C'-ordered (row-major) by default, so column-based pandas operations show significant performance differences versus row-wise operations. Under Linux the problem is not present (or is being taken care of explicitly). If this is hard to fix on Windows, I would argue for changing the default behaviour to 'Fortran'-like memory storage (column by column), as arguably most time series data has many rows and few columns.
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd
num_sim = 1000
f = np.array(np.random.standard_normal((num_sim,num_sim)), order='F')
c = np.array(np.random.standard_normal((num_sim,num_sim)), order='C')
df_f = pd.DataFrame(f)
df_c = pd.DataFrame(c)
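To confirm which memory layout each array actually has, the flags and strides can be inspected directly. This is a minimal sketch using NumPy only; the seeded generator and the use of `np.asfortranarray` / `np.ascontiguousarray` are illustrative choices, not part of the original sample.

```python
import numpy as np

num_sim = 1000
rng = np.random.default_rng(0)  # seed chosen only for reproducibility (assumption)

# Same shape and dtype (float64), different memory layout.
f = np.asfortranarray(rng.standard_normal((num_sim, num_sim)))    # column-major
c = np.ascontiguousarray(rng.standard_normal((num_sim, num_sim)))  # row-major

# Strides are the byte steps needed to move one element along each axis.
print(f.flags['F_CONTIGUOUS'], f.strides)  # True (8, 8000)
print(c.flags['C_CONTIGUOUS'], c.strides)  # True (8000, 8)
```

In the Fortran-ordered array, moving down a column steps 8 bytes at a time, while in the C-ordered array the same move jumps 8000 bytes.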
Expected Output
I expect similar performance for column-wise and row-wise operations.
%timeit df_f.mean()
100 loops, best of 3: 1.9 ms per loop
%timeit df_f.mean(axis=1)
100 loops, best of 3: 10.3 ms per loop
%timeit df_c.mean()
100 loops, best of 3: 10.2 ms per loop
%timeit df_c.mean(axis=1)
100 loops, best of 3: 1.86 ms per loop
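For anyone who wants to repeat the measurement outside IPython, the following is a rough sketch of the same comparison using the standard `timeit` module. The helper name `time_reductions` and its parameters are hypothetical, introduced only for illustration. It times raw NumPy arrays, whose reductions may not show the same asymmetry as the pandas timings above.

```python
import timeit

import numpy as np


def time_reductions(arr, number=20, repeat=3):
    """Return the best per-call time (seconds) of arr.mean over axis 0 and axis 1."""
    timings = {}
    for axis in (0, 1):
        runs = timeit.repeat(lambda: arr.mean(axis=axis), number=number, repeat=repeat)
        timings[axis] = min(runs) / number
    return timings


num_sim = 1000
data = np.random.standard_normal((num_sim, num_sim))
layouts = {"F": np.asfortranarray(data), "C": np.ascontiguousarray(data)}

for order, arr in layouts.items():
    t = time_reductions(arr)
    print(f"order={order}: axis=0 {t[0] * 1e3:.3f} ms, axis=1 {t[1] * 1e3:.3f} ms")
```

The helper only calls `.mean(axis=...)`, so a DataFrame such as `df_f` or `df_c` can be passed in place of the arrays.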
output of pd.show_versions()
commit: None
python: 3.4.4.final.0
python-bits: 64
OS: Windows
OS-release: 8.1
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: None
pip: 8.1.0
setuptools: 20.2.2
Cython: 0.24
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.0.3
sphinx: 1.3.5
patsy: 0.4.1
dateutil: 2.5.0
pytz: 2016.1
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: 0.6.2.None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: 0.2.0
Comment From: jreback
This has nothing to do with Windows, but rather with how caching works. Data is stored in columnar format; however, the cache tries to load it linearly, which defeats the purpose.
But, and here's the issue, your data is probably already in this format to begin with if you are constructing it with ndarrays.
Bottom line is that pandas follows what numpy does, this might be changed in the future, but is completely non-trivial.
Finally, not sure why these trivial differences even make a difference in reality. You are likely bound by other parts of calculations.
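Since pandas follows what NumPy does here, one way to experiment is to pick the layout on the NumPy side before building the DataFrame. This is only a sketch under assumptions: the array shape is made up, and whether a DataFrame keeps the layout of the array it was built from depends on the pandas version and the dtypes involved, so it is worth timing on your own setup.

```python
import numpy as np

# Many rows, few columns; NumPy allocates in C (row-major) order by default.
c = np.random.standard_normal((1000, 10))
f = np.asfortranarray(c)  # same values, copied into column-major order

assert np.array_equal(c, f)
print(c.flags['C_CONTIGUOUS'], f.flags['F_CONTIGUOUS'])  # True True

# A single column is one contiguous run of memory only in the Fortran-ordered copy.
print(c[:, 0].flags['C_CONTIGUOUS'], f[:, 0].flags['C_CONTIGUOUS'])  # False True
```

`np.ascontiguousarray` does the reverse conversion when row-wise access dominates.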
Comment From: MMCMA
The issue was that I tested on three different Windows and three different Linux environments, all on different hardware. By chance, all three Windows machines suffered from "the problem", while the Linux ones were OK. Finally, I just checked Windows and Linux on the same hardware and I see the same behavior on both. Sorry for the confusion...
Comment From: jreback
Great, @MMCMA. Yeah, that would be very weird JUST on Windows. Caching is very important but also very tricky.