Pandas Is it normal that iloc is very slow when iterating?

I'm not sure whether this is expected behaviour or a general issue. It becomes a problem for row comparisons that can't be expressed with existing row-wise methods such as diff.

Consider this example showing three functions doing row-wise differences on a randomly generated dataframe:

import pandas as pd
import random as rd

ITER = 1000
rd.seed()
D = {'A':[]}
for i in range(ITER):
  D['A'].append(rd.randint(0,100))
P = pd.DataFrame.from_dict(D)

def Test1(P):
  K = list(zip(P.A))  # list() keeps K indexable on Python 3, where zip returns an iterator
  S = [0]
  for i in range(ITER-1):
    S.append(K[i+1][0] - K[i][0])
  return pd.merge(P,pd.DataFrame(S),left_index=True,right_index=True)

def Test2(P):
  S = [0]
  for i in range(ITER-1):
    S.append(P.iloc[i+1][0] - P.iloc[i][0])
  return pd.merge(P,pd.DataFrame(S),left_index=True,right_index=True)

def Test3(P):
  return pd.merge(P,P.A.diff().to_frame(),left_index=True,right_index=True)

Now of course Test3 is the way to do it correctly in this special case, but this is just meant as an example.

Here's the output of doing a timeit on all three methods:

%timeit(Test1(P))
1000 loops, best of 3: 2 ms per loop

%timeit(Test2(P))
1 loop, best of 3: 315 ms per loop

%timeit(Test3(P))
1000 loops, best of 3: 1.3 ms per loop

This shows that iterating with iloc is roughly 150 times slower (315 ms vs. 2 ms) than iterating over a list created by calling zip on the dataframe column.
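
To reproduce the comparison outside IPython, a minimal sketch using the standard-library `timeit` module might look like the following. The function names `diff_via_list` and `diff_via_iloc` are hypothetical stand-ins for Test1 and Test2, the fixed seed is an assumption added for repeatability, and absolute timings will depend on the machine and the pandas version.

```python
import random
import timeit

import pandas as pd

N = 1000
random.seed(0)  # fixed seed for repeatable data (assumption; the original calls rd.seed())
df = pd.DataFrame({"A": [random.randint(0, 100) for _ in range(N)]})


def diff_via_list(frame):
    # Pull the column into a plain Python list once, then loop over it.
    values = frame["A"].tolist()
    return [0] + [values[i + 1] - values[i] for i in range(len(values) - 1)]


def diff_via_iloc(frame):
    # Each frame.iloc[i] builds a one-row Series before the scalar is read.
    out = [0]
    for i in range(len(frame) - 1):
        out.append(frame.iloc[i + 1]["A"] - frame.iloc[i]["A"])
    return out


# Both approaches compute the same differences; only the cost differs.
assert diff_via_list(df) == diff_via_iloc(df)

t_list = timeit.timeit(lambda: diff_via_list(df), number=10)
t_iloc = timeit.timeit(lambda: diff_via_iloc(df), number=10)
print("list loop: %.4f s, iloc loop: %.4f s" % (t_list, t_iloc))
```

The equality check before timing is there so that the two loops are known to compute the same result before their costs are compared.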

output of `pd.show_versions()`

python: 2.7.12.final.0 pandas: 0.18.1

Comment From: jreback

You are using it in about the most inefficient way possible. You are doing multiple operations and creating intermediate Series in the middle. Generally .iloc accepts quite a variety of input for flexibility. Row iterating is never recommended.

In [32]: P.iloc[0]
Out[32]: 
A    73
Name: 0, dtype: int64

In [33]: P.iloc[0][0]
Out[33]: 73
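
The intermediate Series can be avoided by asking for the scalar directly, or by leaving pandas for the duration of the loop. The following is a minimal sketch rather than a benchmark; the DataFrame `df` and the helper name `pairwise_diff` are made up for illustration. `DataFrame.iat` and `Series.to_numpy()` are existing pandas APIs, though `to_numpy()` needs pandas 0.24 or later (on 0.18.1, `.values` plays the same role).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [73, 12, 40, 40, 95]})  # illustrative data

# df.iloc[0] builds a Series first; df.iat[0, 0] returns the scalar directly.
assert isinstance(df.iloc[0], pd.Series)
assert df.iat[0, 0] == df.iloc[0]["A"] == 73


def pairwise_diff(frame):
    # Hypothetical helper: extract the column once, then loop over the array.
    values = frame["A"].to_numpy()  # use frame["A"].values on pandas < 0.24
    out = np.zeros(len(values), dtype=values.dtype)
    for i in range(1, len(values)):
        out[i] = values[i] - values[i - 1]
    return out


result = pairwise_diff(df)
print(result)
```

When the comparison can be written on whole columns, an expression such as `df["A"] - df["A"].shift(1)` avoids the Python loop altogether.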
