Came a across this very odd problem while trying to get difference between two columns in a DataFrame of type datetime64. The following simple example illustrates the problem:

df = pd.DataFrame({'A': pd.date_range('20150301', '20150304'), 
               'B': pd.date_range('20150303', '20150306')}, 
               index = [1, 2, 3, 3])
df
    A           B
1   2015-03-01  2015-03-03
2   2015-03-02  2015-03-04
3   2015-03-03  2015-03-05
3   2015-03-04  2015-03-06

df.B - df.A
1   2 days
2   2 days
3   2 days
3   1 days
3   3 days
3   2 days

dtype: timedelta64[ns]

What happens generally is that the output contains an entry for the difference in every pair in the Cartesian product of the rows containing the duplicate index. I learned this the hard way on a df with 30K+ rows and only 50 distinct indices ending up trying to compute a series with 385 million+ entries, quickly freezing my computer.

On a related note, using the eval method to compute the difference throws the following error:

df.eval('B - A')
ValueError: unkown type timedelta64[ns]

Comment From: jorisvandenbossche

Which pandas version are you using? (pd.__version__)? I get the correct result on 0.15.2 and with master.

Comment From: eoincondron

Ah yes, I forgot I was working with version 0.14.1. Should have tested on 0.15 first but forgot all about it. Thanks.

Comment From: jorisvandenbossche

@eoincondron Is there still a problem? (as you reopened the issue)

Comment From: eoincondron

No that was a mistake, sorry.