Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

threshold = np.datetime64(datetime.today()+timedelta(weeks=3))
df[threshold < df['date']]
df[df['date'] < threshold]

Problem description

The two comparisons above should return complementary results. Instead, both return the same result, as if df['date'] were always the left-hand operand.

Expected Output

The line df[threshold < df['date']] was expected to return an empty DataFrame. The screenshot below illustrates the issue.

[Screenshot from 2017-07-05 15-17-58]
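One way to state the expected behavior as a check: for a given threshold, the masks threshold < s and s < threshold should never both be True for the same row, and together with s == threshold they should cover every row. Per the report, on pandas 0.19.2 an np.datetime64 threshold made both masks select the same rows, which this check would flag. A minimal sketch, assuming a Series with no NaT values; `masks_are_consistent` is a hypothetical helper, not part of pandas.

```python
import pandas as pd


def masks_are_consistent(dates: pd.Series, threshold) -> bool:
    """Hypothetical helper (not a pandas API).

    Returns True when `threshold < dates` and `dates < threshold` never select
    the same row and, together with `dates == threshold`, cover every row.
    Assumes `dates` contains no NaT values (NaT compares False both ways).
    """
    later = threshold < dates
    earlier = dates < threshold
    equal = dates == threshold
    exclusive = not bool((later & earlier).any())
    exhaustive = bool((later | earlier | equal).all())
    return exclusive and exhaustive


dates = pd.Series(pd.to_datetime(["2017-07-05", "2017-07-06"]))
print(masks_are_consistent(dates, pd.Timestamp("2017-07-26")))  # True
```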

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-83-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 34.4.1
Cython: None
numpy: 1.12.1
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: None
httplib2: 0.10.3
apiclient: 1.6.2
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: 2.38.0
pandas_datareader: None

Comment From: cristianornelas

Same issue here.

Comment From: TomAugspurger

Can you make a complete example? Your df is undefined.

Comment From: gmatheus95

I can't share the actual DataFrame I'm working with since it's private data, but here is a minimal, runnable example that illustrates the issue.

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

today = datetime.today()
x = [today] * 10000
df = pd.DataFrame({'date':x})

threshold = np.datetime64(datetime.today()+timedelta(weeks=3))

# and then the comparisons:
df[threshold < df['date']]
df[df['date'] < threshold]

Thanks.

Comment From: TomAugspurger

This seems to be related to the NumPy datetime64 having microsecond precision:

In [111]: np.datetime64(today + timedelta(weeks=3)) < pd.Series([today])
Out[111]:
0    True
dtype: bool

In [112]: np.datetime64(today + timedelta(weeks=3)).astype("<M8[ns]") < pd.Series([today])
Out[112]:
0    False
dtype: bool

I'm not sure what the desired outcome is here. pandas only deals with nanosecond-precision timestamps, so should we silently change the precision of the input, or raise an error?
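For reference, the precision change can be made explicit before the value ever reaches pandas. A minimal sketch using only NumPy; the fixed date is an assumption standing in for datetime.today() so the output is reproducible.

```python
from datetime import datetime, timedelta

import numpy as np

# Fixed date standing in for datetime.today(), so the result is reproducible.
base = datetime(2017, 7, 5, 15, 17, 58, 123456)

# A datetime.datetime carries microseconds, so NumPy infers microsecond units.
threshold_us = np.datetime64(base + timedelta(weeks=3))

# Explicit conversion to the nanosecond unit that pandas (as of this thread) uses.
threshold_ns = threshold_us.astype("datetime64[ns]")

print(threshold_us.dtype, threshold_ns.dtype)  # datetime64[us] datetime64[ns]

# NumPy converts units before comparing, so both represent the same instant.
assert threshold_us == threshold_ns
```

Converting up front sidesteps the question of whether pandas should change the precision silently, since the caller has already chosen it.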

Either way, we need to fix things to be consistent between Series and DataFrame here.

Comment From: gmatheus95

I'm sorry, I don't get it. Even after adding three weeks of delta, why do I still get True in Out[111] because of timestamp precision? Apologies if this is a naive question; I'm not very experienced with NumPy.

Thank you anyway for addressing the issue so fast!

Comment From: TomAugspurger

> I'm sorry, I don't get it. Even after adding three weeks of delta, why do I still get True in Out[111] because of timestamp precision?

Sorry if I wasn't clear: it's definitely a bug. The result should be False (or possibly an exception).

NumPy stores datetimes as int64 values; which datetime a given integer represents depends on the resolution (unit).

In [119]: np.datetime64(today).view('i8')
Out[119]: 1499262896667864

In [120]: np.datetime64(today).astype('<M8[ns]').view('i8')
Out[120]: 1499262896667864000
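The relationship between the two integers above can be checked directly. A minimal sketch with a fixed timestamp (an assumption, chosen so the counts match Out[119] and Out[120]); it reads the underlying count with astype("int64"), which yields the same integer as the view('i8') calls above.

```python
from datetime import datetime

import numpy as np

# Fixed timestamp standing in for `today` from the session above.
moment = datetime(2017, 7, 5, 13, 54, 56, 667864)

as_us = np.datetime64(moment)           # unit inferred as microseconds
as_ns = as_us.astype("datetime64[ns]")  # same instant, nanosecond unit

# Raw int64 counts since 1970-01-01 in each unit.
us_count = int(as_us.astype("int64"))
ns_count = int(as_ns.astype("int64"))
print(us_count, ns_count)  # 1499262896667864 1499262896667864000

# Same instant, but the nanosecond count is 1000 times larger.
assert ns_count == us_count * 1000
```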

It's possible (I haven't confirmed this yet) that when you do threshold < df['date'], pandas compares those raw integers without checking that they're at the same resolution. Since your threshold is in microseconds, its integer is going to be smaller than the pandas one (which is in nanoseconds). This is just a guess, though.
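The suspected failure mode can be reproduced without pandas: comparing the raw int64 counts of a microsecond value and a nanosecond value gives the wrong answer, while NumPy's own datetime64 comparison, which aligns units first, gives the right one. This is only a sketch of the hypothesis, not pandas' actual code path; the fixed date is an assumption.

```python
from datetime import datetime, timedelta

import numpy as np

today = datetime(2017, 7, 5, 15, 17, 58)

threshold = np.datetime64(today + timedelta(weeks=3))              # datetime64[us]
dates = np.array([np.datetime64(today)]).astype("datetime64[ns]")  # like a pandas column

# Unit-aware: NumPy converts both operands to a common unit before comparing.
unit_aware = threshold < dates

# Unit-blind: compare raw int64 counts, ignoring that one is in us and one in ns.
unit_blind = threshold.astype("int64") < dates.astype("int64")

print(unit_aware, unit_blind)

assert not unit_aware[0]  # the threshold is three weeks after today
assert unit_blind[0]      # roughly 1.5e15 microseconds < roughly 1.5e18 nanoseconds
```

The unit-blind result matches the True seen in Out[111], which is consistent with the guess above.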

Comment From: TomAugspurger

Ah, indeed this seems to be a duplicate of https://github.com/pandas-dev/pandas/issues/7996.

For now, you can work around this by converting threshold to a pd.Timestamp, which ensures that you have nanosecond-precision datetimes everywhere.
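A minimal sketch of that workaround, rebuilding the earlier example with a fixed date (an assumption, for reproducibility). With a pd.Timestamp threshold, the two comparisons should return complementary results.

```python
from datetime import datetime, timedelta

import pandas as pd

# Fixed date standing in for datetime.today().
today = datetime(2017, 7, 5, 15, 17, 58)
df = pd.DataFrame({"date": [today] * 10})

# Workaround from the thread: a pandas Timestamp instead of np.datetime64.
threshold = pd.Timestamp(today + timedelta(weeks=3))

later = df[threshold < df["date"]]    # rows after the threshold: none expected
earlier = df[df["date"] < threshold]  # rows before the threshold: all expected

print(len(later), len(earlier))  # 0 10
```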

Comment From: gmatheus95

Oh now I get it, thanks again!!