Please, consider the following code:
In [0]: import pandas as pd
In [1]: my_data = [{'TICKET_NUMBER' : '001', 'DATE' : '2015-11-11 11:15'},
{'TICKET_NUMBER' : '002', 'DATE' : '2015-11-12 14:32'},
{'TICKET_NUMBER' : '003', 'DATE' : '2015-11-13 15:33'},
{'TICKET_NUMBER' : '004', 'DATE' : '2015-11-14 11:34'},
{'TICKET_NUMBER' : '005', 'DATE' : '2015-11-15 17:35'},
{'TICKET_NUMBER' : '006', 'DATE' : '2015-11-16 19:36'},
{'TICKET_NUMBER' : '007', 'DATE' : '2015-11-17 12:37'},
{'TICKET_NUMBER' : '008', 'DATE' : '2015-11-18 15:38'}]
In [2]: df
Out [2]: DATE TICKET_NUMBER
0 2015-11-11 11:15 001
1 2015-11-12 14:32 002
2 2015-11-13 15:33 003
3 2015-11-14 11:34 004
4 2015-11-15 17:35 005
5 2015-11-11 11:15 006
6 2015-11-16 19:36 007
7 2015-11-18 15:38 008
In [3]: df = pd.DataFrame(my_data)
In [4]: df.DATE = pd.to_datetime(df.DATE)
In [5]: df[df.DATE<='2015-11-13']
Out [5]: DATE TICKET_NUMBER
0 2015-11-11 11:15 001
1 2015-11-12 14:32 002
The above code should print rows with dates "LESS THAN OR EQUAL TO" 2015-11-13 (indexes 0,1 AND 2), but it only shows rows with dates "LESS THAN" 2015-11-13 (that is, indexes 0 and 1, not 3).
Remarks:
(1) Filtering "MORE THAN OR EQUAL TO" some date (instead of LESS) works as expected.
(2) Removing the HOUR in the DATE makes everything work fine. This works as expected:
In [1]: my_data = [{'TICKET_NUMBER' : '001', 'DATE' : '2015-11-11'},
{'TICKET_NUMBER' : '002', 'DATE' : '2015-11-12'},
{'TICKET_NUMBER' : '003', 'DATE' : '2015-11-13'},
{'TICKET_NUMBER' : '004', 'DATE' : '2015-11-14'},
{'TICKET_NUMBER' : '005', 'DATE' : '2015-11-15'},
{'TICKET_NUMBER' : '006', 'DATE' : '2015-11-16'},
{'TICKET_NUMBER' : '007', 'DATE' : '2015-11-17'},
{'TICKET_NUMBER' : '008', 'DATE' : '2015-11-18'}]
(3) Setting the HOUR to 00:00 also makes it work as expected.
(4) pd.installed_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
LANG: es_ES.UTF-8
pandas: 0.16.2
nose: 1.3.4
Cython: 0.22
numpy: 1.9.3
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 4.0.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.6
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.6.7
lxml: 3.4.2
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 0.9.9
pymysql: None
psycopg2: None
Comment From: rockg
I don't see a problem with this behavior. Dates without a time don't really exist when doing these types of operations. Everything has an embedded time with it. For example, '2015-11-13' is interpreted as '2015-11-13 00:00'. So I think what you really want is < '2015-11-14'.
Comment From: jorisvandenbossche
@rockg is correct. The date is interpreted as a timestamp with hour 00:00. This is the reason that the "MORE THAN OR EQUAL TO" case works as you expected. But I understand that this can be confusing.
You can see that the <=
actually works when specifying the hours:
In [5]: df[df.DATE<='2015-11-13 15:33']
Out[5]:
DATE TICKET_NUMBER
0 2015-11-11 11:15:00 001
1 2015-11-12 14:32:00 002
2 2015-11-13 15:33:00 003
In [6]: df[df.DATE<'2015-11-13 15:33']
Out[6]:
DATE TICKET_NUMBER
0 2015-11-11 11:15:00 001
1 2015-11-12 14:32:00 002
Comment From: AndresGlez
Thanks for your comment. Maybe you are right but that's not my point of view.
The thing is that when I want to see rows (events) with date less than or equal to 2015-11-13, I want events that also happened the day I'm considering (2015-11-13). Thats why I say "equal to" and not just "less than"
In fact,
df[df.DATE<='2015-11-13']
has the same output as
df[df.DATE<'2015-11-13']
so I think there is something wrong.
Comment From: jorisvandenbossche
@AndresGlez You have a datetime64[ns]
column, which has no notion of dates. So the string you provide is interpreted as a timestamp:
In [10]: pd.Timestamp('2015-11-13')
Out[10]: Timestamp('2015-11-13 00:00:00')
So you are actually comparing to this when using <='2015-11-13'
. In that light, the behaviour you see is as expected.