I found a bug in the dayfirst option of to_datetime. It seems that dayfirst=True is not applied consistently: the first row is parsed differently from the others.
import pandas as pd
df = pd.DataFrame(['2018-04-11 10:30:00','2018-04-11  10:30:00','2018-04-11  10:15:00'],columns=['date'])
df["date dayFristTrue"] = pd.to_datetime(df["date"], dayfirst=True)
df["date dayFristFalse"] = pd.to_datetime(df["date"], dayfirst=False)
print(df)
                  date   date dayFristTrue  date dayFristFalse
0  2018-04-11 10:30:00 2018-04-11 10:30:00 2018-04-11 10:30:00
1  2018-04-11 10:30:00 2018-11-04 10:30:00 2018-04-11 10:30:00
2  2018-04-11 10:15:00 2018-11-04 10:15:00 2018-04-11 10:15:00
Comment From: TomAugspurger
I think it's the extra spaces in your later dates:
In [12]: pd.to_datetime('2018-04-11 10:30:00', dayfirst=True)
Out[12]: Timestamp('2018-04-11 10:30:00')
In [13]: pd.to_datetime('2018-04-11  10:30:00', dayfirst=True)
Out[13]: Timestamp('2018-11-04 10:30:00')
I think there's a warning in the docs about this kind of behavior:
Warning: dayfirst=True is not strict, but will prefer to parse
with day first (this is a known bug, based on dateutil behavior).
Though, strangely, python-dateutil does handle the spaces. Perhaps this is an issue in pandas, then.
In [26]: dateutil.parser.parse('2018-04-11 10:30:00', dayfirst=True)
Out[26]: datetime.datetime(2018, 11, 4, 10, 30)
In [27]: dateutil.parser.parse('2018-04-11  10:30:00', dayfirst=True)
Out[27]: datetime.datetime(2018, 11, 4, 10, 30)
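Until the parsing paths agree, one possible workaround is to collapse repeated whitespace before parsing, so that every string has the same shape. This is only a sketch using the standard library; `normalize_spaces` and the sample strings in `raw` are hypothetical and not part of pandas or dateutil.

```python
def normalize_spaces(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends."""
    # str.split() with no argument splits on any run of whitespace.
    return " ".join(text.split())


# Hypothetical inputs: the later entries carry extra spaces before the time.
raw = ["2018-04-11 10:30:00", "2018-04-11  10:30:00", "2018-04-11   10:15:00"]
cleaned = [normalize_spaces(s) for s in raw]

for before, after in zip(raw, cleaned):
    print(repr(before), "->", repr(after))
```

After this step all strings have the same layout, so whichever parser is used downstream should treat them alike.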
Comment From: chris-b1
The issue here is that your date "looks" like it is ISO 8601 formatted (YYYY-MM-DD), which we have a special fastpath for.
The dayfirst keyword was really meant to disambiguate MM/DD/YYYY vs DD/MM/YYYY. If it is at all within your control, I'd strongly recommend against storing dates as YYYY-DD-MM.
That said, I would consider this a bug: as @TomAugspurger showed, dateutil can handle this. You are welcome to have a look.
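If the data really is stored as YYYY-DD-MM, spelling out the layout removes the guesswork entirely. The sketch below uses the standard library's `datetime.strptime`; the constant names and the sample string are assumptions made for illustration. `pd.to_datetime` also accepts a `format` argument with the same directives, which should serve the same purpose for a whole column.

```python
from datetime import datetime

# Assumed layouts for illustration: year-day-month vs. ISO year-month-day.
YEAR_DAY_MONTH = "%Y-%d-%m %H:%M:%S"
YEAR_MONTH_DAY = "%Y-%m-%d %H:%M:%S"

text = "2018-04-11 10:30:00"

# The same string yields two different dates depending on the declared layout.
day_first = datetime.strptime(text, YEAR_DAY_MONTH)
iso = datetime.strptime(text, YEAR_MONTH_DAY)

print(day_first)  # 2018-11-04 10:30:00
print(iso)        # 2018-04-11 10:30:00
```

With an explicit format there is no inference step, so no row can silently take a different parsing path.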
Comment From: MarcoGorelli
Thanks for the report.
As of PDEP-4 (consistent datetime parsing, implemented in pandas 2.0), this is fixed:
In [45]: df = pd.DataFrame(['2018-04-11 10:30:00','2018-04-11  10:30:00','2018-04-11  10:15:00'],columns=['date'])
In [46]: df["date dayFristTrue"] = pd.to_datetime(df["date"], dayfirst=True)
In [47]: df["date dayFristFalse"] = pd.to_datetime(df["date"], dayfirst=False)
In [48]: df
Out[48]:
                  date   date dayFristTrue  date dayFristFalse
0  2018-04-11 10:30:00 2018-11-04 10:30:00 2018-04-11 10:30:00
1  2018-04-11 10:30:00 2018-11-04 10:30:00 2018-04-11 10:30:00
2  2018-04-11 10:15:00 2018-11-04 10:15:00 2018-04-11 10:15:00
Closing then, but thanks for the report!