xref #12585
I think this ought to raise a ValueError:
In [1]: pd.to_datetime('01-13-2012', dayfirst=True)
Out[1]: datetime.datetime(2012, 1, 13, 0, 0)
This means if my data is bad (e.g. somehow I have a mixture of American and British date-strings), I won't be told there was an issue when parsing! I can't think of a use case where this kind of precedence would be appropriate, and it means you'll end up with incorrect dates.
The precedence behaviour appears to be a feature from dateutil (which tslib.array_to_datetime calls):
In [2]: from dateutil.parser import parse as parse_date
In [3]: parse_date('1/12/2012', dayfirst=True)
Out[3]: datetime.datetime(2012, 12, 1, 0, 0)
In [4]: parse_date('1/13/2012', dayfirst=True)
Out[4]: datetime.datetime(2012, 1, 13, 0, 0)
It should raise, like this:
In [5]: parse_date('14/14/2012', dayfirst=True)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-ca14fc16e421> in <module>()
----> 1 parse_date('14/14/2012', dayfirst=True)
/Library/Python/2.7/site-packages/dateutil/parser.pyc in parse(timestr, parserinfo, **kwargs)
718 return parser(parserinfo).parse(timestr, **kwargs)
719 else:
--> 720 return DEFAULTPARSER.parse(timestr, **kwargs)
721
722
/Library/Python/2.7/site-packages/dateutil/parser.pyc in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
315 if value is not None:
316 repl[attr] = value
--> 317 ret = default.replace(**repl)
318 if res.weekday is not None and not res.day:
319 ret = ret+relativedelta.relativedelta(weekday=res.weekday)
ValueError: month must be in 1..12
... :(
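For anyone skimming, here is a rough stdlib-only sketch of the difference between the fallback behaviour shown above and the strict behaviour being asked for. The helper names `lenient_dayfirst` and `strict_dayfirst` are made up for illustration, and this only mimics the observed results; it is not a claim about how dateutil is implemented internally.

```python
from datetime import datetime


def lenient_dayfirst(text):
    """Hypothetical helper: prefer day-first, silently fall back to month-first."""
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"could not parse {text!r}")


def strict_dayfirst(text):
    """Hypothetical helper: day-first only; anything else raises ValueError."""
    return datetime.strptime(text, "%d/%m/%Y")


print(lenient_dayfirst("1/12/2012"))  # the day-first reading is valid, so it is used
print(lenient_dayfirst("1/13/2012"))  # no month 13, so it quietly flips to month-first

try:
    strict_dayfirst("1/13/2012")
except ValueError as exc:
    print("strict parse rejected it:", exc)
```

The strict variant is the behaviour I'd expect from dayfirst=True: bad input gets surfaced instead of being reinterpreted.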
Comment From: hayd
Another use case: if you call to_datetime (without setting dayfirst) on a list of British date-strings, atm it parses without error but does so "incorrectly" (using a mixture of day-first and month-first, depending on the string). Which is confusing.
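In the meantime, one way to spot such a mixture before parsing is to check which readings each string admits. A minimal sketch using only the standard library; `classify` and its labels are hypothetical, and it assumes plain D/M/Y or M/D/Y strings with a single known separator:

```python
from datetime import datetime


def _try(text, fmt):
    """Return the parsed datetime, or None if the string does not fit the format."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None


def classify(text, sep="/"):
    """Hypothetical helper: say which reading(s) of the string are valid dates."""
    day_first = _try(text, f"%d{sep}%m{sep}%Y")
    month_first = _try(text, f"%m{sep}%d{sep}%Y")
    if day_first is not None and month_first is not None:
        # Both readings are valid; they only agree when day == month.
        return "unambiguous" if day_first == month_first else "ambiguous"
    if day_first is not None:
        return "day-first only"
    if month_first is not None:
        return "month-first only"
    return "invalid"


for s in ["13/01/2012", "01/02/2012", "01/13/2012", "14/14/2012"]:
    print(s, "->", classify(s))
```

If a column contains both "day-first only" and "month-first only" strings, the data really is mixed and no single setting of dayfirst will parse it correctly.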
Comment From: ghost
I'm shocked that dateutil does that.
Comment From: hayd
cc @jreback.
Comment From: jreback
I think we ought to drop this parameter in favor of just using format; dateutil just does the wrong thing. As an alternative, could just have dayfirst imply format="%d/%m/%Y" ...or some such (but not nearly as flexible), until/unless we actually do more date parsing internally.
thoughts?
Comment From: hayd
but but but dayfirst is the best argument ever, it's very very useful for munging (it's great if you don't know the format or worse the format is mixed (!)).
It being not strict is sad, but not having it at all would be much worse. please please can we keep it!
I've looked before where in their code it's not being strict, but it's pretty messy and not sure how actively maintained it is...
I'm sure @wesm suggested another (C?) library fairly recently but can't find the issue.
Comment From: jreback
how about a big warning in the docstring/docs? that you really ought to specify format instead?
I see the warning you have, but EVEN bigger ? (and then in docstrings too), http://pandas.pydata.org/pandas-docs/dev/timeseries.html#converting-to-timestamps
Comment From: hayd
+1 on that, definitely users should know what they are letting themselves in for.
Comment From: jreback
alright...so let's just make this a doc enhancement for now?
Comment From: hayd
How's about (pr coming):
dayfirst : boolean, default False
If True parses dates with the day first, eg 20/01/2005
Warning: dayfirst=True is not strict, but will prefer to parse
with day first (this is a known bug).
Comment From: jankatins
Switching from #11725, where this happened, e.g. dayfirst=False would parse month as days:
>>> pd.to_datetime(["29.01.1945","1.3.1945", "02.03.1945"])
DatetimeIndex(['1945-01-29', '1945-01-03', '1945-02-03'], dtype='datetime64[ns]', freq=None)
right now, the doc only describes the problem for the other way:
Warning: dayfirst=True is not strict, but will prefer to parse with day first (this is a known bug, based on dateutil behavior).
As dayfirst is False by default, this makes no mention that the other way can also happen.
Comment From: jorisvandenbossche
@JanSchulz Indeed .. Maybe also worth mentioning that if you want strict behaviour, you can use format=... if possible?
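To make the strict route concrete, here is a small sketch with the three strings from the example above, parsed with an explicit day-first format. It uses the standard library's `datetime.strptime` so it runs without pandas; the assumption is that the same format string could be passed as `format=` to `pd.to_datetime`, which is not exercised here.

```python
from datetime import datetime

raw = ["29.01.1945", "1.3.1945", "02.03.1945"]

# With an explicit format every string is read day-first, or raises ValueError.
parsed = [datetime.strptime(s, "%d.%m.%Y").date() for s in raw]
for original, value in zip(raw, parsed):
    print(original, "->", value.isoformat())
```

With the format pinned, "1.3.1945" and "02.03.1945" both come out as March dates, rather than the January and February dates in the output above.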
Comment From: jorisvandenbossche
I opened an issue for this at the dateutil tracker, as I couldn't find one: https://github.com/dateutil/dateutil/issues/214
Comment From: WillAyd
Not a pandas bug and documented accordingly
Comment From: jbrockmendel
This is one that we need to bug dateutil about. (Or offer a PR upstream). Pls consider reopening.
Comment From: WillAyd
@jbrockmendel I closed because there's nothing technically to be done on the pandas side and this is arguably duplicative of #12585. Feel free to reopen if you think this issue in particular adds value to tracking.