xref #12585

I think this ought to raise a ValueError:

In [1]: pd.to_datetime('01-13-2012', dayfirst=True)
Out[1]: datetime.datetime(2012, 1, 13, 0, 0)

This means that if my data is bad (i.e. somehow I have a mixture of American and British date strings), I won't be told there was an issue when parsing! I can't think of a use case where this kind of precedence would be appropriate, and it means you'll end up with incorrect dates.

The precedence behaviour appears to be a feature from dateutil (which tslib.array_to_datetime calls):

In [2]: from dateutil.parser import parse as parse_date

In [3]: parse_date('1/12/2012', dayfirst=True)
Out[3]: datetime.datetime(2012, 12, 1, 0, 0)

In [4]: parse_date('1/13/2012', dayfirst=True)
Out[4]: datetime.datetime(2012, 1, 13, 0, 0)

It should raise, like this:

In [5]: parse_date('14/14/2012', dayfirst=True)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-ca14fc16e421> in <module>()
----> 1 parse_date('14/14/2012', dayfirst=True)

/Library/Python/2.7/site-packages/dateutil/parser.pyc in parse(timestr, parserinfo, **kwargs)
    718         return parser(parserinfo).parse(timestr, **kwargs)
    719     else:
--> 720         return DEFAULTPARSER.parse(timestr, **kwargs)
    721
    722

/Library/Python/2.7/site-packages/dateutil/parser.pyc in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
    315             if value is not None:
    316                 repl[attr] = value
--> 317         ret = default.replace(**repl)
    318         if res.weekday is not None and not res.day:
    319             ret = ret+relativedelta.relativedelta(weekday=res.weekday)

ValueError: month must be in 1..12

... :(
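For comparison, a minimal sketch of the strict behaviour I'd expect, using only the standard library. The helper name `parse_dayfirst_strict` and its `sep` parameter are hypothetical (not part of pandas or dateutil), and it assumes a fixed day/month/year layout:

```python
from datetime import datetime


def parse_dayfirst_strict(text, sep="/"):
    """Parse 'DD<sep>MM<sep>YYYY' strictly, never falling back to month-first.

    Hypothetical helper for illustration only.
    """
    # strptime raises ValueError if the second field is not a valid month,
    # instead of silently swapping day and month.
    return datetime.strptime(text, f"%d{sep}%m{sep}%Y")


print(parse_dayfirst_strict("1/12/2012"))  # 2012-12-01 00:00:00

try:
    parse_dayfirst_strict("1/13/2012")
except ValueError as exc:
    # 13 is not a valid month, so the string is rejected outright
    print("rejected:", exc)
```

The price of strictness here is that the format has to be known up front, which is exactly what dayfirst is meant to avoid.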

Comment From: hayd

Another use case: if you call to_datetime (without setting dayfirst) on a list of British date strings, atm it parses without error but does so "incorrectly" (day-first for some strings and month-first for others, depending on which reading is valid). Which is confusing.
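One way to spot such a mixture before parsing is to try both readings of each string and compare them. This is only a sketch using the standard library; the helpers `_try` and `classify` are hypothetical names, and it assumes numeric day/month/year fields with a single known separator:

```python
from datetime import datetime


def _try(text, fmt):
    """Return the parsed datetime, or None if text does not match fmt."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None


def classify(text, sep="/"):
    """Say which of the day-first / month-first readings are valid."""
    day_first = _try(text, f"%d{sep}%m{sep}%Y")
    month_first = _try(text, f"%m{sep}%d{sep}%Y")
    if day_first is not None and month_first is not None:
        # Both readings parse; they only agree when day == month.
        return "unambiguous" if day_first == month_first else "ambiguous"
    if day_first is not None:
        return "day-first only"
    if month_first is not None:
        return "month-first only"
    return "invalid"


for s in ["1/12/2012", "1/13/2012", "13/1/2012", "14/14/2012"]:
    print(s, "->", classify(s))
# 1/12/2012 -> ambiguous
# 1/13/2012 -> month-first only
# 13/1/2012 -> day-first only
# 14/14/2012 -> invalid
```

A column that contains both "day-first only" and "month-first only" strings is certainly mixed, which is the case that currently parses without any error.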

Comment From: ghost

I'm shocked that dateutil does that.

Comment From: hayd

cc @jreback.

Comment From: jreback

I think we ought to drop this parameter in favor of just using format; dateutil just does the wrong thing. as an alternative, could just have dayfirst imply format="%d/%m/%Y"....or some such (but not nearly as flexible)

until/unless we actually do more date parsing internally

thoughts?

Comment From: hayd

but but but dayfirst is the best argument ever, it's very very useful for munging (it's great if you don't know the format or worse the format is mixed (!)).

It being not strict is sad, but not having it at all would be much worse. please please can we keep it!

I've looked before where in their code it's not being strict, but it's pretty messy and not sure how actively maintained it is...

I'm sure @wesm suggested another (C?) library fairly recently but can't find the issue.

Comment From: jreback

how about a big warning in the docstring/docs? that you really ought to specify format instead?

I see the warning you have, but EVEN bigger ? (and then in docstrings too), http://pandas.pydata.org/pandas-docs/dev/timeseries.html#converting-to-timestamps

Comment From: hayd

+1 on that, definitely users should know what they are letting themselves in for.

Comment From: jreback

alright...so let's just make this a doc enhancement for now?

Comment From: hayd

How's about (pr coming):

dayfirst : boolean, default False
    If True parses dates with the day first, eg 20/01/2005
    Warning: dayfirst=True is not strict, but will prefer to parse
    with day first (this is a known bug).

Comment From: jankatins

Switching from #11725, where this happened the other way round: with dayfirst=False, the first field is parsed as the month where that is valid, and silently as the day where it is not:

>>> pd.to_datetime(["29.01.1945","1.3.1945", "02.03.1945"])
DatetimeIndex(['1945-01-29', '1945-01-03', '1945-02-03'], dtype='datetime64[ns]', freq=None)

right now, the doc only describes the problem for the other way:

Warning: dayfirst=True is not strict, but will prefer to parse with day first (this is a known bug, based on dateutil behavior).

As dayfirst is False here, the warning makes no mention that the other way can also happen.

Comment From: jorisvandenbossche

@JanSchulz Indeed .. Maybe also worth mentioning that if you want strict behaviour, you can use format=... if possible?
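As a sketch of what that looks like for the example above (assuming every string really is day.month.year; the variable names are only for illustration):

```python
import pandas as pd

dates = ["29.01.1945", "1.3.1945", "02.03.1945"]

# With an explicit format every string is read day-first, so these come
# out as 29 January, 1 March and 2 March 1945.
strict = pd.to_datetime(dates, format="%d.%m.%Y")
print(strict)

# A string that does not fit the format raises instead of being
# reinterpreted month-first.
try:
    pd.to_datetime("01-13-2012", format="%d-%m-%Y")
except ValueError as exc:
    print("rejected:", exc)
```

This only helps when the format is known and uniform; for genuinely mixed input it will raise on the first string that does not match.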

Comment From: jorisvandenbossche

I opened an issue for this at the dateutil tracker, as I couldn't find one: https://github.com/dateutil/dateutil/issues/214

Comment From: WillAyd

Not a pandas bug and documented accordingly

Comment From: jbrockmendel

This is one that we need to bug dateutil about. (Or offer a PR upstream). Pls consider reopening.

On Fri, Jul 6, 2018 at 5:19 PM William Ayd notifications@github.com wrote:

Not a pandas bug and documented accordingly

Comment From: WillAyd

@jbrockmendel I closed because there's technically nothing to be done on the pandas side and this is arguably a duplicate of #12585. Feel free to reopen if you think this issue in particular adds value to tracking.