Code Sample, a copy-pastable example if possible

import pandas as pd

pd.to_datetime(["31/12/2014", "10/03/2011"])
Out[37]:
DatetimeIndex(['2014-12-31', '2011-10-03'], dtype='datetime64[ns]', freq=None)

Expected Output

DatetimeIndex(['2014-12-31', '2011-03-10'], dtype='datetime64[ns]', freq=None)

_Reason: By default (without an explicit format or dayfirst argument), datetime parsing should be consistent within the same column._

Comments

Within the internal helper function pd.tseries.tools._to_datetime, the datetime format is first inferred from the first non-NaN element, and the values are then parsed via tslib.array_strptime (Line 357), which gives the expected result.

pd.tseries.tools._guess_datetime_format_for_array(["31/12/2014", "10/03/2011"])
Out[25]:
'%d/%m/%Y'


pd.tslib.array_strptime(pd.tseries.tools.com._ensure_object(["31/12/2014", "10/03/2011"]), "%d/%m/%Y")
Out[46]:
array(['2014-12-31T08:00:00.000000000+0800',
       '2011-03-10T08:00:00.000000000+0800'], dtype='datetime64[ns]')

But the fallback function tslib.array_to_datetime doesn't use the inferred format (Line 373):

pd.tslib.array_to_datetime(pd.tseries.tools.com._ensure_object(["31/12/2014", "10/03/2011"]), format="%d/%m/%Y")
Out[51]:
array(['2014-12-31T08:00:00.000000000+0800',
       '2011-10-03T08:00:00.000000000+0800'], dtype='datetime64[ns]')

_This is caused by the default value (False) of the infer_datetime_format parameter of _to_datetime (Line 285). Suggestion: make infer_datetime_format default to True in _to_datetime. It won't solve all problems, but it should solve at least half of them, depending on the format of the first element._
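The "infer from the first element, then parse strictly" flow described above can be sketched with the standard library alone. This is only an illustration, not the pandas implementation: the candidate list and both function names are hypothetical, and the real guesser handles far more formats.

```python
from datetime import datetime

# Hypothetical candidate list; month-first is tried before day-first.
CANDIDATE_FORMATS = ("%m/%d/%Y", "%d/%m/%Y")


def guess_format(value, candidates=CANDIDATE_FORMATS):
    """Return the first candidate format that parses `value`, else None."""
    for fmt in candidates:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            continue
        return fmt
    return None


def parse_with_first_element_format(values):
    """Guess the format from the first non-null value, then apply it to all."""
    first = next((v for v in values if v is not None), None)
    fmt = guess_format(first) if first is not None else None
    if fmt is None:
        raise ValueError("could not guess a format from the first element")
    return [None if v is None else datetime.strptime(v, fmt) for v in values]


# The first element is unambiguous, so both dates are parsed as day-first.
print(parse_with_first_element_format(["31/12/2014", "10/03/2011"]))

try:
    parse_with_first_element_format(["10/03/2011", "31/12/2014"])
except ValueError as exc:
    # The ambiguous first element is guessed as month-first,
    # so the second element no longer matches.
    print("strict parsing failed:", exc)
```

The second call shows why the result depends on the first element: an ambiguous first value fixes the wrong format, but at least the mismatch surfaces as an error instead of a silently swapped day and month.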

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 3.8.13-44.el6uek.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 8.0.2
setuptools: 19.6.2
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
IPython: 4.0.3
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
Jinja2: None

Comment From: dolaameng

I also suggest at least warning users about this behavior in the online documentation.

Comment From: crazycodingcat

Agree with the warning suggestion.

A common-sense assumption is that the date format within a single column is consistent. Currently, however, pandas infers the format of each datetime independently and SILENTLY; it does not check whether all dates were parsed with the same format. This is a pitfall for users who do not carefully check the parsing results. A consistency check with a warning would be very helpful.
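A rough sketch of such a consistency check, using only the standard library, might look like the following. It assumes a small, explicit list of candidate formats; the function names are hypothetical and this is not how pandas or dateutil work internally.

```python
import warnings
from datetime import datetime

# Hypothetical candidate list; each value is tried against it independently.
CANDIDATE_FORMATS = ("%m/%d/%Y", "%d/%m/%Y")


def first_matching_format(value, candidates=CANDIDATE_FORMATS):
    """Return the first candidate format that parses `value`, else None."""
    for fmt in candidates:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            continue
        return fmt
    return None


def check_format_consistency(values, candidates=CANDIDATE_FORMATS):
    """Return the set of formats matched per value; warn if more than one."""
    used = {first_matching_format(v, candidates) for v in values}
    if len(used) > 1:
        warnings.warn(
            "values were parsed with inconsistent formats: %s"
            % sorted(used, key=str),
            UserWarning,
            stacklevel=2,
        )
    return used


# "31/12/2014" only fits day-first, while "10/03/2011" matches month-first
# first, so two different formats are reported and a warning is emitted.
print(check_format_consistency(["31/12/2014", "10/03/2011"]))
```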

Comment From: jorisvandenbossche

This is a known issue with to_datetime, caused by the excessive flexibility of dateutil. You can 'solve' it by providing a format argument. We should probably recommend more strongly in the docs that you pass a format argument if you want to be sure of consistent parsing.
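For the example from this report, passing the format explicitly would look roughly like this (a minimal sketch; the variable names are illustrative, and errors="coerce" is shown only as one way to surface values that do not match):

```python
import pandas as pd

dates = ["31/12/2014", "10/03/2011"]

# With an explicit format, every element is parsed as day/month/year.
parsed = pd.to_datetime(dates, format="%d/%m/%Y")
print(parsed)

# Values that do not match the format become NaT instead of being guessed.
coerced = pd.to_datetime(
    ["31/12/2014", "2011-03-10"], format="%d/%m/%Y", errors="coerce"
)
print(coerced)
```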

See also https://github.com/pydata/pandas/issues/12501, https://github.com/pydata/pandas/issues/7348 and https://github.com/pydata/pandas/issues/3341

And we are indeed considering changing the default of infer_datetime_format to True, see https://github.com/pydata/pandas/issues/12061 for that.

As far as I know, raising a warning is currently not possible, because pandas just passes the strings to dateutil and does not know how dateutil has parsed them (possibly inconsistently). The flexibility of dateutil is also a feature of to_datetime, since it makes it possible to handle messy datetime columns.

Comment From: dolaameng

Thanks for linking to the previous issues. I am thinking of the heuristics used by the readr package in R, which infers column types or date formats from the first 100 rows instead of a single one. Could this be done before passing an explicit format to dateutil?
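A minimal sketch of that idea, assuming a small hypothetical list of candidate formats and using only the standard library (this is neither readr's nor pandas' actual algorithm):

```python
from datetime import datetime
from itertools import islice

# Hypothetical candidate list for illustration only.
CANDIDATE_FORMATS = ("%m/%d/%Y", "%d/%m/%Y")


def formats_matching_sample(values, n=100, candidates=CANDIDATE_FORMATS):
    """Return the candidates that parse every one of the first n non-null values."""
    sample = list(islice((v for v in values if v is not None), n))
    if not sample:
        return []
    matching = []
    for fmt in candidates:
        try:
            for value in sample:
                datetime.strptime(value, fmt)
        except ValueError:
            continue
        matching.append(fmt)
    return matching


# A later row rules out month-first even though the first row is ambiguous.
print(formats_matching_sample(["10/03/2011", "31/12/2014"]))  # ['%d/%m/%Y']

# If every sampled row is ambiguous, both candidates survive.
print(formats_matching_sample(["10/03/2011", "05/06/2012"]))
```

When exactly one format survives, it could then be passed as the format argument; when several survive, the sample alone cannot settle the question and the caller still has to choose.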

Comment From: jorisvandenbossche

There could maybe be improvements to infer_datetime_format (indeed, a single row will often not be enough to infer day-first). Note that no format is passed to dateutil: when there is a format (passed or inferred), pandas does the parsing itself; dateutil is only used when there is no fixed format.

I made an overview issue for this: #12585

BTW, if you want to make some clarifications to the docs for now, PRs are also very welcome!

Comment From: dolaameng

Proper documentation sounds like a good temporary solution. I will try to help if possible.

Comment From: jorisvandenbossche

Closing in favor of the master issue #12585