Code Sample

In this example, we merge multi-indexed dataframes, where one level contains datetime.date objects. We illustrate the inconsistent behavior by comparing the output of two merge operations.
  1. Let us create the three dataframes {df1, df2, df3} such that
     * df1 and df2 indexes overlap
     * df1 and df3 indexes do not overlap

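>>> import datetime as dt
>>> import pandas as pd
>>> # pandas >= 0.24 renamed the `labels` argument to `codes`
>>> index_1 = pd.MultiIndex(levels=[['dummy'], [dt.date(2016,1,1)]],
...                         labels=[[0], [0]], names=['i', 'j'])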
>>> index_2 = pd.MultiIndex(levels=[['dummy'], [dt.date(2016,1,1)]], 
...                         labels=[[0], [0]], names=['i', 'j'])
>>> index_3 = pd.MultiIndex(levels=[['dummy'], [dt.date(2016,1,2)]], 
...                         labels=[[0], [0]], names=['i', 'j'])
>>> df1 = pd.DataFrame([1], index=index_1, columns=['A'])
>>> df2 = pd.DataFrame([1], index=index_2, columns=['B'])
>>> df3 = pd.DataFrame([1], index=index_3, columns=['C'])
>>> df1
                  A
i     j            
dummy 2016-01-01  1
  2. Let us merge using the indexes.
>>> merge1 = df1.merge(df2, left_index=True, right_index=True, how='outer')
>>> merge2 = df1.merge(df3, left_index=True, right_index=True, how='outer')
>>> merge1
                  A  B
i     j               
dummy 2016-01-01  1  1
>>> merge2
                    A    C
i     j                   
dummy 2016-01-01  1.0  NaN
      2016-01-02  NaN  1.0
  3. Let us examine the indexes.
>>> merge1.index[0][1]
datetime.date(2016, 1, 1)
>>> merge2.index[0][1]
Timestamp('2016-01-01 00:00:00')

Problem description

The index type of the merge output is inconsistent: merge1 keeps datetime.date values in level j, while merge2 converts them to Timestamp. In my opinion, the merge operation should not change the types.
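
One way to make this kind of inconsistency visible at a glance is to list the Python types found in each index level. The helper below is only a sketch: `level_value_types` is a hypothetical name, not a pandas function, and the sample index is built with `MultiIndex.from_tuples` instead of the `levels`/`labels` constructor used in the report.

```python
import datetime as dt

import pandas as pd


def level_value_types(index):
    """Map each level name to the sorted type names of the values it holds."""
    return {
        name: sorted({type(v).__name__ for v in index.get_level_values(name)})
        for name in index.names
    }


# Sample index shaped like the one in the report (assumption: from_tuples
# is used here so the snippet does not depend on the `labels` keyword).
sample_index = pd.MultiIndex.from_tuples(
    [("dummy", dt.date(2016, 1, 1))], names=["i", "j"]
)
types_by_level = level_value_types(sample_index)
print(types_by_level)
```

Applied to `merge1.index` and `merge2.index`, the helper should report different type names for level `j` whenever the merge has coerced the dates, as it did on the pandas version listed in this report.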

Expected Output

>>> merge2.index[0][1]
datetime.date(2016, 1, 1)

Output of pd.show_versions()

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-229.7.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_IE.UTF-8
LOCALE: en_IE.UTF-8
pandas: 0.19.1
nose: None
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24.1a
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.5.2
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.3
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

Comment

This bug does not occur for single-index dataframes:

>>> df1 = pd.DataFrame([1], index=[dt.date(2016,1,1)], columns=['A'])
>>> df3 = pd.DataFrame([1], index=[dt.date(2016,1,2)], columns=['B'])
>>> df1.merge(df3, left_index=True, right_index=True, how='outer').index[0]
datetime.date(2016, 1, 1)

Comment From: jreback

datetime.date is not a first class type and you should simply not use it. This works correctly and properly for input of datetime.datetime which are coerced to Timestamp and fully handled.

the inference machinery in all of pandas tries pretty hard to coerce datetimelikes even if they end up as object; this is why the second one is the correct result.

this has a special check for datetime.date but it's completely non-performant and not really supported.

if you want to submit a fix would take it (assuming it doesn't break anything else). but we are not generally supporting datetime.date.
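
To illustrate the advice above, here is a minimal sketch of a workaround: convert the datetime.date level to datetime64 before merging, so that both merges carry Timestamp values. `with_timestamp_level` and `make_frame` are hypothetical helpers written for this example, and the frames are built with `MultiIndex.from_tuples` so the snippet does not rely on the `labels` keyword.

```python
import datetime as dt

import pandas as pd


def with_timestamp_level(df, level):
    """Return a copy of df whose index level `level` holds datetime64 values."""
    pos = df.index.names.index(level)
    converted = pd.to_datetime(df.index.levels[pos])
    out = df.copy()
    out.index = df.index.set_levels(converted, level=level)
    return out


def make_frame(day, column):
    """Build a one-row frame indexed by ('dummy', day), as in the report."""
    index = pd.MultiIndex.from_tuples([("dummy", day)], names=["i", "j"])
    return pd.DataFrame([1], index=index, columns=[column])


df1 = with_timestamp_level(make_frame(dt.date(2016, 1, 1), "A"), "j")
df2 = with_timestamp_level(make_frame(dt.date(2016, 1, 1), "B"), "j")
df3 = with_timestamp_level(make_frame(dt.date(2016, 1, 2), "C"), "j")

# Same two merges as in the report, now on datetime64 levels.
merge1 = df1.merge(df2, left_index=True, right_index=True, how="outer")
merge2 = df1.merge(df3, left_index=True, right_index=True, how="outer")

print(type(merge1.index[0][1]), type(merge2.index[0][1]))
```

This does not preserve datetime.date in the result, which is what the report asks for; it only makes the index type consistent across both merges.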