Code Sample, from Jupyter notebook

import pandas as pd
import datetime

example_multilevel_dict = {
  (0, datetime.date(2018, 3, 3)): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
  (0, datetime.date(2018, 3, 4)): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
  (1, datetime.date(2018, 3, 3)): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
  (1, datetime.date(2018, 3, 4)): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20}
}

# MultiIndex rows are required, but this give multilevel columns
multicolumn_df = pd.DataFrame.from_dict(example_multilevel_dict)
print(multicolumn_df)

# but this is what 'orient' is for, however all values are NaN
multiindex_df = pd.DataFrame.from_dict(example_multilevel_dict, orient='index')
print(multiindex_df)

# but when you transpose the multicolumn_df, you get it right
print(multicolumn_df.T)

# This is not the case when the tuple contains strings instead of dates
ok_multilevel_dict = {
  (0, 'a'): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
  (0, 'b'): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
  (1, 'a'): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
  (1, 'b'): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20}
}
print(pd.DataFrame.from_dict(ok_multilevel_dict , orient='index'))

Output:

           0                     1           
  2018-03-03 2018-03-04 2018-03-03 2018-03-04
A          5          5          5          5
B       1344       1344       1344       1344
C          0          0          0          0
D         48         48         48         48
E         20         20         20         20
               A   B   C   D   E
0 2018-03-03 NaN NaN NaN NaN NaN
  2018-03-04 NaN NaN NaN NaN NaN
1 2018-03-03 NaN NaN NaN NaN NaN
  2018-03-04 NaN NaN NaN NaN NaN
              A     B  C   D   E
0 2018-03-03  5  1344  0  48  20
  2018-03-04  5  1344  0  48  20
1 2018-03-03  5  1344  0  48  20
  2018-03-04  5  1344  0  48  20

Problem description

When constructing a DataFrame from a dictionary with date objects from the datetime library, values are set to NaN when using orient='index'. See examples.

Please note that I suppose that using orient='index' should be equal with a transpose.

Expected Output

print(pd.DataFrame.from_dict(example_multilevel_dict , orient='index'))

              A     B  C   D   E
0 2018-03-03  5  1344  0  48  20
  2018-03-04  5  1344  0  48  20
1 2018-03-03  5  1344  0  48  20
  2018-03-04  5  1344  0  48  20

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.3.final.0 python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: None.None pandas: 0.22.0 pytest: None pip: 9.0.1 setuptools: 28.8.0 Cython: None numpy: 1.14.1 scipy: None pyarrow: None xarray: None IPython: 6.2.1 sphinx: None patsy: None dateutil: 2.6.1 pytz: 2018.3 blosc: None bottleneck: None tables: None numexpr: None feather: None matplotlib: None openpyxl: None xlrd: None xlwt: None xlsxwriter: None lxml: None bs4: None html5lib: 1.0.1 sqlalchemy: None pymysql: None psycopg2: None jinja2: 2.10 s3fs: None fastparquet: None pandas_gbq: None pandas_datareader: None

Comment From: jreback

datetime.date are not a first class type and you end up with edge cases. If you look at the transposed version these are converted to a DatetimeIndex.

This code path in .from_dict does not have much coverage and is likely not handling the conversion keys as much as the 'regular' path. You are welcome to have a look.

Comment From: williamkhshea

@jreback The issue is that when arrays is passed into _arrays_to_mgr the datetime.date in index is not evaluated, while extract_index returns a MultiIndex in which the datetime.date got parsed into Timestamp and is eventually used to match against the arrays, therefore the array cannot be matched based on the index extracted.

screen shot 2018-03-24 at 4 17 41 pm

One way to solve the issue is to parse the keys in arrays into Timestamp first in _arrays_to_mgr before any further operations, but I am not sure whether it is the best way to do it. Any opinions?

Comment From: jreback

not sure of the beat place; see how the from_dict path is handled - this should be similar

Comment From: williamkhshea

@jreback Wend down the from_dict path rabbit hold and discovered that when MultiIndex is constructed, it convert any datetimelike values, while Index construction doesn't do that. Not sure why the difference in behaviour.

https://github.com/pandas-dev/pandas/blob/85817a72897be808fb6d311f65fc6031e0216603/pandas/core/arrays/categorical.py#L316

Setting convert_dates = False fixed this particular issue, though I am not sure if there are any implications in other area.

Comment From: jreback

@williamkhshea you can't change that as this will break a number of existing behavior. The point is that datetime.date are not treated as first class and converted almost immediately. I think a more straightfroward way to to do this is like .from_dict convert the keys.

Comment From: williamkhshea

@jreback You can't simply convert the keys in .from_dict as single level dict is still using datetime to match.

Index in single level dict screen shot 2018-03-26 at 6 15 09 am

MultiIndex in multilevel dict screen shot 2018-03-26 at 6 15 55 am

If datetime.date is not treated as first class I think the way to go is to convert the datetime.date in single level dict when it is constructed and convert the datetime.date keys in .from_dict.

Comment From: carlosdeoncedos

I have a similar issue, using Jupyter Notebook. It's seems to be related to @hodossy post.


import pandas as pd
import datetime

concept_list = ['a', 'b', 'c']
date_list = [datetime.date(2018, 10, 8), datetime.date(2018, 10, 9), 
             datetime.date(2018, 10, 10)]

dictionary = {'John': {'a': {datetime.date(2018, 10, 8): 1, datetime.date(2018, 10, 9): 1,
                         datetime.date(2018, 10, 10): 1}, 
                 'b': {datetime.date(2018, 10, 8): 2,datetime.date(2018, 10, 9): 2, 
                         datetime.date(2018, 10, 10): 2}, 
                 'c': {datetime.date(2018, 10, 8): 3, datetime.date(2018, 10, 9): 3,
                         datetime.date(2018, 10, 10): 3}},
        'Alice': {'a': {datetime.date(2018, 10, 8): 1, datetime.date(2018, 10, 9): 1,
                         datetime.date(2018, 10, 10): 1}, 
                 'b': {datetime.date(2018, 10, 8): 2, datetime.date(2018, 10, 9): 2, 
                         datetime.date(2018, 10, 10): 2}, 
                 'c':{datetime.date(2018, 10, 8): 3, datetime.date(2018, 10, 9): 3,
                         datetime.date(2018, 10, 10): 3}},
        'Rupert': {'a': {datetime.date(2018, 10, 8): 1, datetime.date(2018, 10, 9): 1,
                         datetime.date(2018, 10, 10): 1}, 
                  'b': {datetime.date(2018, 10, 8): 2, datetime.date(2018, 10, 9): 2, 
                         datetime.date(2018, 10, 10): 2}, 
                  'c':{datetime.date(2018, 10, 8): 3, datetime.date(2018, 10, 9): 3,
                         datetime.date(2018, 10, 10): 3}}}

index_list = [concept_list, date_list]
index = pd.MultiIndex.from_product(index_list, names = ['Concept', 'Date'])

df = pd.DataFrame(dictonary, index = index)
df

The output:

                          John    Alice    Rupert
Concept  Date
a        2018-10-08        NaN     NaN      NaN
         2018-10-09        NaN     NaN      NaN
         2018-10-10        NaN     NaN      NaN

b        2018-10-08        NaN     NaN      NaN
         2018-10-09        NaN     NaN      NaN
         2018-10-10        NaN     NaN      NaN

c        2018-10-08        NaN     NaN      NaN
         2018-10-09        NaN     NaN      NaN
         2018-10-10        NaN     NaN      NaN

Expected output:

                         John  Alice  Rupert
Concept  Date
a        2018-10-08        1     1      1
         2018-10-09        2     2      2
         2018-10-10        3     3      3

b        2018-10-08        1     1      1
         2018-10-09        2     2      2
         2018-10-10        3     3      3

c        2018-10-08        1     1      1
         2018-10-09        2     2      2
         2018-10-10        3     3      3

Comment From: afzalhasn

My solution is Below. Is this approach is okay or not?

example_multilevel_dict = { (0, str(datetime.date(2018, 3, 3))): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20}, (0, str(datetime.date(2018, 3, 4))): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20}, (1, str(datetime.date(2018, 3, 3))): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20}, (1, str(datetime.date(2018, 3, 4))): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20} }

pd.DataFrame.from_dict(example_multilevel_dict,orient='index')

OUTPUT: A B C D E 0 2018-03-03 5 1344 0 48 20 2018-03-04 5 1344 0 48 20 1 2018-03-03 5 1344 0 48 20 2018-03-04 5 1344 0 48 20

Comment From: afzalhasn

Screenshot 2019-11-11 at 11 35 29 PM

Comment From: mroeschke

This looks to work on master now. Could use a test

In [33]: print(pd.DataFrame.from_dict(example_multilevel_dict , orient='index'))
              A     B  C   D   E
0 2018-03-03  5  1344  0  48  20
  2018-03-04  5  1344  0  48  20
1 2018-03-03  5  1344  0  48  20
  2018-03-04  5  1344  0  48  20

Comment From: shteken

take