Code Sample, from Jupyter notebook
import pandas as pd
import datetime
example_multilevel_dict = {
(0, datetime.date(2018, 3, 3)): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
(0, datetime.date(2018, 3, 4)): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
(1, datetime.date(2018, 3, 3)): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
(1, datetime.date(2018, 3, 4)): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20}
}
# MultiIndex rows are required, but this give multilevel columns
multicolumn_df = pd.DataFrame.from_dict(example_multilevel_dict)
print(multicolumn_df)
# but this is what 'orient' is for, however all values are NaN
multiindex_df = pd.DataFrame.from_dict(example_multilevel_dict, orient='index')
print(multiindex_df)
# but when you transpose the multicolumn_df, you get it right
print(multicolumn_df.T)
# This is not the case when the tuple contains strings instead of dates
ok_multilevel_dict = {
(0, 'a'): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
(0, 'b'): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
(1, 'a'): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
(1, 'b'): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20}
}
print(pd.DataFrame.from_dict(ok_multilevel_dict , orient='index'))
Output:
0 1
2018-03-03 2018-03-04 2018-03-03 2018-03-04
A 5 5 5 5
B 1344 1344 1344 1344
C 0 0 0 0
D 48 48 48 48
E 20 20 20 20
A B C D E
0 2018-03-03 NaN NaN NaN NaN NaN
2018-03-04 NaN NaN NaN NaN NaN
1 2018-03-03 NaN NaN NaN NaN NaN
2018-03-04 NaN NaN NaN NaN NaN
A B C D E
0 2018-03-03 5 1344 0 48 20
2018-03-04 5 1344 0 48 20
1 2018-03-03 5 1344 0 48 20
2018-03-04 5 1344 0 48 20
Problem description
When constructing a DataFrame from a dictionary with date objects from the datetime library, values are set to NaN when using orient='index'. See examples.
Please note that I suppose that using orient='index' should be equal with a transpose.
Expected Output
print(pd.DataFrame.from_dict(example_multilevel_dict , orient='index'))
A B C D E
0 2018-03-03 5 1344 0 48 20
2018-03-04 5 1344 0 48 20
1 2018-03-03 5 1344 0 48 20
2018-03-04 5 1344 0 48 20
Output of pd.show_versions()
Comment From: jreback
datetime.date are not a first class type and you end up with edge cases. If you look at the transposed version these are converted to a DatetimeIndex.
This code path in .from_dict does not have much coverage and is likely not handling the conversion keys as much as the 'regular' path. You are welcome to have a look.
Comment From: williamkhshea
@jreback The issue is that when arrays is passed into _arrays_to_mgr the datetime.date in index is not evaluated, while extract_index returns a MultiIndex in which the datetime.date got parsed into Timestamp and is eventually used to match against the arrays, therefore the array cannot be matched based on the index extracted.
One way to solve the issue is to parse the keys in arrays into Timestamp first in _arrays_to_mgr before any further operations, but I am not sure whether it is the best way to do it. Any opinions?
Comment From: jreback
not sure of the beat place; see how the from_dict path is handled - this should be similar
Comment From: williamkhshea
@jreback Wend down the from_dict path rabbit hold and discovered that when MultiIndex is constructed, it convert any datetimelike values, while Index construction doesn't do that. Not sure why the difference in behaviour.
https://github.com/pandas-dev/pandas/blob/85817a72897be808fb6d311f65fc6031e0216603/pandas/core/arrays/categorical.py#L316
Setting convert_dates = False fixed this particular issue, though I am not sure if there are any implications in other area.
Comment From: jreback
@williamkhshea you can't change that as this will break a number of existing behavior. The point is that datetime.date are not treated as first class and converted almost immediately. I think a more straightfroward way to to do this is like .from_dict convert the keys.
Comment From: williamkhshea
@jreback You can't simply convert the keys in .from_dict as single level dict is still using datetime to match.
Index in single level dict
MultiIndex in multilevel dict
If datetime.date is not treated as first class I think the way to go is to convert the datetime.date in single level dict when it is constructed and convert the datetime.date keys in .from_dict.
Comment From: carlosdeoncedos
I have a similar issue, using Jupyter Notebook. It's seems to be related to @hodossy post.
import pandas as pd
import datetime
concept_list = ['a', 'b', 'c']
date_list = [datetime.date(2018, 10, 8), datetime.date(2018, 10, 9),
datetime.date(2018, 10, 10)]
dictionary = {'John': {'a': {datetime.date(2018, 10, 8): 1, datetime.date(2018, 10, 9): 1,
datetime.date(2018, 10, 10): 1},
'b': {datetime.date(2018, 10, 8): 2,datetime.date(2018, 10, 9): 2,
datetime.date(2018, 10, 10): 2},
'c': {datetime.date(2018, 10, 8): 3, datetime.date(2018, 10, 9): 3,
datetime.date(2018, 10, 10): 3}},
'Alice': {'a': {datetime.date(2018, 10, 8): 1, datetime.date(2018, 10, 9): 1,
datetime.date(2018, 10, 10): 1},
'b': {datetime.date(2018, 10, 8): 2, datetime.date(2018, 10, 9): 2,
datetime.date(2018, 10, 10): 2},
'c':{datetime.date(2018, 10, 8): 3, datetime.date(2018, 10, 9): 3,
datetime.date(2018, 10, 10): 3}},
'Rupert': {'a': {datetime.date(2018, 10, 8): 1, datetime.date(2018, 10, 9): 1,
datetime.date(2018, 10, 10): 1},
'b': {datetime.date(2018, 10, 8): 2, datetime.date(2018, 10, 9): 2,
datetime.date(2018, 10, 10): 2},
'c':{datetime.date(2018, 10, 8): 3, datetime.date(2018, 10, 9): 3,
datetime.date(2018, 10, 10): 3}}}
index_list = [concept_list, date_list]
index = pd.MultiIndex.from_product(index_list, names = ['Concept', 'Date'])
df = pd.DataFrame(dictonary, index = index)
df
The output:
John Alice Rupert
Concept Date
a 2018-10-08 NaN NaN NaN
2018-10-09 NaN NaN NaN
2018-10-10 NaN NaN NaN
b 2018-10-08 NaN NaN NaN
2018-10-09 NaN NaN NaN
2018-10-10 NaN NaN NaN
c 2018-10-08 NaN NaN NaN
2018-10-09 NaN NaN NaN
2018-10-10 NaN NaN NaN
Expected output:
John Alice Rupert
Concept Date
a 2018-10-08 1 1 1
2018-10-09 2 2 2
2018-10-10 3 3 3
b 2018-10-08 1 1 1
2018-10-09 2 2 2
2018-10-10 3 3 3
c 2018-10-08 1 1 1
2018-10-09 2 2 2
2018-10-10 3 3 3
Comment From: afzalhasn
My solution is Below. Is this approach is okay or not?
example_multilevel_dict = { (0, str(datetime.date(2018, 3, 3))): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20}, (0, str(datetime.date(2018, 3, 4))): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20}, (1, str(datetime.date(2018, 3, 3))): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20}, (1, str(datetime.date(2018, 3, 4))): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20} }
pd.DataFrame.from_dict(example_multilevel_dict,orient='index')
OUTPUT: A B C D E 0 2018-03-03 5 1344 0 48 20 2018-03-04 5 1344 0 48 20 1 2018-03-03 5 1344 0 48 20 2018-03-04 5 1344 0 48 20
Comment From: afzalhasn
Comment From: mroeschke
This looks to work on master now. Could use a test
In [33]: print(pd.DataFrame.from_dict(example_multilevel_dict , orient='index'))
A B C D E
0 2018-03-03 5 1344 0 48 20
2018-03-04 5 1344 0 48 20
1 2018-03-03 5 1344 0 48 20
2018-03-04 5 1344 0 48 20
Comment From: shteken
take