Code Sample, from Jupyter notebook
import pandas as pd
import datetime
example_multilevel_dict = {
(0, datetime.date(2018, 3, 3)): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
(0, datetime.date(2018, 3, 4)): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
(1, datetime.date(2018, 3, 3)): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
(1, datetime.date(2018, 3, 4)): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20}
}
# MultiIndex rows are required, but this give multilevel columns
multicolumn_df = pd.DataFrame.from_dict(example_multilevel_dict)
print(multicolumn_df)
# but this is what 'orient' is for, however all values are NaN
multiindex_df = pd.DataFrame.from_dict(example_multilevel_dict, orient='index')
print(multiindex_df)
# but when you transpose the multicolumn_df, you get it right
print(multicolumn_df.T)
# This is not the case when the tuple contains strings instead of dates
ok_multilevel_dict = {
(0, 'a'): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
(0, 'b'): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
(1, 'a'): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20},
(1, 'b'): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20}
}
print(pd.DataFrame.from_dict(ok_multilevel_dict , orient='index'))
Output:
0 1
2018-03-03 2018-03-04 2018-03-03 2018-03-04
A 5 5 5 5
B 1344 1344 1344 1344
C 0 0 0 0
D 48 48 48 48
E 20 20 20 20
A B C D E
0 2018-03-03 NaN NaN NaN NaN NaN
2018-03-04 NaN NaN NaN NaN NaN
1 2018-03-03 NaN NaN NaN NaN NaN
2018-03-04 NaN NaN NaN NaN NaN
A B C D E
0 2018-03-03 5 1344 0 48 20
2018-03-04 5 1344 0 48 20
1 2018-03-03 5 1344 0 48 20
2018-03-04 5 1344 0 48 20
Problem description
When constructing a DataFrame from a dictionary with date objects from the datetime library, values are set to NaN when using orient='index'. See examples.
Please note that I suppose that using orient='index' should be equal with a transpose.
Expected Output
print(pd.DataFrame.from_dict(example_multilevel_dict , orient='index'))
A B C D E
0 2018-03-03 5 1344 0 48 20
2018-03-04 5 1344 0 48 20
1 2018-03-03 5 1344 0 48 20
2018-03-04 5 1344 0 48 20
Output of pd.show_versions()
Comment From: jreback
datetime.date
are not a first class type and you end up with edge cases. If you look at the transposed version these are converted to a DatetimeIndex
.
This code path in .from_dict
does not have much coverage and is likely not handling the conversion keys as much as the 'regular' path. You are welcome to have a look.
Comment From: williamkhshea
@jreback The issue is that when arrays
is passed into _arrays_to_mgr
the datetime.date
in index is not evaluated, while extract_index
returns a MultiIndex in which the datetime.date
got parsed into Timestamp
and is eventually used to match against the arrays
, therefore the array cannot be matched based on the index extracted.
One way to solve the issue is to parse the keys in arrays
into Timestamp first in _arrays_to_mgr
before any further operations, but I am not sure whether it is the best way to do it. Any opinions?
Comment From: jreback
not sure of the beat place; see how the from_dict path is handled - this should be similar
Comment From: williamkhshea
@jreback Wend down the from_dict
path rabbit hold and discovered that when MultiIndex
is constructed, it convert any datetimelike
values, while Index
construction doesn't do that. Not sure why the difference in behaviour.
https://github.com/pandas-dev/pandas/blob/85817a72897be808fb6d311f65fc6031e0216603/pandas/core/arrays/categorical.py#L316
Setting convert_dates = False
fixed this particular issue, though I am not sure if there are any implications in other area.
Comment From: jreback
@williamkhshea you can't change that as this will break a number of existing behavior. The point is that datetime.date
are not treated as first class and converted almost immediately. I think a more straightfroward way to to do this is like .from_dict
convert the keys.
Comment From: williamkhshea
@jreback You can't simply convert the keys in .from_dict
as single level dict is still using datetime
to match.
Index
in single level dict
MultiIndex
in multilevel dict
If datetime.date
is not treated as first class I think the way to go is to convert the datetime.date
in single level dict when it is constructed and convert the datetime.date
keys in .from_dict
.
Comment From: carlosdeoncedos
I have a similar issue, using Jupyter Notebook. It's seems to be related to @hodossy post.
import pandas as pd
import datetime
concept_list = ['a', 'b', 'c']
date_list = [datetime.date(2018, 10, 8), datetime.date(2018, 10, 9),
datetime.date(2018, 10, 10)]
dictionary = {'John': {'a': {datetime.date(2018, 10, 8): 1, datetime.date(2018, 10, 9): 1,
datetime.date(2018, 10, 10): 1},
'b': {datetime.date(2018, 10, 8): 2,datetime.date(2018, 10, 9): 2,
datetime.date(2018, 10, 10): 2},
'c': {datetime.date(2018, 10, 8): 3, datetime.date(2018, 10, 9): 3,
datetime.date(2018, 10, 10): 3}},
'Alice': {'a': {datetime.date(2018, 10, 8): 1, datetime.date(2018, 10, 9): 1,
datetime.date(2018, 10, 10): 1},
'b': {datetime.date(2018, 10, 8): 2, datetime.date(2018, 10, 9): 2,
datetime.date(2018, 10, 10): 2},
'c':{datetime.date(2018, 10, 8): 3, datetime.date(2018, 10, 9): 3,
datetime.date(2018, 10, 10): 3}},
'Rupert': {'a': {datetime.date(2018, 10, 8): 1, datetime.date(2018, 10, 9): 1,
datetime.date(2018, 10, 10): 1},
'b': {datetime.date(2018, 10, 8): 2, datetime.date(2018, 10, 9): 2,
datetime.date(2018, 10, 10): 2},
'c':{datetime.date(2018, 10, 8): 3, datetime.date(2018, 10, 9): 3,
datetime.date(2018, 10, 10): 3}}}
index_list = [concept_list, date_list]
index = pd.MultiIndex.from_product(index_list, names = ['Concept', 'Date'])
df = pd.DataFrame(dictonary, index = index)
df
The output:
John Alice Rupert
Concept Date
a 2018-10-08 NaN NaN NaN
2018-10-09 NaN NaN NaN
2018-10-10 NaN NaN NaN
b 2018-10-08 NaN NaN NaN
2018-10-09 NaN NaN NaN
2018-10-10 NaN NaN NaN
c 2018-10-08 NaN NaN NaN
2018-10-09 NaN NaN NaN
2018-10-10 NaN NaN NaN
Expected output:
John Alice Rupert
Concept Date
a 2018-10-08 1 1 1
2018-10-09 2 2 2
2018-10-10 3 3 3
b 2018-10-08 1 1 1
2018-10-09 2 2 2
2018-10-10 3 3 3
c 2018-10-08 1 1 1
2018-10-09 2 2 2
2018-10-10 3 3 3
Comment From: afzalhasn
My solution is Below. Is this approach is okay or not?
example_multilevel_dict = { (0, str(datetime.date(2018, 3, 3))): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20}, (0, str(datetime.date(2018, 3, 4))): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20}, (1, str(datetime.date(2018, 3, 3))): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20}, (1, str(datetime.date(2018, 3, 4))): {'A': 5, 'B': 1344, 'C': 0, 'D': 48, 'E': 20} }
pd.DataFrame.from_dict(example_multilevel_dict,orient='index')
OUTPUT: A B C D E 0 2018-03-03 5 1344 0 48 20 2018-03-04 5 1344 0 48 20 1 2018-03-03 5 1344 0 48 20 2018-03-04 5 1344 0 48 20
Comment From: afzalhasn
Comment From: mroeschke
This looks to work on master now. Could use a test
In [33]: print(pd.DataFrame.from_dict(example_multilevel_dict , orient='index'))
A B C D E
0 2018-03-03 5 1344 0 48 20
2018-03-04 5 1344 0 48 20
1 2018-03-03 5 1344 0 48 20
2018-03-04 5 1344 0 48 20
Comment From: shteken
take