MultiIndex loses nanosecond precision
I have a dataframe where one column is dtype: datetime64[ns]. This column has nanosecond precision.
I want to make this column, 'nanosecond_dt', the index of my dataframe. I can do this no problem for a simple index:
new_df = df.set_index('nanosecond_dt')
ns_part_of_index = new_df.index.nanosecond
Inspect ns_part_of_index to check the nanoseconds are there:
array([872, 128, 872, 128], dtype=int32)
Clearly the new DatetimeIndex still has nanosecond precision.
I actually want to make the column 'nanosecond_dt' AND another string-based column, 'name', into a MultiIndex for my dataframe.
Unfortunately, this means the DatetimeIndex part of the MultiIndex does not have nanosecond resolution. Why? Is this a known problem? Or am I making a mistake somewhere?
new_multi_index_df = df.set_index(['nanosecond_dt','name'])
ns_part_of_index = new_multi_index_df.index.get_level_values('nanosecond_dt').nanosecond
Now inspect ns_part_of_index for this MultiIndex:
array([0, 0, 0, 0], dtype=int32)
Unfortunately, the new DatetimeIndex as part of the MultiIndex has lost nanosecond precision.
Pandas 0.18.0
Comment From: ivremos
I've added a copy-pastable example, and I think I have found a subtlety of when this happens.
There is a surprising difference in Index vs MultiIndex behaviour depending on whether pandas NaT values are present.
If no pandas NaT values are present, nanosecond precision seems to be preserved for the DatetimeIndex in both the Index and the MultiIndex:
import pandas as pd

df = pd.DataFrame({
    'nanosecond_dt_str': [
        '2017-06-01 08:00:00.067735872',
        '2017-06-01 08:00:00.068736128',
        '2017-06-01 08:00:00.069735872',
        '2017-06-01 08:00:00.071736128',
    ],
    'name': ['one', 'two', 'three', 'four'],
})
df['nanosecond_dt'] = pd.to_datetime(df['nanosecond_dt_str'])
new_df = df.set_index('nanosecond_dt')
ns_part_of_index = new_df.index.nanosecond
array([872, 128, 872, 128], dtype=int32)
keeps nanosecond precision
new_multi_index_df = df.set_index(['nanosecond_dt','name'])
ns_part_of_multi_index = new_multi_index_df.index.get_level_values('nanosecond_dt').nanosecond
array([872, 128, 872, 128], dtype=int32)
keeps nanosecond precision
However, if I retry with a pandas NaT value present, nanosecond precision is preserved for the Index but not for the MultiIndex:
df = pd.DataFrame({
    'nanosecond_dt_str': [
        '2017-06-01 08:00:00.067735872',
        '2017-06-01 08:00:00.068736128',
        '2017-06-01 08:00:00.069735872',
        pd.NaT,
    ],
    'name': ['one', 'two', 'three', 'four'],
})
df['nanosecond_dt'] = pd.to_datetime(df['nanosecond_dt_str'])
new_df = df.set_index('nanosecond_dt')
ns_part_of_index = new_df.index.nanosecond
array([ 872., 128., 872., nan])
keeps nanosecond precision
new_multi_index_df = df.set_index(['nanosecond_dt','name'])
ns_part_of_multi_index = new_multi_index_df.index.get_level_values('nanosecond_dt').nanosecond
array([ 0., 0., 0., nan])
loses nanosecond precision
Why does Index vs MultiIndex behave differently in this case?
Can I restore nanosecond precision to my MultiIndex somehow?
Comment From: jreback
You have quite an old version of pandas. Nanosecond precision was added around this time.
On master / 0.21.0 (just released):
In [15]: new_multi_index_df.index.get_level_values(0).nanosecond
Out[15]: Float64Index([872.0, 128.0, 872.0, nan], dtype='float64', name='nanosecond_dt')
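To check whether a given installation behaves like the output above, the NaT example from this thread can be condensed into a small self-contained script. This is only a sketch: the names plain_index, level and precision_kept are illustrative and not part of pandas. It drops the NaT entry before reading .nanosecond so that both results can be compared as plain integer lists.

```python
import pandas as pd

# Same data as the NaT example in this thread.
df = pd.DataFrame({
    'nanosecond_dt_str': [
        '2017-06-01 08:00:00.067735872',
        '2017-06-01 08:00:00.068736128',
        '2017-06-01 08:00:00.069735872',
        pd.NaT,
    ],
    'name': ['one', 'two', 'three', 'four'],
})
df['nanosecond_dt'] = pd.to_datetime(df['nanosecond_dt_str'])

# Datetime values as a plain Index and as a level of a MultiIndex.
plain_index = df.set_index('nanosecond_dt').index
level = (
    df.set_index(['nanosecond_dt', 'name'])
      .index.get_level_values('nanosecond_dt')
)

# Drop NaT first so .nanosecond yields integers rather than floats with NaN.
plain_ns = plain_index.dropna().nanosecond.tolist()
level_ns = level.dropna().nanosecond.tolist()

# True when the MultiIndex level kept the same nanosecond component.
precision_kept = plain_ns == level_ns

print(pd.__version__)
print('Index     :', plain_ns)
print('MultiIndex:', level_ns)
print('precision kept:', precision_kept)
```

If precision_kept comes out False, the installed version most likely still has the behaviour reported at the top of this issue, and upgrading is the first thing to try.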