Code Sample, a copy-pastable example if possible
import pandas
import numpy

df = pandas.DataFrame(
    {'A': list(range(6)), 'B': list(range(6))},
    index=pandas.MultiIndex.from_product(
        [[numpy.datetime64('2017-10-10'), numpy.datetime64('2017-10-11')], range(3)],
        names=['date', 'count'],
    ),
)

# inconsistency 1: whether the 'date' level is dropped
print(df.loc[('2017-10-10')])
print(df.loc[(numpy.datetime64('2017-10-10'))])

# inconsistency 2: DataFrame or Series
print(df.loc[('2017-10-10', 1), :])
print(df.loc[('2017-10-10', 1)])

# inconsistency 3: raises TypeError
try:
    print(df.loc[(numpy.datetime64('2017-10-10'), 1)])
except TypeError as e:
    print(e)
try:
    print(df.loc[(slice('2017-10-10'), slice(1))])
except TypeError as e:
    print(e)

# workaround
# inconsistency 1: whether the 'date' level is dropped
print(df.loc[(slice('2017-10-10'))])
print(df.loc[(slice(numpy.datetime64('2017-10-10')))])
# inconsistency 2: DataFrame or Series
print(df.loc[(slice('2017-10-10'), slice(1)), :])
# inconsistency 3: raises TypeError
print(df.loc[(slice(numpy.datetime64('2017-10-10')), slice(1)), :])
Problem description
When a DataFrame (or Series) has a MultiIndex with numpy.datetime64 values in one of its levels, slicing with loc behaves inconsistently. I haven't tested whether these inconsistencies are specific to numpy.datetime64 or whether they also occur with other date formats.
In my view, the output format of loc should not depend on the format of the slice.
Expected Output
It's not up to me to decide on the correct output format, but the outputs should be equal regardless of the slice format. Ideally, they would match the output generated in the 'workaround' section.
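A minimal sketch of what "equal, independent of the slice format" can look like, assuming pd.IndexSlice with an explicit start and stop on every level (this is not the reporter's code, just one documented way to spell the key). The same selection is written once with string bounds and once with numpy.datetime64 bounds; both are slices, so neither drops a level and both return a DataFrame.

```python
import numpy as np
import pandas as pd

# Same frame as in the report.
df = pd.DataFrame(
    {'A': list(range(6)), 'B': list(range(6))},
    index=pd.MultiIndex.from_product(
        [[np.datetime64('2017-10-10'), np.datetime64('2017-10-11')], range(3)],
        names=['date', 'count'],
    ),
)

idx = pd.IndexSlice

# Explicit slices on both levels: once with string bounds on 'date' ...
by_string = df.loc[idx['2017-10-10':'2017-10-10', 1:1], :]

# ... and once with numpy.datetime64 bounds on 'date'.
day = np.datetime64('2017-10-10')
by_datetime = df.loc[idx[day:day, 1:1], :]

print(by_string.equals(by_datetime))
```

Because every level gets a slice, the result keeps both index levels whichever type the date bound has.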
Output of pd.show_versions()
Comment From: jreback
Please read the docs: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#indexing
A string is a slice, while an actual datetime (be it a Timestamp, np.datetime64, etc.) is a point in time.
You are passing different kinds of slicing, so pandas returns different output shapes.
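A minimal sketch of the "string is a slice, datetime is a point" rule on a plain DatetimeIndex rather than the MultiIndex from the report (the series and its values are made up for illustration). The index deliberately has two rows per day: per the pandas time series docs, a date string is only treated as a slice when it is coarser than the index resolution, so on a purely daily index the same string would be an exact match.

```python
import numpy as np
import pandas as pd

# Two observations per day, so the index resolution is finer than one day.
s = pd.Series(
    [0, 1, 2, 3],
    index=pd.DatetimeIndex([
        '2017-10-10 00:00', '2017-10-10 12:00',
        '2017-10-11 00:00', '2017-10-11 12:00',
    ]),
)

# A date string acts as a slice: every row that falls on that day.
day_rows = s.loc['2017-10-10']

# A numpy.datetime64 (or Timestamp) is one point in time: midnight only.
midnight_value = s.loc[np.datetime64('2017-10-10')]

print(type(day_rows).__name__, len(day_rows))  # prints: Series 2
print(midnight_value)  # prints: 0
```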
Here are ways to slice w/o dropping levels.
In [6]: df.loc[[numpy.datetime64('2017-10-10')]]
Out[6]:
                  A  B
date       count
2017-10-10 0      0  0
           1      1  1
           2      2  2
In [9]: df.loc[numpy.datetime64('2017-10-10'):numpy.datetime64('2017-10-10 12:00:00')]
Out[9]:
                  A  B
date       count
2017-10-10 0      0  0
           1      1  1
           2      2  2
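Another way to keep the levels, sketched on the same frame as the report: DataFrame.xs with drop_level=False. Passing a tuple of keys together with a list of level names selects a single (date, count) pair and still returns a one-row DataFrame. Using pd.Timestamp for the date key is an assumption of this sketch; a numpy.datetime64 key is also a point in time and should select the same rows.

```python
import numpy as np
import pandas as pd

# Same frame as in the report.
df = pd.DataFrame(
    {'A': list(range(6)), 'B': list(range(6))},
    index=pd.MultiIndex.from_product(
        [[np.datetime64('2017-10-10'), np.datetime64('2017-10-11')], range(3)],
        names=['date', 'count'],
    ),
)

day = pd.Timestamp('2017-10-10')

# One level: all rows for that date, with the 'date' level kept in the index.
rows_for_day = df.xs(day, level='date', drop_level=False)

# Several levels at once: a single (date, count) pair, still a one-row DataFrame.
one_row = df.xs((day, 1), level=['date', 'count'], drop_level=False)

print(rows_for_day.index.nlevels, len(rows_for_day))  # prints: 2 3
print(type(one_row).__name__, len(one_row))  # prints: DataFrame 1
```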
Comment From: jreback
If you claim an inconsistency, you will have to show how this is not documented and/or violates the current rules as I referenced.