related #6066 (on Float64Index too)
The 'KeyError' on slicing was discussed multiple times but I'm still not sure if the issue below is a bug or just my misunderstanding.
I'm experiencing 'KeyError' from time to time when I try to slice my dataframes with datetime or Timestamp objects, however slicing with strings works perfectly. I was unable to construct synthetic example with pandas.date_range, so I needed to upload the piece of real data where the issue appears: https://www.dropbox.com/s/ibzbwqs35tiydyc/tmp.h5
df = pd.read_hdf('tmp.h5', 'data')
print df.head()
When I try to slice it with the pandas.Timestamp objects it results in 'KeyError':
tt1 = pd.Timestamp('2006-11-16 18:30:00')
tt2 = pd.Timestamp('2006-11-16 18:40:00')
print df.ix[tt1:tt2].head()
Same for datetime objects:
tt1 = pd.Timestamp('2006-11-16 18:30:00').to_datetime()
tt2 = pd.Timestamp('2006-11-16 18:40:00').to_datetime()
print df.ix[tt1:tt2].head()
However this slicing works perfectly:
tt1 = pd.Timestamp('2006-11-16 18:30:00').strftime('%Y-%m-%d %H:%M:%S')
tt2 = pd.Timestamp('2006-11-16 18:40:00').strftime('%Y-%m-%d %H:%M:%S')
print df[tt1:tt2].head()
Numpy version 1.8.0 Pandas version 0.13.0
Comment From: jreback
Your index is not monotonic (e.g. sorted). This is more of an incorrect error report
If you sort it it works (with exact indexes or not)
In [14]: df.sort_index().ix[Timestamp('2006-11-16 18:30:04.101497'):Timestamp('2006-11-16 18:30:06.075980')]
Out[14]:
val
2006-11-16 18:30:04.101497+00:00 0
2006-11-16 18:30:04.101497+00:00 0
2006-11-16 18:30:04.101497+00:00 0
2006-11-16 18:30:06.075980+00:00 0
2006-11-16 18:30:06.075980+00:00 0
[5 rows x 1 columns]
In [15]: df.sort_index().ix[Timestamp('2006-11-16 18:30:04.101497'):Timestamp('2006-11-16 18:30:06')]
Out[15]:
val
2006-11-16 18:30:04.101497+00:00 0
2006-11-16 18:30:04.101497+00:00 0
2006-11-16 18:30:04.101497+00:00 0
[3 rows x 1 columns]
In [16]: df.sort_index().ix[Timestamp('2006-11-16 18:30:04'):Timestamp('2006-11-16 18:30:06')]
Out[16]:
val
2006-11-16 18:30:04.101497+00:00 0
2006-11-16 18:30:04.101497+00:00 0
2006-11-16 18:30:04.101497+00:00 0
[3 rows x 1 columns]
This should be a ValueError I think; so its a 'bug' on the error report
In [17]: df.ix[Timestamp('2006-11-16 18:30:04.101497'):Timestamp('2006-11-16 18:30:06.075980')]
KeyError: 'cannot peform a slice operation on a non-unique non-monotonic index'
@jtratner agree?
Comment From: vfilimonov
@jreback thanks for pointing out the problem with my dataset! I was not aware of it.
Comment From: jreback
closing as not a bug
Comment From: KristianHolsheimer
Hi there, I know this issue is closed, since it's not a bug. I would argue, though, that the error message could point the user in the right direction (I googled the error message and it lead me here).
Comment From: leonsas
I second Kristian. And I'll provide a bit more context which can be helpful:
trimmed_series = series[event.startTime:event.endTime]
Which throws KeyError: 1472641846000000000
. Initially I thought that maybe event.endTime happens to be after the last index of series
, and that maybe pandas isn't smart in dealing with out of bound errors. Obviously pandas does deal with this and it was the index not being sorted (which was a bug elsewhere in my code), which I found here. But it definitely isn't intuitive.
Comment From: jreback
@leonsas @KristianHolsheimer
if you have a nice reproducible example, pls open a new issue (and xref this one). This is well-defined behavior. I closed this because its not a bug, though, it could/should be a ValueError
with an even more informative error message.
Comment From: kdebrab
As requested by @jreback, I hereby create a nice reproducible example:
import pandas as pd
index = pd.date_range('2016-10-29 23:00', '2016-10-30 3:00', freq='15T', tz='UTC')
index = index.tz_convert('Europe/Brussels').tz_localize(None)
ts = pd.Series(1, index=index)
ts.truncate(before='2016-10-30 2:10')
which raises:
KeyError: 1477793400000000000L
In above example, the index is non-sorted due to the naive local time, comprising a switch from Summer to Winter time.
Slicing the data as:
ts[pd.Timestamp('2016-10-30 2:10'):]
returns the same KeyError.
Truncating with a time that is contained inside the index works just fine (unless you choose a non-unique label):
ts.truncate(before='2016-10-30 1:30')
Interestingly:
ts['2016-10-30 2:10':]
raises no error, returning the correct (non-sorted !) result. Even when choosing a duplicate label!
All different operations work on the sorted series, e.g.:
ts.sort_index().truncate(before='2016-10-30 2:10')
Though now the result is sorted as well of course.
I'm using pandas version 0.19.1
Comment From: jreback
@kdebrab the only thing that would be nicer would be the actual KeyError message for a not-found label. (It should show it as a Timestamp). So would take a fix for that.
you can open a new issue, or push a PR if that works for you.
Comment From: jorisvandenbossche
The truncate
does just a slice under the hood, so the example of @kdebrab becomes:
In [12]: ts[pd.Timestamp('2016-1-1'):]
...
KeyError: 1451606400000000000
So which raises an error, because you are slicing with a non-present value on a non-sorted index. To give a more simple example for this:
In [33]: s = pd.Series(range(5), index=[1,2,3,2,4])
# works because '1' is present (and is unique), so unsortedness does not matter
In [34]: s.loc[1:]
Out[34]:
1 0
2 1
3 2
2 3
4 4
dtype: int64
# fails because '0' is not a label, because of unsortedness pandas cannot know how to handle this
In [35]: s.loc[0:]
...
KeyError: 0
I think it really be nice to have a better error message for this, as the difference between sorted/not-sorted can be very subtle, and with a sorted index, slicing with a non-present label is perfectly fine.
@jreback The only thing I don't directly find clear is why does the same example work with strings?
In [13]: ts['2016-1-1':]
Out[13]:
2016-10-30 01:00:00 1
2016-10-30 01:15:00 1
...
2016-10-30 03:45:00 1
2016-10-30 04:00:00 1
Freq: 15T, dtype: int64
Shouldn't this be equivalent to ts[pd.Timestamp('2016-1-1'):]
? Or at least in the case of ts['2016-1-1 00:00:00':]
(which also works) where it cannot be seen as partial string?
Comment From: jreback
this works with strings because a string turns into partial timestamp indexing and thus is a slice, and hence works.
so maybe we ought to always make truncate a slice, then this will just work. (internally its a scalar being passed and NOT a slice), so this is pretty easy to 'fix'.
and I agree that this should work.
So let's create 2 new issues for this
1) better error message with a datetime aware (e.g. it should show the converted key, not the internal repr)
2) fix .truncate
to always be slice like
Comment From: jorisvandenbossche
@jreback I thought partial strings only turned into a slice when the string has lower resolution than the series. In the case of ts['2016-1-1 00:00:00':]
this is certainly not the case, and should be treated as a slice with an exact Timestamp?
Comment From: jreback
In [7]: ts.index.resolution
Out[7]: 'minute'
so I think its always treated as a slice (in this example)
Comment From: DanielHabenicht
So this was never settled? I think I will create a new bug for it as it just cost me almost an hour.