Consider the simple example as follows
import pandas as pd
dataframe = pd.DataFrame({'time' : [pd.to_datetime('2016-06-06'),
pd.to_datetime('2016-06-06'),
pd.to_datetime('2016-06-07'),
pd.to_datetime('2016-06-08')],
'value' : [1,2,3,4],
'group' : ['A','B','A','B']})
dataframe.set_index('time', inplace = True)
dataframe
Out[13]:
group value
time
2016-06-06 A 1
2016-06-06 B 2
2016-06-07 A 3
2016-06-08 B 4
Now I want to use all the cool partial string indexing functions, but also be able to filter on other variables. The solution I come up with is the syntax
dataframe.loc[dataframe['group'] == 'A'].loc['2016-06-06']
which looks really horrible with the two loc
combined. Is that the correct pandonic way to do so (while keeping the index?) I have read and read again the documentation but I cannot find the answer.
Thanks!
Comment From: gfyoung
@randomgambit : Thanks for the question! I think you might be better served emailing this question out to the pandas-dev mailing list or reaching out on our Gitter channel.
I will close this issue for now unless there turns out to be a real issue (or enhancement) with the pandas
codebase that can be well addressed with the example you have provided.
Comment From: randomgambit
thanks @gfyoung but can you at least tell me if that's the correct syntax here? Its not easy to send en example via email, and there is nobody on gitter
Comment From: jreback
We don't directly have an exposed partial string indexer. Instead .get_loc
will do this, but it can return various things (e.g. a single index, array of booleans, or slice), so it is not directly user facing.
In [15]: dfi = dataframe.index
In [16]: dataframe.loc[(dataframe['group'] == 'A') & dfi.isin(dfi[dfi.get_loc('2016-06-06')])]
Out[16]:
group value
time
2016-06-06 A 1
Comment From: randomgambit
@jreback thanks a lot. I think this would be a really nice feature to have. partial string indexing is so useful, but 90% of the time I need to combine if with some other logical condition so I end up doing some df.reset_index().loc[]
which is super annoying.
When you say its not user facing, you mean I should not use it?
Comment From: jreback
i mean there is no direct way to do this aside from the verbose code that i posted
i suppose u could make an index function to wrap this as a convenience
if u want to post a signature / doc string can take a look
Comment From: gfyoung
Given where the discussion has gone, I'll reopen this.
Comment From: gfyoung
but can you at least tell me if that's the correct syntax here? Its not easy to send en example via email, and there is nobody on gitter
@randomgambit : Sorry about that! I was responding from my phone, so I was hoping email or Gitter could provide you with a more comprehensive response.
Comment From: randomgambit
@gfyoung @jreback I tried to find more conventional solutions and actually I found this funny thing:
dataframe.loc[(dataframe.reset_index()['time'] == pd.to_datetime('2016-06-06')) & (dataframe['group'] == 'A')]
FAILS
dataframe.reset_index(inplace = True)
dataframe.loc[(dataframe['time'] == pd.to_datetime('2016-06-06')) & (dataframe['group'] == 'A')]
WORKS
in purely logical terms, I dont see why 1. does not work here
Comment From: gfyoung
@randomgambit : What indices do you get from each call (i.e. compare what is being passed in-between the outermost brackets)?
Comment From: randomgambit
@gfyoung
dataframe.set_index('time', inplace = True)
dataframe.reset_index()['time']
Out[35]:
0 2016-06-06
1 2016-06-06
2 2016-06-07
3 2016-06-08
Name: time, dtype: datetime64[ns]
dataframe.reset_index(inplace = True)
dataframe['time']
Out[37]:
0 2016-06-06
1 2016-06-06
2 2016-06-07
3 2016-06-08
Name: time, dtype: datetime64[ns]
Comment From: gfyoung
And in your first example, what does the index look like? The DataFrame
in your two examples are not the same because one had its index reset (second example) but not the other (first example).
Comment From: randomgambit
@gfyoung I am sorry I dont understand what you want me to do here. I get that they are not the same, but I dont see why dataframe.loc[(dataframe.reset_index()['time'] == pd.to_datetime('2016-06-06')) & (dataframe['group'] == 'A')]
does not work
(dataframe.reset_index()['time'] == pd.to_datetime('2016-06-06'))
is simply a boolean test
Comment From: gfyoung
Right, but you get a Series
returned from that boolean test. The indices in that Series
won't match those of the dataframe
.
Comment From: randomgambit
@gfyoung got it, that works
dataframe.loc[(dataframe.reset_index()['time'] == pd.to_datetime('2016-06-06')).tolist() & (dataframe['group'] == 'A')]
Comment From: gfyoung
Awesome! Glad we could resolve that. Feel free to post a PR to wrap the functionality you shared above so that we don't have to write this out again :smile:
Comment From: mroeschke
Seems like the new API to wrap functionality hasn't gained much traction or interest over the years so going to close for now. Can reopen if there's more interest from the community in the future