IMO somewhat unexpected behavior depending on the arguments provided to shift after a groupby:

In [26]: ser = pd.Series(range(3), index=pd.date_range('1/1/18', periods=3))  

In [28]: ser.groupby([1,1,1]).shift()                                                                                                      
Out[28]: 
2018-01-01    NaN
2018-01-02    0.0
2018-01-03    1.0
Freq: D, dtype: float64

# No-op with freq keyword
In [34]: ser.groupby([1,1,1]).shift(0, freq='D')                                                                                           
Out[34]: 
2018-01-01    0
2018-01-02    1
2018-01-03    2
Freq: D, Name: 1, dtype: int64

# Suddenly inserted the grouper into the first level of a MI?
In [30]: ser.groupby([1,1,1]).shift(freq='D')                                                                                              
Out[30]: 
1  2018-01-02    0
   2018-01-03    1
   2018-01-04    2
Name: 1, dtype: int64

This is arguably in contrast to #22053 which is asking for the behavior of the last item consistently, but I think that is the one that is actually incorrect out of all examples and causes misalignment with the index of the original caller.

cc @SimonAlecks who originally found this in #21235

Expected Output

In [30]: ser.groupby([1,1,1]).shift(freq='D')                                                                                              
Out[34]: 
2018-01-01    NaN
2018-01-02    0.0
2018-01-03    1.0
Freq: D, dtype: float64

Output of `pd.show_versions()`

INSTALLED VERSIONS ------------------ commit: 1f1f7053d7c7f88c432c400a809d0396a1baa3ff python: 3.6.7.final.0 python-bits: 64 OS: Darwin OS-release: 18.2.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 pandas: 0.24.0.dev0+1123.g1f1f7053d.dirty pytest: 4.0.0 pip: 18.1 setuptools: 40.6.2 Cython: 0.29 numpy: 1.15.4 scipy: 1.1.0 pyarrow: 0.11.1 xarray: 0.11.0 IPython: 7.1.1 sphinx: 1.8.2 patsy: 0.5.1 dateutil: 2.7.5 pytz: 2018.7 blosc: None bottleneck: 1.2.1 tables: 3.4.4 numexpr: 2.6.8 feather: None matplotlib: 3.0.1 openpyxl: 2.5.9 xlrd: 1.1.0 xlwt: 1.2.0 xlsxwriter: 1.1.2 lxml: 4.2.5 bs4: 4.6.3 html5lib: 1.0.1 sqlalchemy: 1.2.14 pymysql: 0.9.2 psycopg2: None jinja2: 2.10 s3fs: 0.1.6 fastparquet: 0.1.6 pandas_gbq: None pandas_datareader: None gcsfs: 0.2.0

Comment From: simonariddell

So @WillAyd here is what I can tell so far,

https://github.com/pandas-dev/pandas/blob/d43ac979fdf3038c713de93c7b27b4140be6c9c5/pandas/core/groupby/groupby.py#L705-L707 In this method mutated returns as True in the case where the index is shifted, which then passes into line 712 as not_indexed_same. In this case keys is the same value for both the case where it has expected and unexpected benefits ``` python: keys

Int64Index([1], dtype='int64')

One question is, is this expected behavior here? 

This then passes here:
https://github.com/pandas-dev/pandas/blob/d43ac979fdf3038c713de93c7b27b4140be6c9c5/pandas/core/groupby/groupby.py#L886-L888
If this block is hit, it's good.

However, the problem we see is here:
https://github.com/pandas-dev/pandas/blob/d43ac979fdf3038c713de93c7b27b4140be6c9c5/pandas/core/groupby/groupby.py#L906-L918
If we debug into it we can see what is happening
``` python
group_keys
> Int64Index([1], dtype='int64')

So this seems to be working as expected, it's just being given group_keys that don't really make sense.

In fact, we can generalize this problem a little by noticing it's 'unrelated' to the shift function so to speak. ``` python: gb = ser.groupby([1, 1, 1])

gb.shift(freq='D')

def _mutate(x): return x.reindex(x.index[0:2]) gb.apply(_mutate)

1 2018-01-01 0 2018-01-02 1 Name: 1, dtype: int64 ``` Essentially any groupby operation that mutates the dataframe induces this strange index behavior.

However, what is interesting is that in all cases the group_keys are wrong (I'm assuming Int64Index([1], dtype='int64') is wrong). However, it's only the cases where the data is also mutated that it hits the conditional logic on groupby.py 906 (as included above), where this wrong key gives us unexpected output.

This originates from the call in the apply method in ops.py, specifically here: https://github.com/pandas-dev/pandas/blob/d43ac979fdf3038c713de93c7b27b4140be6c9c5/pandas/core/groupby/ops.py#L152-L154

Where self.levels[0] returns Int64Index([1], dtype='int64'.

Anyway, I probably need to read more unit tests to understand the expected behavior and when and why having conditional logic for mutated indices is necessary, and when and why the group_keys works as intended. I'll continue looking into this and add updates.

https://github.com/pandas-dev/pandas/blob/d43ac979fdf3038c713de93c7b27b4140be6c9c5/pandas/core/groupby/ops.py#L164-L167

Comment From: jbrockmendel

@WillAyd on main i'm seeing a result that looks reasonable but also doesn't match your Expected. can you take a look

There's also a test_pct_change with an xfail that points back to this issue that looks like the current (raising) behavior is correct. any idea whats going on there?

Pandas Shifting GroupBy Object with freq inserts unwanted index level

Expected Output

Output of pd.show_versions()

gb.shift(freq='D')

Output of `pd.show_versions()`