In [1]: import pandas as pd
...: import numpy as np
...:
In [2]: pd.__version__
Out[2]: '0.20.3'
In [3]: a = pd.DataFrame( np.random.random(100).reshape(10,10) )
In [4]: a.iloc[:,0] = "a_str"
In [5]: a.iloc[3:10,0] = "b_str"
In [6]: def f(group):
...:     print(type(group))
...:     print(group.shape)
...:     print("done this function")
...:
In [7]: a.groupby(0 , axis = 0).apply(f)
<class 'pandas.core.frame.DataFrame'>
(3, 10)
done this function
<class 'pandas.core.frame.DataFrame'>
(3, 10)
done this function
<class 'pandas.core.frame.DataFrame'>
(7, 10)
done this function
Out[7]:
Empty DataFrame
Columns: []
Index: []
In [8]: a.iloc[:,0]
Out[8]:
0 a_str
1 a_str
2 a_str
3 b_str
4 b_str
5 b_str
6 b_str
7 b_str
8 b_str
9 b_str
Name: 0, dtype: object
In [9]: set(a.iloc[:,0].values)
Out[9]: {'a_str', 'b_str'}
Problem description
The DataFrame is grouped by its first column, and a function is then applied to each group. The function runs twice on the first group: the (3, 10) output above is printed twice. Is this a bug, or is something wrong with my code?
I have also tried grouping by a different column with a string name, and the function still runs twice on the first group.
Best wishes, Yilang.
Comment From: jreback
See the big red box at http://pandas.pydata.org/pandas-docs/stable/groupby.html#flexible-apply. This is an implementation detail.
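For illustration only: the warning in the linked docs says that, in this implementation, apply calls the function twice on the first group to decide whether it can take a fast or slow code path, so side effects such as print happen twice for that group. A minimal sketch of one way to run a side-effecting function exactly once per group is to iterate over the groupby object instead of calling apply. The `calls` list and the way the frame is built here are assumptions made for the example, not part of the original report.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the report: column 0 holds the
# string keys, columns 1..9 hold random floats (construction is illustrative).
a = pd.DataFrame(np.random.random((10, 9)), columns=range(1, 10))
a.insert(0, 0, ["a_str"] * 3 + ["b_str"] * 7)

calls = []  # hypothetical log used to record each invocation


def f(group):
    # Side effect: record the shape of the group that was passed in.
    calls.append(group.shape)


# Iterating yields one (key, sub-frame) pair per group, so f runs once each.
for key, group in a.groupby(0):
    f(group)

print(calls)
```

If the function returns a value, the results can be collected in a dict keyed by `key` inside the same loop.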
Comment From: cyrushu
Thanks a lot~