Here's my code
import numpy as np
import pandas as pd
from numpy.random import randn
az = pd.DataFrame({
    'A': ['a1', 'a1', 'a2', 'a3'],
    'B': ['b1', 'b2', 'b3', 'b4'],
    'Vals': randn(4),
}).groupby(['A', 'B']).sum()

def add_zero(row):
    name = row.name
    row.loc[name, 'b0'] = 0
    return row

az['Vals'].groupby(level='A').apply(lambda s: add_zero(s))
>>> A A B
>>> a1 a1 b1 1.238985
>>> b2 0.686438
>>> b0 0.000000
>>> a2 a2 b3 -0.503039
>>> b0 0.000000
>>> a3 a3 b4 -0.083150
>>> b0 0.000000
>>> Name: Vals, dtype: float64
Expected Output
>>> A B
>>> a1 b1 1.238985
>>> b2 0.686438
>>> b0 0.000000
>>> a2 b3 -0.503039
>>> b0 0.000000
>>> a3 b4 -0.083150
>>> b0 0.000000
>>> Name: Vals, dtype: float64
Problem description
I would like to add a row with a new index label to each group of a grouped DataFrame. It seems to work, yet it duplicates an index level (A appears twice in the output), and I can't figure out why. Is there any way around this?
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: 0.10.3
apiclient: 1.6.2
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: None
Comment From: chris-b1
Mutating in groupby takes a complicated code path; to be honest, I'm not sure whether this is a bug or not.
An easy workaround is to use concat to combine the modified frames.
def add_zero(row, name):
    row.loc[name, 'b0'] = 0
    return row

pd.concat([add_zero(row, name)
           for name, row in az['Vals'].groupby(level='A')])
Out[30]:
A B
a1 b1 0.006314
b2 -1.301589
b0 0.000000
a2 b3 -0.172631
b0 0.000000
a3 b4 -0.130576
b0 0.000000
Name: Vals, dtype: float64
Comment From: jreback
Yeah, this isn't a bug; it follows from the no-modification contract in groupby. Mutating a group sometimes works, but it is not guaranteed.
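For anyone who wants to stay within that contract, here is a minimal sketch of a variant that never mutates the group. It builds the extra b0 row as a separate Series and concatenates it onto each group. The helper name with_zero_row is hypothetical (it does not appear in this thread), and the frame is rebuilt here only so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

az = pd.DataFrame({
    'A': ['a1', 'a1', 'a2', 'a3'],
    'B': ['b1', 'b2', 'b3', 'b4'],
    'Vals': np.random.randn(4),
}).groupby(['A', 'B']).sum()

vals = az['Vals']


def with_zero_row(group, name):
    """Return a new Series: the group's rows followed by a (name, 'b0') row.

    Hypothetical helper for illustration; `group` itself is left untouched.
    """
    extra_index = pd.MultiIndex.from_tuples([(name, 'b0')],
                                            names=group.index.names)
    extra = pd.Series([0.0], index=extra_index, name=group.name)
    return pd.concat([group, extra])


# Iterate over the groups explicitly instead of mutating inside apply().
result = pd.concat([with_zero_row(group, name)
                    for name, group in vals.groupby(level='A')])
print(result)
```

Because each piece already carries the full (A, B) index, the result keeps just two index levels, and az['Vals'] is not modified.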