Code Sample
import numpy as np
import pandas as pd
from datetime import datetime


# Method of a unittest.TestCase subclass
def test_groupby_aggregate_item_by_item(self):
    def test_df():
        # One-row DataFrame that is embedded as a cell value below
        s = pd.DataFrame(np.array([[13, 14, 15, 16]]),
                         index=[0],
                         columns=['b', 'c', 'd', 'e'])
        num = np.array([[s, s, s, datetime.strptime('2016-12-28', "%Y-%m-%d"), 'asdf', 24],
                        [s, s, s, datetime.strptime('2016-12-28', "%Y-%m-%d"), 'asdf', 6]])
        columns = ['a', 'b', 'c', 'd', 'e', 'f']
        idx = list(range(len(num)))  # originally xrange (Python 2 only)
        return pd.DataFrame(num, index=idx, columns=columns)

    c = [test_df().sort_values(['d', 'e', 'f']),
         test_df().sort_values(['d', 'e', 'f'])]
    df = pd.concat(c)
    df = df[["e", "a"]].copy().reset_index(drop=True)
    df["e_idx"] = df["e"]
    what = [0, 0.5, 0.5, 1]

    def x():
        df.groupby(["e_idx", "e"])["a"].quantile(what)

    # assertRaisesRegex is the Python 3 name of assertRaisesRegexp
    self.assertRaisesRegex(ValueError,
                           "'SeriesGroupBy' object has no attribute '_aggregate_item_by_item'",
                           x)
Problem description
The ValueError raised in _GroupBy._aggregate_item_by_item carries no message, so the underlying AttributeError is hidden from the user:
except (AttributeError):
> raise ValueError
E ValueError
core/groupby.py:592: ValueError
The proposed change passes the caught AttributeError to the ValueError, so its message is shown to the user.
Expected Output
except (AttributeError) as e:
> raise ValueError(e)
E ValueError: 'SeriesGroupBy' object has no attribute '_aggregate_item_by_item'
core/groupby.py:592: ValueError
Output of pd.show_versions()
Comment From: ghost
Possibly related: https://github.com/pandas-dev/pandas/issues/5755
Comment From: jreback
This looks like a duplicate of #11759?
Comment From: ghost
I merged that branch and tested again, and the same issue seems to occur: the error is still "ValueError: 'SeriesGroupBy' object has no attribute '_aggregate_item_by_item'", so the test I wrote still passes.
Any suggestions?
I will also clean up the test case based on the review comments.
Thanks.
Comment From: jreback
@jmunsch you could have simply updated the original PR, but now that it's closed, GitHub won't let you reopen it, so you can open a new one.
Comment From: ghost
After doing some more research, I tried moving _aggregate_item_by_item into series, but then I noticed TypeErrors like this one: https://github.com/pandas-dev/pandas/pull/8472
When trying:
> /Users/jmunsch/Desktop/dev/pandas/pandas/core/groupby.py(2631)_aggregate_item_by_item()
-> cannot_agg.append(item)
(Pdb) item
num1 num2
0 13 14
(Pdb) e
TypeError('Indexing a Series with DataFrame is not supported, use the appropriate DataFrame column',)
(Pdb)
I followed the linked issue 11579 more closely, and retried with:
import numpy as np
import pandas as pd


# Method of a unittest.TestCase subclass
def test_groupby_aggregate_item_by_item(self):
    s = pd.DataFrame(np.array([[13, 14]]),
                     index=[0],
                     columns=['num1', 'num2'])
    c = [pd.DataFrame([[s.copy(), 'zzz']], index=range(2), columns=['nums', 'node']),
         pd.DataFrame([[s.copy(), 'xxx']], index=range(2), columns=['nums', 'node'])]
    df = pd.concat(c)
    df = df[["node", "nums"]].copy().reset_index(drop=True)
    df["node_idx"] = df["node"]
    y = df.set_index(["node_idx", "node"]).groupby("nums").quantile([0.25, 1])
But it returns an empty result; I must be using this incorrectly. Closing.
Comment From: jreback
@jmunsch you are doing some really odd things in your example and seem to have embedded DataFrames within DataFrames.
Please show how the input data is constructed and what result you are after.
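For comparison, a sketch of the flat layout that groupby quantiles normally expect: one scalar per cell instead of an embedded DataFrame. It assumes the goal was per-`node` quantiles of `num1` and `num2` (the thread never states the intended result) and reuses the values 13 and 14 from the example above.

```python
import pandas as pd

# Hypothetical flat layout: one scalar per cell, one row per node,
# using the num1/num2 values from the embedded-DataFrame example.
df = pd.DataFrame({
    "node": ["zzz", "xxx"],
    "num1": [13, 13],
    "num2": [14, 14],
})

# Per-group quantiles of each numeric column; the result has a
# MultiIndex of (node, quantile) and one column per selected column.
result = df.groupby("node")[["num1", "num2"]].quantile([0.25, 1])
print(result)
```

With each group holding a single row, every quantile equals that row's value; the point is the shape of the input, not the numbers.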