Code Sample
import pandas as pd
df = pd.DataFrame({
    'a': [1, 1, 2, 3],
    'b': [None, 1, 2, 3],
    'c': [1, 2, 3, 4],
})
grouped = df.groupby(['a', 'b'])
print('Number of groups:', len(grouped))
# Number of groups: 4
num_iterations = 0
for _ in grouped:
    num_iterations += 1
print('Number of groups iterated over:', num_iterations)
# Number of groups iterated over: 3
Problem description
Presently, when a GroupBy object is iterated over, any group in which one of the grouping columns is None is skipped.
This is a problem because iterating over groups is expected to visit every group.
It is also inconsistent to report the length of an iterable as x when iterating over it performs only y < x iterations.
Expected Output
I'd expect iterating over a GroupBy object to iterate over all groups, regardless of the value in the key columns.
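As a minimal sketch of the expected behavior, assuming a pandas version (1.1 or later) that provides the `dropna` parameter on `groupby`: passing `dropna=False` should keep rows with a missing key in their own group, so the group count and the number of iterations agree. This is a possible workaround, not necessarily how the linked fix handles it.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 2, 3],
    'b': [None, 1, 2, 3],
    'c': [1, 2, 3, 4],
})

# Assumption: pandas >= 1.1, where groupby accepts dropna.
# dropna=False keeps rows whose key contains None/NaN as their own group.
grouped = df.groupby(['a', 'b'], dropna=False)

num_groups = grouped.ngroups
num_iterations = sum(1 for _ in grouped)

print('Number of groups:', num_groups)
print('Number of groups iterated over:', num_iterations)
```

With this option the two counts should match, which is the consistency this report asks for.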
Output of pd.show_versions()
Comment From: chris-b1
Thanks for the report, this is a duplicate of #3729, with a WIP fix at #12607 - please try it out and comment if you'd like.
Comment From: nsfinkelstein
@chris-b1 Thanks - apologies for the duplication, I didn't notice that post when I was looking through issues.