Pandas groupby.var result is different from np.var

With groupby.var, I don't get the same result as with numpy. Working through the variance formula by hand, numpy seems to be correct. I think groupby.var divides by N-1 instead of by N (the number of observations).
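For reference, the two candidate formulas differ only in the divisor. The block below writes $\bar{x}$ for the group mean and $N$ for the group size, then works them by hand for column `b` of group `aa` (values 1, 2, 3, 4) and group `dd` (values 5, 6). The symbols $s^2_N$ and $s^2_{N-1}$ are labels chosen here for illustration.

```latex
\begin{aligned}
s^2_{N}   &= \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2,
\qquad
s^2_{N-1}  = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2 \\[4pt]
\text{aa: } \bar{x} &= 2.5,\quad \sum (x_i - \bar{x})^2 = 2.25 + 0.25 + 0.25 + 2.25 = 5,\quad
s^2_{N} = 5/4 = 1.25,\quad s^2_{N-1} = 5/3 \approx 1.6667 \\
\text{dd: } \bar{x} &= 5.5,\quad \sum (x_i - \bar{x})^2 = 0.25 + 0.25 = 0.5,\quad
s^2_{N} = 0.5/2 = 0.25,\quad s^2_{N-1} = 0.5/1 = 0.5
\end{aligned}
```

Dividing by $N$ gives the numpy numbers, and dividing by $N-1$ gives the groupby.var numbers.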

import numpy as np
import pandas as pd

# data
df = pd.DataFrame({'id': ['aa', 'aa', 'aa', 'aa', 'dd', 'dd'], 'a': [2, 2, 3, 4, 2, 2], 'b': [1, 2, 3, 4, 5, 6]})

# pandas: variance of column b for each id
df[['id', 'b']].groupby('id').var()

#[out]:
           b
id          
aa  1.666667
dd  0.500000

# with numpy
for i in df['id'].drop_duplicates().tolist():
    nb_var = np.var(df[df.id==i]['b'])
    print(i, nb_var)

#[out]:
aa 1.25
dd 0.25
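To confirm the divisor hypothesis numerically, here is a minimal sketch that recomputes each group's variance by hand both ways. The helper `variance` and its `ddof` argument are hypothetical names introduced here; `ddof` is meant to mirror the numpy/pandas parameter of the same name (divisor = N - ddof).

```python
import pandas as pd

# Same data as in the report above
df = pd.DataFrame({'id': ['aa', 'aa', 'aa', 'aa', 'dd', 'dd'],
                   'a': [2, 2, 3, 4, 2, 2],
                   'b': [1, 2, 3, 4, 5, 6]})

def variance(values, ddof):
    """Hypothetical helper: sum of squared deviations divided by N - ddof."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((v - mean) ** 2 for v in values)
    return sum_sq / (n - ddof)

results = {}
# Iterating a SeriesGroupBy yields (group key, Series of that group's values)
for key, grp in df.groupby('id')['b']:
    values = grp.tolist()
    results[key] = (variance(values, 0), variance(values, 1))
    print(key, 'divide by N:', results[key][0], 'divide by N-1:', results[key][1])
```

The "divide by N" column reproduces the numpy output and the "divide by N-1" column reproduces the groupby.var output.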

Comment From: mroeschke

As specified in the docs, the pandas variance calculation defaults to the unbiased estimate (dividing by n-1), while numpy uses the maximum likelihood calculation (dividing by n). To calculate the variance the way numpy does, pass ddof=0 to var():
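In both libraries `ddof` ("delta degrees of freedom") sets the divisor to N - ddof; only the default differs (numpy uses 0, pandas uses 1). The sketch below, using group `aa`'s values as an assumed example, shows that either side can be made to match the other, and that the two results are related by the factor (N-1)/N.

```python
import numpy as np
import pandas as pd

b_aa = pd.Series([1, 2, 3, 4])  # column b of group 'aa'

# numpy: divisor is N - ddof, default ddof=0 (population variance)
np_default = np.var(b_aa.to_numpy())           # 1.25
np_sample = np.var(b_aa.to_numpy(), ddof=1)    # 1.666...

# pandas Series.var: divisor is N - ddof, default ddof=1 (sample variance)
pd_default = b_aa.var()                        # 1.666...
pd_population = b_aa.var(ddof=0)               # 1.25

# Converting the N-1 result to the N result: multiply by (N-1)/N
n = len(b_aa)
converted = pd_default * (n - 1) / n
print(np_default, np_sample, pd_default, pd_population, converted)
```

Going the other way, `np.var(x, ddof=1)` reproduces the pandas default.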

In [5]: df[['id','b']].groupby('id').var(ddof=0)
Out[5]: 
       b
id      
aa  1.25
dd  0.25
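If both estimates are wanted side by side, one option (a sketch, not the only way) is named aggregation on the grouped column, where each keyword becomes an output column. The column names `sample_var` and `population_var` are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': ['aa', 'aa', 'aa', 'aa', 'dd', 'dd'],
                   'a': [2, 2, 3, 4, 2, 2],
                   'b': [1, 2, 3, 4, 5, 6]})

# Named aggregation: keyword = output column name, value = function applied per group
both = df.groupby('id')['b'].agg(
    sample_var=lambda s: s.var(),            # pandas default, ddof=1 (divide by N-1)
    population_var=lambda s: s.var(ddof=0),  # matches np.var default (divide by N)
)
print(both)
```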

Comment From: laurazh

It works. Thanks, I hadn't understood this option from the docs.