Sometimes it is useful to know the percentage of true values in a Boolean column. For example, when a DataFrame holds the results of fitting many models, seeing the share of successful fits up front is helpful. By default, describe() leaves the Boolean column out:
In [1]: import numpy as np
        import pandas as pd
        df = pd.DataFrame({'c1': list('abcdef'),
                           'c2': [True, False, True, True, False, False],
                           'c3': np.random.randn(6),
                           'c4': np.arange(6)})
        df.describe()
Out[1]:
             c3        c4
count  6.000000  6.000000
mean   0.088425  2.500000
std    1.633860  1.870829
min   -2.337804  0.000000
25%   -0.973194  1.250000
50%    0.487471  2.500000
75%    1.231870  3.750000
max    1.873493  5.000000
Expected Output
In [2]: df['c2'] = df['c2'].astype('int64')
df.describe()
Out[2]:
             c2        c3        c4
count  6.000000  6.000000  6.000000
mean   0.500000  0.088425  2.500000
std    0.547723  1.633860  1.870829
min    0.000000 -2.337804  0.000000
25%    0.000000 -0.973194  1.250000
50%    0.500000  0.487471  2.500000
75%    1.000000  1.231870  3.750000
max    1.000000  1.873493  5.000000
Comment From: jreback
This requires explicit inclusion, because bool is generally treated as categorical (not numeric).
In [4]: df.describe(include=['bool'])
Out[4]:
          c2
count      6
unique     2
top     True
freq       3
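If only the share of True values is needed, rather than a full summary, the mean of a Boolean column gives it directly, since True counts as 1 and False as 0. A minimal sketch, assuming the same c1 and c2 columns as in the example (the frame is rebuilt here so the snippet is self-contained; true_fraction is just an illustrative name):

```python
import pandas as pd

# Same Boolean column as in the example; the numeric columns are omitted.
df = pd.DataFrame({'c1': list('abcdef'),
                   'c2': [True, False, True, True, False, False]})

# True counts as 1 and False as 0, so the column mean is the fraction of True.
true_fraction = df.select_dtypes(include=['bool']).mean()
print(true_fraction)
```

The result is a Series indexed by the Boolean column names, so it also works when the frame has several such columns.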
Getting the numeric summary for every bool column generically is very easy:
In [6]: df.select_dtypes(include=['bool']).astype('int').describe()
Out[6]:
             c2
count  6.000000
mean   0.500000
std    0.547723
min    0.000000
25%    0.000000
50%    0.500000
75%    1.000000
max    1.000000
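To get the combined table from Expected Output without overwriting c2 in the original frame, the same select_dtypes idea can be wrapped in a small helper. This is only a sketch: describe_with_bools is a hypothetical name, not a pandas function, and it assumes that casting Boolean columns to int64 is acceptable for the summary.

```python
import numpy as np
import pandas as pd


def describe_with_bools(frame):
    """Describe numeric columns, treating bool columns as 0/1 integers.

    Hypothetical helper, not part of pandas.
    """
    # Find the Boolean columns, then cast only those on a copy of the frame.
    bool_cols = frame.select_dtypes(include=['bool']).columns
    converted = frame.astype({col: 'int64' for col in bool_cols})
    return converted.describe()


df = pd.DataFrame({'c1': list('abcdef'),
                   'c2': [True, False, True, True, False, False],
                   'c3': np.random.randn(6),
                   'c4': np.arange(6)})

summary = describe_with_bools(df)
print(summary)
```

Because astype returns a new DataFrame, df['c2'] keeps its bool dtype afterwards, unlike the in-place assignment used under Expected Output.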