Series.describe() seems to return the wrong dtype for series with categorical data. See example below:
import pandas as pd
series = pd.Series(pd.Categorical(values=[1], categories=[1, 2]))
print(series.describe())
# prints: ... dtype: int64
print(series.dtype)
# prints: category
This is confusing, to say the least. The example was produced using pandas 0.19.1 and Python 3.5.2.
Comment From: jreback
huh? The result of describe is an integer-valued series, so this is correct.
In [2]: series.describe()
Out[2]:
count 1
unique 1
top 1
freq 1
dtype: int64
In [3]: series.dtype
Out[3]: category
Comment From: timtroendle
Ok, so I assume describe returns the type that is wrapped in the Categorical? Still, that remains rather confusing to me. Here's an even shorter and even more confusing example:
series = pd.Series(dtype='category')
series.describe()
# ... dtype: int64
series.dtype
# category
Isn't the meaning of dtype overloaded here, i.e. used in two different ways?
Comment From: jreback
dtype is the type of the result. This may be different from the input.
In [1]: s = Series([1,2,3])
In [2]: s
Out[2]:
0 1
1 2
2 3
dtype: int64
In [3]: s.describe()
Out[3]:
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
dtype: float64
In [4]: s = Series(list('abc'))
In [5]: s
Out[5]:
0 a
1 b
2 c
dtype: object
In [6]: s.describe()
Out[6]:
count 3
unique 3
top a
freq 1
dtype: object
In [7]: s = Series(list('abc')).astype('category')
In [8]: s.describe()
Out[8]:
count 3
unique 3
top c
freq 1
dtype: object
In [9]: s = Series([1,2,3]).astype('category')
In [10]: s.describe()
Out[10]:
count 3
unique 3
top 3
freq 1
dtype: int64
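The sessions above can be condensed into a small script that puts the input dtype next to the dtype of the describe() result. This is only a sketch: the helper name dtype_before_and_after is made up for illustration and is not part of pandas, and the dtype reported for categorical input may differ between pandas versions, so no output is shown.

```python
import pandas as pd


def dtype_before_and_after(series):
    """Return (input dtype, dtype of the describe() result) as strings.

    Hypothetical helper for illustration only; not a pandas function.
    """
    return str(series.dtype), str(series.describe().dtype)


# Sample inputs mirroring the sessions above (values chosen for illustration).
numeric = pd.Series([1, 2, 3], dtype="int64")
categorical = pd.Series([1, 2, 3], dtype="int64").astype("category")

for label, s in [("numeric", numeric), ("categorical", categorical)]:
    before, after = dtype_before_and_after(s)
    # The dtype line printed under a describe() result refers to the summary
    # series itself, not to the series that was summarized.
    print(label, "input:", before, "describe():", after)
```

In both cases the second value is the dtype of the summary series, which is why it does not have to match the first.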
Comment From: timtroendle
Oh, I wasn't aware of that. Probably because I was mostly dealing with float dtypes in the past. Sorry for the false positive bug report and thanks a lot for the detailed explanation @jreback!