Currently there is some inconsistency in the scalar type returned from a Series aggregation, both in whether it is a numpy or a python type and in the behavior for an empty Series - see the table below.
Normally this isn't a big deal, as the numpy types mostly behave like the python types, but it can be an issue with serialization, which is where I ran into this.
Is the desired behavior to make all of these python types?
function | type | type_empty |
---|---|---|
sum | <class 'float'> | <class 'int'> |
mean | <class 'float'> | <class 'float'> |
median | <class 'float'> | <class 'float'> |
count | <class 'numpy.int32'> | <class 'numpy.int32'> |
var | <class 'float'> | <class 'float'> |
std | <class 'float'> | <class 'float'> |
sem | <class 'numpy.float64'> | <class 'numpy.float64'> |
nunique | <class 'int'> | <class 'int'> |
prod | <class 'numpy.float64'> | <class 'float'> |
min | <class 'numpy.float64'> | <class 'float'> |
max | <class 'numpy.float64'> | <class 'float'> |
Code for the table:

```python
import pandas as pd

fxs = ['sum', 'mean', 'median', 'count', 'var', 'std', 'sem', 'nunique', 'prod', 'min', 'max']
s = pd.Series([1., 2., 3.])
s_empty = pd.Series([], dtype='f8')

data = []
for f in fxs:
    row = dict(function=f)
    # type of the aggregation result on a non-empty Series
    res = getattr(s, f)()
    row['type'] = type(res)
    # type of the same aggregation on an empty Series
    res = getattr(s_empty, f)()
    row['type_empty'] = type(res)
    data.append(row)

pd.DataFrame(data).to_html(index=False)
```
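To make the serialization issue concrete, here is a minimal sketch using only the stdlib `json` module (the numpy scalars stand in for the `count` and `sem` results above):

```python
import json

import numpy as np

# np.float64 subclasses Python float, so json serializes it fine...
json.dumps({'sem': np.float64(0.5)})  # '{"sem": 0.5}'

# ...but numpy integer scalars do not subclass int, so this raises a
# TypeError ("Object of type int32 is not JSON serializable").
try:
    json.dumps({'count': np.int32(3)})
except TypeError as exc:
    print(exc)
```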
Comment From: jreback
yes this would be good to make consistent. scalars should always be python scalars; numpy scalars are a weird hybrid that can have odd effects.
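For illustration, a minimal sketch of that hybrid behavior: numpy float scalars pass an `isinstance` check against `float`, while numpy integer scalars fail the corresponding check against `int`:

```python
import numpy as np

isinstance(np.float64(1.5), float)  # True  - np.float64 subclasses float
isinstance(np.int32(3), int)        # False - numpy ints don't subclass int

# .item() converts any numpy scalar to its plain Python equivalent.
type(np.int32(3).item())            # <class 'int'>
```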
Comment From: chris-b1
Extra test case from #19381
```python
import pandas as pd
import json
import datetime

data = [
    datetime.date(1987, 2, 12),
    datetime.date(1987, 2, 12),
    datetime.date(1987, 2, 12),
    None,
    None,
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15),
    datetime.date(1989, 6, 15)
]

df = pd.DataFrame(columns=['foo'])
df['foo'] = data

ds = df['foo'].describe()
d = ds.to_dict()
# json.dumps raises a TypeError here: the describe() output contains
# numpy scalars (count, unique, freq) and a datetime.date (top), none
# of which the stdlib json module can serialize.
j = json.dumps(d)
print(j)
```
Comment From: torlenor
This is still an issue and, as the original poster described, it leads to trouble with serialization when using packages that don't know how to handle numpy types.
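Until the return types are made consistent, one common workaround (a sketch, not a pandas API) is a `json.JSONEncoder` subclass that converts numpy scalars via `.item()`:

```python
import json

import numpy as np

class NumpyScalarEncoder(json.JSONEncoder):
    # Hypothetical helper: fall back to .item() for numpy scalars.
    def default(self, obj):
        # np.generic is the common base class of all numpy scalar types;
        # .item() returns the equivalent plain Python object.
        if isinstance(obj, np.generic):
            return obj.item()
        return super().default(obj)

json.dumps({'count': np.int32(3)}, cls=NumpyScalarEncoder)  # '{"count": 3}'
```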
Comment From: jreback
@torlenor pull requests for tests and patches are always welcome, and are how things get fixed