Pandas Series iteration and to_dict methods *sometimes* return underlying storage type vs. Python object

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

s1 = pd.Series({"a": np.int64(64), "b": 10})
for v in s1.to_dict().values():
    print(type(v))  # prints <class 'int'> 2x

s2 = pd.Series({"a": np.int64(64), "b": 10, "c": "ABC"})
for v in s2.to_dict().values():
    print(type(v))  # prints <class 'numpy.int64'> for first variable "a"

for k, v in s1.items():
    print(k, type(v))  # prints <class 'int'> 2x

for k, v in s2.items():
    print(k, type(v))  # prints <class 'numpy.int64'> again for the first variable "a"

Problem description

pd.Series.to_dict can return different types for the same values depending on the composition of the Series. This also affects iteration, e.g., for k, v in series.items(): .... This is inconsistent and, critically, leads to type issues downstream that are hard to debug, especially around JSON conversion (the built-in json module and many other serializers raise an error when they encounter NumPy scalar types such as numpy.int64).
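To make the JSON failure concrete, here is a minimal sketch that does not depend on the pandas version. It shows the built-in json module rejecting a numpy.int64 value, plus one possible workaround through the `default` hook of json.dumps. The helper name numpy_default is hypothetical and not part of pandas or NumPy.

```python
import json

import numpy as np

# A dict shaped like what Series.to_dict() returned for the object-dtype case.
payload = {"a": np.int64(64), "b": 10}

try:
    json.dumps(payload)
    failed = False
except TypeError as exc:
    # numpy.int64 is not a subclass of Python int, so json rejects it.
    failed = True
    print(exc)


def numpy_default(obj):
    """Hypothetical `default` hook: unwrap NumPy scalars into Python objects."""
    if isinstance(obj, np.generic):
        return obj.item()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


encoded = json.dumps(payload, default=numpy_default)
print(encoded)
```

The hook only papers over the problem at serialization time; the values held in the dict are still NumPy scalars.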

I cannot find this exact issue open in the issue tracker, though there are a number of related issues including:

* An issue related to DataFrame.to_dict and inconsistent types (closed in 0.24): https://github.com/pandas-dev/pandas/issues/24908
* This issue also related to scalar coercion on DataFrame.to_dict calls (also closed recently): https://github.com/pandas-dev/pandas/issues/23753
* This PR fixes the issue deriving from iteration, but it looks like the above case is either an untested edge case or a regression: https://github.com/pandas-dev/pandas/pull/17491

Expected Output

The expected behavior is for integer values to be coerced to Python ints regardless of what else the Series contains. https://github.com/pandas-dev/pandas/issues/24908 is a related issue in which DataFrame coercions behave irregularly for the same reason.
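Until the behavior is consistent, the expected result can be approximated in user code. The following is a sketch of one possible workaround, not the fix adopted by pandas; series_to_native_dict is a hypothetical helper name. It unwraps any NumPy scalar with .item() and leaves other values untouched.

```python
import numpy as np
import pandas as pd


def series_to_native_dict(series):
    """Hypothetical helper: build a dict whose NumPy scalars are unwrapped.

    Values that are already plain Python objects pass through unchanged.
    """
    return {
        key: (value.item() if isinstance(value, np.generic) else value)
        for key, value in series.items()
    }


s2 = pd.Series({"a": np.int64(64), "b": 10, "c": "ABC"})
native = series_to_native_dict(s2)

for key, value in native.items():
    print(key, type(value))
```

Because the check is on the value rather than on the Series dtype, the same helper works for both the int64 Series and the object-dtype Series from the code sample.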

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: None
pip: 10.0.1
setuptools: 39.0.1
Cython: None
numpy: 1.16.2
scipy: None
pyarrow: None
xarray: None
IPython: 7.4.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Comment From: mroeschke

This probably occurs because s2 is object dtype and it's trying to preserve the dtype of each input argument while the arguments in s1 can both be coerced to int64.
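The dtype difference described here can be checked directly. This is a small sketch using the two Series from the original report; it only inspects the inferred dtypes and makes no claim about what to_dict or items return on a given pandas version.

```python
import numpy as np
import pandas as pd

s1 = pd.Series({"a": np.int64(64), "b": 10})
s2 = pd.Series({"a": np.int64(64), "b": 10, "c": "ABC"})

# All values in s1 are integers, so pandas stores them in one int64 array.
print(s1.dtype)

# The string in s2 forces object dtype, where each element is stored as the
# original Python or NumPy object rather than in a typed array.
print(s2.dtype)
```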

Investigation and PRs welcome.

Comment From: drew-heenan

I'm having a go at this issue - quick note @boydgreenfield, it looks like iterating over a Series object as in the last two loops in your example results in an iteration only over the int values in the Series. Did you mean to iterate over s1.items() or similar?

Comment From: boydgreenfield

@drew-heenan Yes, you're right, I meant .items(). I have updated the above code snippet. Thanks for taking a look at the issue!

Comment From: simonjayhawkins

from https://github.com/pandas-dev/pandas/pull/37648#issue-516125361

This resolves the issue of return types from to_dict. #25969 also discusses return types from .items(), which relates to an outstanding NumPy issue numpy/numpy#14139, and I don't address that part here atm

Comment From: ghost

Some findings about the root cause of this casting issue on Series.items(): https://github.com/pandas-dev/pandas/issues/50125#issuecomment-1342886489
