- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd
foo = pd.DataFrame({'some_string': ['612092d7-071f-467e832d-dd53e0f2b590-0006'], 'time': [pd.NaT]})
for _, row in foo.iterrows():
    pass
Problem description
The snippet raises OverflowError: Python int too large to convert to C long (see the traceback in the details below). I would expect the snippet to iterate just fine without any issue.
Expected Output
The snippet is not expected to show any output.
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : d9fff2792bf16178d4e450fe7384244e50635733
python : 3.8.5.final.0
python-bits : 64
OS : Darwin
OS-release : 19.5.0
Version : Darwin Kernel Version 19.5.0: Tue May 26 20:41:44 PDT 2020; root:xnu-6153.121.2~2/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2
setuptools : 49.2.0.post20200712
Cython : 0.29.21
pytest : 6.0.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.17.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
<ipython-input-7-d71373909082> in <module>
1 foo = pd.DataFrame({'some_string': ['612092d7-071f-467e832d-dd53e0f2b590-0006'], 'time': [pd.NaT]})
----> 2 for _, row in foo.iterrows():
3 pass
~/env/miniconda3/envs/data/lib/python3.8/site-packages/pandas/core/frame.py in iterrows(self)
1011 klass = self._constructor_sliced
1012 for k, v in zip(self.index, self.values):
-> 1013 s = klass(v, index=columns, name=k)
1014 yield k, s
1015
~/env/miniconda3/envs/data/lib/python3.8/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
325 data = data.copy()
326 else:
--> 327 data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
328
329 data = SingleBlockManager.from_array(data, index)
~/env/miniconda3/envs/data/lib/python3.8/site-packages/pandas/core/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
425 else:
426 # we will try to copy be-definition here
--> 427 subarr = _try_cast(data, dtype, copy, raise_cast_failure)
428
429 elif isinstance(data, ABCExtensionArray):
~/env/miniconda3/envs/data/lib/python3.8/site-packages/pandas/core/construction.py in _try_cast(arr, dtype, copy, raise_cast_failure)
551 subarr = arr
552 else:
--> 553 subarr = maybe_cast_to_datetime(arr, dtype)
554
555 # Take care in creating object arrays (but iterators are not
~/env/miniconda3/envs/data/lib/python3.8/site-packages/pandas/core/dtypes/cast.py in maybe_cast_to_datetime(value, dtype, errors)
1430 )
1431 ):
-> 1432 value = maybe_infer_to_datetimelike(value)
1433
1434 return value
~/env/miniconda3/envs/data/lib/python3.8/site-packages/pandas/core/dtypes/cast.py in maybe_infer_to_datetimelike(value, convert_dates)
1290 # try timedelta first to avoid spurious datetime conversions
1291 # e.g. '00:00:01' is a timedelta but technically is also a datetime
-> 1292 value = try_timedelta(v)
1293 if lib.infer_dtype(value, skipna=False) in ["mixed"]:
1294 # cannot skip missing values, as NaT implies that the string
~/env/miniconda3/envs/data/lib/python3.8/site-packages/pandas/core/dtypes/cast.py in try_timedelta(v)
1266
1267 try:
-> 1268 td_values = to_timedelta(v)
1269 except ValueError:
1270 return v.reshape(shape)
~/env/miniconda3/envs/data/lib/python3.8/site-packages/pandas/core/tools/timedeltas.py in to_timedelta(arg, unit, errors)
110 arg = arg.item()
111 elif is_list_like(arg) and getattr(arg, "ndim", 1) == 1:
--> 112 return _convert_listlike(arg, unit=unit, errors=errors)
113 elif getattr(arg, "ndim", 1) > 1:
114 raise TypeError(
~/env/miniconda3/envs/data/lib/python3.8/site-packages/pandas/core/tools/timedeltas.py in _convert_listlike(arg, unit, errors, name)
149
150 try:
--> 151 value = sequence_to_td64ns(arg, unit=unit, errors=errors, copy=False)[0]
152 except ValueError:
153 if errors == "ignore":
~/env/miniconda3/envs/data/lib/python3.8/site-packages/pandas/core/arrays/timedeltas.py in sequence_to_td64ns(data, copy, unit, errors)
926 if is_object_dtype(data.dtype) or is_string_dtype(data.dtype):
927 # no need to make a copy, need to convert if string-dtyped
--> 928 data = objects_to_td64ns(data, unit=unit, errors=errors)
929 copy = False
930
~/env/miniconda3/envs/data/lib/python3.8/site-packages/pandas/core/arrays/timedeltas.py in objects_to_td64ns(data, unit, errors)
1036 values = np.array(data, dtype=np.object_, copy=False)
1037
-> 1038 result = array_to_timedelta64(values, unit=unit, errors=errors)
1039 return result.view("timedelta64[ns]")
1040
pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.array_to_timedelta64()
pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.parse_timedelta_string()
pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.timedelta_from_spec()
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.cast_from_unit()
OverflowError: Python int too large to convert to C long
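A narrower reproduction suggested by the traceback: iterrows() builds a Series from each row of the object-dtype values, and that constructor call is where the timedelta inference overflows. This is only a sketch based on the frames above (behavior as reported on pandas 1.1.0, not re-run here):
import numpy as np
import pandas as pd

# object-dtype row values, as DataFrame.values produces for the frame above
row_values = np.array(['612092d7-071f-467e832d-dd53e0f2b590-0006', pd.NaT], dtype=object)
# per the traceback, this Series construction raised the OverflowError on 1.1.0
pd.Series(row_values, index=['some_string', 'time'])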
Comment From: simonjayhawkins
Thanks @allentsouhuang for the report. This appears to be a regression from 0.25.3
Comment From: simonjayhawkins
I've labeled this as timedelta because of the traceback (expand details in OP).
Comment From: phofl
This was caused by #34634
This is actually quite interesting. The first part of the string '612092d7-071f-467e832d-dd53e0f2b590-0006'
determines the frequency in
https://github.com/pandas-dev/pandas/blob/020040b3b92516b445ddd8daba3b9818340e82d4/pandas/_libs/tslibs/timedeltas.pyx#L334
So when the result is too large for the inferred frequency, we get the OverflowError.
For example
foo = pd.DataFrame({'some_string': ['2222Y3'], 'time': [pd.Timedelta("0 days 00:00:00.990000")]})
for _, row in foo.iterrows():
    print(row)
raises too. I am not quite sure why we even try to convert this to Timedelta. The previously specified unit (via the default) caused a ValueError in https://github.com/pandas-dev/pandas/blob/020040b3b92516b445ddd8daba3b9818340e82d4/pandas/_libs/tslibs/timedeltas.pyx#L248-L252, which was caught by https://github.com/pandas-dev/pandas/blob/020040b3b92516b445ddd8daba3b9818340e82d4/pandas/core/dtypes/cast.py#L1382-L1385, and the array was just reshaped.
I am not quite sure what the intended behavior is here.
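If I read the linked code correctly, the fallback only catches ValueError, which is why the OverflowError now escapes all the way out of iterrows(). A simplified sketch of that fallback (not the actual pandas source, names abbreviated):
import numpy as np
from pandas import to_timedelta

def try_timedelta_sketch(v):
    # simplified from the try_timedelta helper linked above
    shape = v.shape
    try:
        td_values = to_timedelta(v)
    except ValueError:
        # before #34634, the default unit made this parse fail with ValueError,
        # so the object array was simply returned unchanged
        return v.reshape(shape)
    # an OverflowError is not caught here, so it propagates to the caller
    return np.asarray(td_values).reshape(shape)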
Comment From: mroeschke
Looks like this doesn't raise an OverflowError on master anymore. I suppose this could use a test.
In [1]: import pandas as pd
...: foo = pd.DataFrame({'some_string': ['612092d7-071f-467e832d-dd53e0f2b590-0006'], 'time': [pd.NaT]})
...: for _, row in foo.iterrows():
...:     pass
...:
In [2]: foo
Out[2]:
some_string time
0 612092d7-071f-467e832d-dd53e0f2b590-0006 NaT
In [3]: row
Out[3]:
some_string 612092d7-071f-467e832d-dd53e0f2b590-0006
time NaT
Name: 0, dtype: object
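A possible test along those lines (test name is a guess; the expected values are taken from the output above):
import pandas as pd
import pandas._testing as tm


def test_iterrows_string_and_nat_does_not_overflow():
    # regression test for the OverflowError reported in this issue
    df = pd.DataFrame(
        {
            "some_string": ["612092d7-071f-467e832d-dd53e0f2b590-0006"],
            "time": [pd.NaT],
        }
    )
    rows = [row for _, row in df.iterrows()]  # previously raised OverflowError
    expected = pd.Series(
        ["612092d7-071f-467e832d-dd53e0f2b590-0006", pd.NaT],
        index=["some_string", "time"],
        name=0,
        dtype=object,
    )
    tm.assert_series_equal(rows[0], expected)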
Comment From: simonjayhawkins
Fixed in commit 8cf377199b1c94493666055c21012fd2f5fa86de: BUG: Timedelta(td64_out_of_bounds) silently overflowing (#38965)
Comment From: rxd330
take
Comment From: MarvinGravert
take