• [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd
import datetime, pytz

# t is a date after 2262 that should be coerced into NaT
t = datetime.datetime(3000, 1, 1, 0, 0, 0, 0, pytz.UTC)

# lists with many such dates
l50 = [t] * 50
l51 = [t] * 51

# pd.to_datetime crashes if the list is longer than 50

# this is fine
print(pd.to_datetime(l50, utc=True, errors='coerce'))

# this crashes
print(pd.to_datetime(l51, utc=True, errors='coerce'))

Issue Description

Background (irrelevant for reproducing the bug, but good to know): a Google BigQuery query gives me a dataframe with a column of datetime values with year=9999, and I store the result as a parquet file. When I read it back, read_parquet crashes. I traced it down to the following problem in pd.to_datetime.

The issue: if pd.to_datetime needs to convert more than 50 datetime values with tzinfo set, and those values are above year 2262 (the maximum date for pd.Timestamp) and should be coerced into NaT, then pd.to_datetime crashes with a raised exception. It works fine if the sequence has 50 or fewer items.
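For context, a nanosecond-resolution pd.Timestamp has a hard upper bound, which is why year-3000 values have to be coerced rather than converted (a minimal illustration):

```python
import pandas as pd

# the largest date a nanosecond-resolution Timestamp can represent
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807
```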

Expected Behavior

The expected behavior is that pd.to_datetime should return a list of NaT values also for sequences longer than 50 items.
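On a pandas version without this bug, that expectation can be checked directly (a minimal sketch reusing the same out-of-bounds value as in the example above):

```python
import datetime

import pandas as pd
import pytz

# a tz-aware datetime beyond the nanosecond Timestamp range
t = datetime.datetime(3000, 1, 1, tzinfo=pytz.UTC)

# all 51 out-of-bounds values should coerce to NaT instead of raising
result = pd.to_datetime([t] * 51, utc=True, errors='coerce')
print(result.isna().all())  # True on a fixed pandas
```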

Installed Versions

------------------
commit : 73c68257545b5f8530b7044f56647bd2db92e2ba
python : 3.7.10.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.0-17-cloud-amd64
Version : #1 SMP Debian 4.19.194-3 (2021-07-18)
machine : x86_64
processor :
byteorder : little
LC_ALL : en_US.UTF-8
LANG : C.UTF-8
LOCALE : None.None
pandas : 1.3.3
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.0.4
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.27.0
pandas_datareader : None
bs4 : None
bottleneck : 1.3.2
fsspec : 2021.08.1
fastparquet : None
gcsfs : 2021.08.1
matplotlib : 3.4.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 5.0.0
pyxlsb : None
s3fs : None
scipy : 1.7.1
sqlalchemy : 1.4.25
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.54.0

Comment From: jreback

Hmm, try this on master; not sure if we backported this patch.

Comment From: debnathshoham

This is still failing on master.

Comment From: larsr

The exception is raised on line 467 in array_to_datetime in tslib.pyx.

It raises because (four lines above) the value of utc_convert is False.

utc_convert is False because the call to objects_to_datetime64ns on line 2066 does not pass a value for the utc parameter (which defaults to False).

So when objects_to_datetime later calls array_to_datetime, it uses utc=False.

Comment From: thomasqueirozb

take

Comment From: srotondo

I believe this is a duplicate of an issue that was fixed when #45319 was closed, so this issue can be closed, right?

Comment From: debnathshoham

This works fine on master now.

Comment From: phofl

Agreed, this was fixed and tested in the other issue.