Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
index = pd.date_range(start="2013-6-7T03:00", end="2013-6-7T05:00", freq="15T")
df2 = pd.Series(index=index, dtype=float)  # all values are NaN
df2.name = "09"  # commenting this line out avoids the bug
# very specific calculation, from
# https://stackoverflow.com/questions/29007830/identifying-consecutive-nans-with-pandas
na_groups = df2.notna().cumsum().loc[df2.isna()]
lengths_consecutive_na = na_groups.groupby(na_groups).agg(len)

Issue Description

It seems that the name of the Series affects whether the calculation can run, specifically when the name starts with a "0". The example above gives the error:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-09 00:00:00

but commenting out the df2.name line leads to no error.
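As a workaround for anyone else hitting this, grouping by the underlying array instead of the named Series appears to avoid the error, since the grouping key then has no name to look up (a sketch; only checked against the setup above on pandas 1.5.3):

import pandas as pd
index = pd.date_range(start="2013-6-7T03:00", end="2013-6-7T05:00", freq="15T")
df2 = pd.Series(index=index, dtype=float)
df2.name = "09"
na_groups = df2.notna().cumsum().loc[df2.isna()]
# grouping by the plain ndarray sidesteps the name-based lookup entirely
lengths_consecutive_na = na_groups.groupby(na_groups.values).agg(len)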

Expected Behavior

I wouldn't expect the name of the Series to matter for the calculation.

Installed Versions

INSTALLED VERSIONS
------------------
commit : 2e218d10984e9919f0296931d92ea851c6a6faf5
python : 3.8.16.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Tue Jun 21 21:18:39 PDT 2022; root:xnu-6153.141.66~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.5.3
numpy : 1.24.1
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 67.5.1
pip : 23.0.1
Cython : None
pytest : 6.2.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.11.0
pandas_datareader : None
bs4 : 4.9.3
bottleneck : None
brotli :
fastparquet : 2023.2.0
fsspec : 2023.1.0
gcsfs : 2023.1.0
matplotlib : 3.4.1
numba : None
numexpr : 2.8.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.6.2
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : 2023.1.0
xlrd : None
xlwt : None
zstandard : None
tzdata : None

Comment From: kthyng

Sorry, I forgot another detail: there also have to be NaNs in the data for the error to crop up (though that is exactly the use case I am after). If all values are floats, there is no error. For example, this runs fine:

import pandas as pd
index = pd.date_range(start="2013-6-7T03:00", end="2013-6-7T05:00", freq="15T")
df2 = pd.Series(index=index, data=5, dtype=float)  # no NaNs this time
df2.name = "09"  # name still set, but no error without NaNs
# very specific calculation, from
# https://stackoverflow.com/questions/29007830/identifying-consecutive-nans-with-pandas
na_groups = df2.notna().cumsum().loc[df2.isna()]
lengths_consecutive_na = na_groups.groupby(na_groups).agg(len)

Comment From: jbrockmendel

Digging into this: in get_grouper we call is_in_obj(gpr), which returns gpr is obj[gpr.name]. It is that __getitem__ that is raising. In particular, that lookup calls DatetimeIndex.get_loc, which parses "09" as datetime(1, 1, 9) and subsequently raises when trying to cast to Timestamp (via Period.to_timestamp).

There are a few places in there where we could/should avoid raising. Will need to give this some more thought.
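If that analysis is right, the groupby machinery isn't needed to trigger it: a bare __getitem__ with the string name on a Series with a DatetimeIndex should hit the same get_loc path (a sketch against 1.5.3):

import pandas as pd
index = pd.date_range(start="2013-6-7T03:00", end="2013-6-7T05:00", freq="15T")
df2 = pd.Series(index=index, dtype=float, name="09")
# expected to raise the same OutOfBoundsDatetime as the groupby call:
# "09" is parsed by DatetimeIndex.get_loc as a partial date string
# rather than being treated as a plain label
df2["09"]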