Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
index = pd.date_range(start="2013-6-7T03:00", end="2013-6-7T05:00", freq="15T")
df2 = pd.Series(index=index, dtype=float)
df2.name = "09" # commenting this line out avoids the bug
# very specific calculation from here
# https://stackoverflow.com/questions/29007830/identifying-consecutive-nans-with-pandas
na_groups = df2.notna().cumsum().loc[df2.isna()]
lengths_consecutive_na = na_groups.groupby(na_groups).agg(len)
Issue Description
It seems as though the name of the Series in the example impacts the ability of the calculation to run — specifically if the name starts with a "0". The example above gives the error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-09 00:00:00
but commenting out the `df2.name` line leads to no error.
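A workaround that appears to sidestep the problem (my own guess from the symptom above, not a confirmed fix): pass the underlying array as the grouper instead of the named Series, so there is no name for pandas to resolve against the index.

import pandas as pd

index = pd.date_range(start="2013-6-7T03:00", end="2013-6-7T05:00", freq="15T")
df2 = pd.Series(index=index, dtype=float)
df2.name = "09"

na_groups = df2.notna().cumsum().loc[df2.isna()]
# Grouping by a plain numpy array: the array has no .name attribute,
# so the name-based lookup that trips on "09" never happens.
lengths_consecutive_na = na_groups.groupby(na_groups.to_numpy()).agg(len)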
Expected Behavior
I wouldn't expect the name of the Series to matter for the calculation.
Installed Versions
Comment From: kthyng
Sorry, I forgot another detail: there have to be NaNs in the dataset for the error to crop up (though that is exactly the use case I am after). If all values are floats, there is no error. For example, this runs fine:
import pandas as pd
index = pd.date_range(start="2013-6-7T03:00", end="2013-6-7T05:00", freq="15T")
df2 = pd.Series(index=index, data=5, dtype=float)
df2.name = "09"  # same name as before, but with no NaNs there is no error
# very specific calculation from here
# https://stackoverflow.com/questions/29007830/identifying-consecutive-nans-with-pandas
na_groups = df2.notna().cumsum().loc[df2.isna()]
lengths_consecutive_na = na_groups.groupby(na_groups).agg(len)
Comment From: jbrockmendel
Digging into this: in `get_grouper` we call `is_in_obj(gpr)`, which returns `gpr is obj[gpr.name]`. It is that `__getitem__` that is raising. In particular, that lookup calls `DatetimeIndex.get_loc`, which parses `"09"` as `datetime(1, 1, 9)` and subsequently raises when trying to cast to `Timestamp` (via `Period.to_timestamp`).
There are a few places in there where we could/should avoid raising. Will need to give this some more thought.
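For reference, the failing lookup can be reproduced in isolation; this is my reading of the path above, and the exact traceback may differ by version:

import pandas as pd

index = pd.date_range(start="2013-6-7T03:00", end="2013-6-7T05:00", freq="15T")
s = pd.Series(index=index, dtype=float, name="09")

# is_in_obj effectively performs this lookup. With a DatetimeIndex,
# __getitem__ falls through to DatetimeIndex.get_loc, which parses the
# string "09" as a partial datetime, resolves it to datetime(1, 1, 9),
# and raises OutOfBoundsDatetime rather than KeyError.
s["09"]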
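One of those places, purely as a sketch and not a decided fix: have the identity check swallow the parsing error, since a lookup that raises means `gpr` certainly did not come from `obj`. The helper below is a simplified stand-in for the real `get_grouper` logic, not the actual pandas code:

from pandas.errors import OutOfBoundsDatetime

def _is_in_obj(gpr, obj) -> bool:
    # Simplified stand-in for the is_in_obj check in get_grouper.
    if not hasattr(gpr, "name"):
        return False
    try:
        return gpr is obj[gpr.name]
    except (KeyError, IndexError, OutOfBoundsDatetime):
        # If the name-based lookup itself blows up, treat it as "not in obj"
        # instead of letting the exception escape the groupby call.
        return False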