Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

pd.to_datetime("2017-09-18") - pd.offsets.YearBegin()
#> Timestamp('2017-01-01 00:00:00')  (as I expected)
pd.to_datetime("2018-01-01") - pd.offsets.YearBegin()
#> Timestamp('2017-01-01 00:00:00')  (I expected Timestamp('2018-01-01 00:00:00'))
Issue Description
Hello,
The documentation for offsets.YearBegin states "DateOffset increments between calendar year begin dates."
I am not sure whether my expectations about the behaviour of offsets.YearBegin are correct; see the reproducible example above.
Why does subtracting offsets.YearBegin() from the first day of a year return the first day of the previous year? Is this expected?
If it is, I could improve the documentation. If not, this may be a bug.
Thanks in advance.
Expected Behavior
pd.to_datetime("2018-01-01") - pd.offsets.YearBegin()
#> Timestamp('2018-01-01 00:00:00')
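The observed result appears to follow from how offsets treat dates that already sit on the offset: 2018-01-01 is itself a year begin, so subtracting YearBegin() (n=1) moves one full step back, whereas 2017-09-18 is not, so the first step back only reaches the start of its own year. A minimal sketch of that check, using the offset's is_on_offset method:

```python
import pandas as pd

offset = pd.offsets.YearBegin()

for raw in ["2017-09-18", "2018-01-01"]:
    ts = pd.Timestamp(raw)
    # True only when ts already falls on a year begin (January 1)
    on_offset = offset.is_on_offset(ts)
    print(raw, "on offset:", on_offset, "->", ts - offset)
```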
Installed Versions
pandas 1.4.4 Python 3.9.16
Comment From: MarcoGorelli
Thanks for the report.
I'd have expected - pd.offsets.YearBegin(n=0) to go to the beginning of the current year, just as + pd.offsets.YearBegin(n=0) goes to the end of the current year (i.e. the first day of the next year).
Adding and then subtracting, + pd.offsets.YearBegin() - pd.offsets.YearBegin(), seems to work for what you're trying to do:
In [74]: to_datetime(["2017-09-18", "2018-01-01"]) + pd.offsets.YearBegin() - pd.offsets.YearBegin()
Out[74]: DatetimeIndex(['2017-01-01', '2018-01-01'], dtype='datetime64[ns]', freq=None)
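One way to read why the add/subtract trick works: adding YearBegin() always lands on the first year begin strictly after each date, and subtracting one step from there lands on the year begin at or before the original date. A small sketch wrapping it in a hypothetical helper, year_start (not a pandas function). The time of day is carried through unchanged, so this assumes midnight timestamps:

```python
import pandas as pd


def year_start(dates: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Hypothetical helper: move each (midnight) date to January 1 of its year."""
    offset = pd.offsets.YearBegin()
    # Step strictly forward to the next year begin, then one full step back.
    return dates + offset - offset


dates = pd.to_datetime(["2017-09-18", "2018-01-01"])
print(year_start(dates))
```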
As for "Is this expected?": not sure, I'll take a look this week.
P.S. Hi Mirko!
Comment From: jbrockmendel
Instead of addition/subtraction, you want rollforward/rollback.
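For a single timestamp, a sketch of what that looks like with the offset's scalar rollback and rollforward methods: both leave a date that is already a year begin unchanged, and otherwise move it to the previous or next year begin respectively.

```python
import pandas as pd

offset = pd.offsets.YearBegin()

for raw in ["2017-09-18", "2018-01-01"]:
    ts = pd.Timestamp(raw)
    # Dates already on the offset come back unchanged from both methods.
    print(raw, "rollback:", offset.rollback(ts), "rollforward:", offset.rollforward(ts))
```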
Comment From: MarcoGorelli
Sure, but that can't be vectorised, can it? I can do
to_datetime(["2017-09-18", "2018-01-01"]) + pd.offsets.YearBegin()
but
pd.offsets.YearBegin().rollback(to_datetime(["2017-09-18", "2018-01-01"]))
doesn't work; I'd need to do it element by element.
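Until array support exists, the element-by-element version might look like this sketch: a plain list comprehension that applies the scalar rollback to each Timestamp and rebuilds a DatetimeIndex. It works, but runs at Python speed rather than vectorised.

```python
import pandas as pd

dates = pd.to_datetime(["2017-09-18", "2018-01-01"])
offset = pd.offsets.YearBegin()

# Apply the scalar rollback to each Timestamp, then rebuild a DatetimeIndex.
rolled = pd.DatetimeIndex([offset.rollback(ts) for ts in dates])
print(rolled)
```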
Comment From: jbrockmendel
Yeah, we do need rollforward/rollback for arrays, xref #7449. It would also help with some of the recent issues where users try dt64.astype("M8[Y]") expecting it to round down to the nearest YearBegin.
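For context, NumPy itself truncates datetime64 values to the year when casting to year resolution; pandas' own datetime64 dtypes do not offer a year unit. A sketch that does the cast on the underlying NumPy array (assuming tz-naive data) and converts back to nanoseconds:

```python
import pandas as pd

dates = pd.to_datetime(["2017-09-18", "2018-01-01"])

# Cast the raw datetime64 values to year resolution (dropping month/day),
# then back to nanoseconds so pandas can hold them again.
floored = dates.to_numpy().astype("datetime64[Y]").astype("datetime64[ns]")
year_starts = pd.DatetimeIndex(floored)
print(year_starts)
```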
Comment From: MarcoGorelli
Right, the docs for DateOffset say:
"Zero presents a problem. Should it roll forward or back? We arbitrarily have it rollforward: date + BDay(0) == BDay.rollforward(date). Since 0 is a bit weird, we suggest avoiding its use."
So we probably shouldn't recommend n=0 in the docs.
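The rollforward rule quoted above also accounts for the n=0 expectation discussed earlier: since -0 == 0, negating YearBegin(n=0) yields the same offset, so subtracting it should roll forward too rather than back. A quick sketch to check this:

```python
import pandas as pd

zero = pd.offsets.YearBegin(n=0)
ts = pd.Timestamp("2017-09-18")

# With n=0, both adding and subtracting roll forward to the next year begin.
print(ts + zero, ts - zero)
# A date already on a year begin is left unchanged.
print(pd.Timestamp("2018-01-01") - zero)
```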
From https://github.com/pandas-dev/pandas/issues/7449, it seems the simplest solution is:
In [28]: to_datetime(["2017-09-18", "2018-01-01"]).to_period('Y').to_timestamp()
Out[28]: DatetimeIndex(['2017-01-01', '2018-01-01'], dtype='datetime64[ns]', freq=None)
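The to_period route works because each date maps to the annual Period containing it, and to_timestamp() returns the start of that period by default. A self-contained sketch (assuming tz-naive input, since converting to Period drops timezone information); note that it also floors any time of day to midnight:

```python
import pandas as pd

dates = pd.to_datetime(["2017-09-18 00:00", "2018-01-01 00:00", "2019-06-30 15:45"])

# Map each date to its annual Period, then take the start of that period.
year_starts = dates.to_period("Y").to_timestamp()
print(year_starts)
```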
Regarding "If so, I could improve the documentation.": I think it'd be really good to include this example in the YearBegin docs (at least until rollback and rollforward are available for arrays).
Comment From: MarcoGorelli
Regarding the docs, it might be good to mirror the MonthBegin ones, which mention rollback: https://pandas.pydata.org/docs/dev/reference/api/pandas.tseries.offsets.MonthBegin.html
Comment From: natmokval
Hi @mirkosavasta, do you want to do the PR for this issue? If not, I'll add examples to the YearBegin docs. Could you please let me know?