Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

pd.to_datetime("2017-09-18") - pd.offsets.YearBegin()
#> Timestamp('2017-01-01 00:00:00') - matches my expectation

pd.to_datetime("2018-01-01") - pd.offsets.YearBegin()
#> Timestamp('2017-01-01 00:00:00') - does not match my expectation of Timestamp('2018-01-01 00:00:00')

Issue Description

Hello,

The documentation for offsets.YearBegin states "DateOffset increments between calendar year begin dates." I am not sure whether my expectations about the behaviour of offsets.YearBegin are correct. Please see the code below:

pd.to_datetime("2017-09-18") - pd.offsets.YearBegin()
#> Timestamp('2017-01-01 00:00:00') - matches my expectation

pd.to_datetime("2018-01-01") - pd.offsets.YearBegin()
#> Timestamp('2017-01-01 00:00:00') - does not match my expectation of Timestamp('2018-01-01 00:00:00')

Why, when I subtract offsets.YearBegin() from the first day of the year, do I get the first day of the previous year? Is this expected? If so, I could improve the documentation. If not, I guess this could be a bug.

Thanks in advance.

Expected Behavior

pd.to_datetime("2018-01-01") - pd.offsets.YearBegin()
#> Timestamp('2018-01-01 00:00:00')

Installed Versions

pandas 1.4.4 Python 3.9.16

Comment From: MarcoGorelli

Thanks for the report,

I'd have expected - pd.offsets.YearBegin(n=0) to go to the beginning of the current year (just like how + pd.offsets.YearBegin(n=0) goes to the beginning of the next year)

+ pd.offsets.YearBegin() - pd.offsets.YearBegin() seems to work for what you're trying to do:

In [74]: to_datetime(["2017-09-18", "2018-01-01"]) + pd.offsets.YearBegin() - pd.offsets.YearBegin()
Out[74]: DatetimeIndex(['2017-01-01', '2018-01-01'], dtype='datetime64[ns]', freq=None)

Is this expected?

Not sure, I'll take a look this week


ps. Ciao Mirko!

Comment From: jbrockmendel

instead of addition/subtraction you want rollforward/rollback
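
To illustrate the difference on scalar timestamps, here is a minimal sketch. It assumes a pandas version where offset objects expose `rollback`; the variable names are only illustrative.

```python
import pandas as pd

offset = pd.offsets.YearBegin()
mid_year = pd.Timestamp("2017-09-18")
new_year = pd.Timestamp("2018-01-01")

# Subtracting the offset moves to an earlier year start,
# even when the date is already a year start.
sub_mid = mid_year - offset
sub_new = new_year - offset

# rollback only moves dates that are not already on the offset.
rb_mid = offset.rollback(mid_year)
rb_new = offset.rollback(new_year)

print(sub_mid, sub_new)
print(rb_mid, rb_new)
```

For the date in the report, `rollback` keeps 2018-01-01 unchanged, while subtraction returns 2017-01-01.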

Comment From: MarcoGorelli

sure, but that can't be vectorised, can it? I can do

to_datetime(["2017-09-18", "2018-01-01"]) + pd.offsets.YearBegin()

but

pd.offsets.YearBegin().rollback(to_datetime(["2017-09-18", "2018-01-01"]))

doesn't work, I'd need to do it element-by-element
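
The element-by-element workaround mentioned above could look roughly like this. It is a sketch rather than a recommended pattern, since it loops in Python and will be slow on large indexes.

```python
import pandas as pd

dates = pd.to_datetime(["2017-09-18", "2018-01-01"])
offset = pd.offsets.YearBegin()

# Apply rollback to one Timestamp at a time, then rebuild the index.
rolled = pd.DatetimeIndex([offset.rollback(ts) for ts in dates])
print(rolled)
```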

Comment From: jbrockmendel

Yah we do need rollforward/rollback for arrays, xref #7449. Also it would help for some of the recent issues with users trying to do dt64.astype("M8[Y]") expecting that to round down to the nearest YearBegin.
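
For context, this is what the cast does on a plain NumPy array, which may be where that expectation comes from. The sketch uses NumPy only and makes no claim about how pandas objects handle the same cast.

```python
import numpy as np

dates = np.array(["2017-09-18", "2018-01-01"], dtype="datetime64[D]")

# Casting to year resolution truncates each date to its calendar year;
# casting back to days gives the first day of that year.
year_starts = dates.astype("datetime64[Y]").astype("datetime64[D]")
print(year_starts)
```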

Comment From: MarcoGorelli

Right, the docs for DateOffset say

"Zero presents a problem. Should it roll forward or back? We arbitrarily have it rollforward: date + BDay(0) == BDay.rollforward(date). Since 0 is a bit weird, we suggest avoiding its use."

So, we probably shouldn't recommend it in the docs.

From https://github.com/pandas-dev/pandas/issues/7449, it seems the simplest solution is

In [28]: to_datetime(["2017-09-18", "2018-01-01"]).to_period('Y').to_timestamp()
Out[28]: DatetimeIndex(['2017-01-01', '2018-01-01'], dtype='datetime64[ns]', freq=None)

If so, I could improve the documentation.

I think it'd be really good to include this example in the YearBegin docs (at least, until rollback and rollforward are available for arrays)
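
The same period round-trip can be applied to a Series through the `.dt` accessor. In the sketch below, `floor_to_year_begin` is a hypothetical helper name used for illustration, not a pandas function.

```python
import pandas as pd


def floor_to_year_begin(values: pd.Series) -> pd.Series:
    """Map each datetime to the first day of its calendar year.

    Hypothetical helper: converts to yearly periods, then back to
    timestamps, which default to the start of each period.
    """
    return values.dt.to_period("Y").dt.to_timestamp()


dates = pd.Series(pd.to_datetime(["2017-09-18", "2018-01-01"]))
floored = floor_to_year_begin(dates)
print(floored)
```

Unlike subtracting `YearBegin()`, this leaves a date that already falls on 1 January unchanged.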

Comment From: MarcoGorelli

Re the docs: it might be good to mirror the MonthBegin ones, which mention rollback: https://pandas.pydata.org/docs/dev/reference/api/pandas.tseries.offsets.MonthBegin.html

Comment From: natmokval

Hi @mirkosavasta, do you want to do the PR for this issue? If you don't want to, I'll add examples to the YearBegin docs. Could you let me know, please?