Hi,

Coming back to issue #2665, I'm still not convinced that the closed argument of the resampling function does what it should.

When resampling daily to monthly values with closed = 'left', the closed argument should relate to the actual time the months change, but instead it seems to refer to the time 24 hours earlier.

In [25]: import pandas as pd
    ...: import numpy as np
    ...: dates = pd.date_range('3/29/2000', '4/3/2000', freq='1D')

In [27]: ts1 = pd.Series(np.ones(len(dates)), index=dates)

In [28]: ts2 = ts1.resample('1M', closed='left',label='left').sum()

In [29]: ts3 = ts1.resample('1M').sum()

In [30]: ts1
Out[30]:
2000-03-29    1.0
2000-03-30    1.0
2000-03-31    1.0
2000-04-01    1.0
2000-04-02    1.0
2000-04-03    1.0
Freq: D, dtype: float64

# Resampling with closed='left', so that the first sample of the month (midnight on April 1st) belongs to that month, does not give the result you would expect. pandas does not treat 2000-04-01 00:00 as the moment the month changes; it treats 2000-03-31 00:00 as that moment (one day too early).
In [31]: ts2
Out[31]:
2000-02-29    2.0
2000-03-31    4.0
Freq: M, dtype: float64


# Keeping it simple (using the defaults) is a possible workaround, but it would be better to have the closed argument working correctly ...

In [32]: ts3
Out[32]:
2000-03-31    3.0
2000-04-30    3.0
Freq: M, dtype: float64

Comment From: jtratner

Can you post the output you expect compared to what you actually get? Makes it easier to follow and also to write better test cases to make sure the problem is resolved.

Comment From: cancan101

Might be worth looking at this issue as well: #5023

Comment From: kdebrab

I think the expected behavior is:

>>> ts1.resample('1M', how='sum', closed='left',label='right')
2000-03-31    3
2000-04-30    3
Freq: M, dtype: float64
>>> ts1.resample('1M', how='sum', closed='right',label='right')
2000-03-31    4
2000-04-30    2
Freq: M, dtype: float64

Of course, in that case the default behavior should become closed='left'.

See also my comment on issue #2665.

Comment From: MarcoGorelli

This looks correct to me

You have

In [5]: ts1
Out[5]: 
2000-03-29    1.0
2000-03-30    1.0
2000-03-31    1.0
2000-04-01    1.0
2000-04-02    1.0
2000-04-03    1.0

If you do resample('1M', closed='left', label='left'), then you end up with the following groups:

- 2000-02-29: [2000-02-29, 2000-03-31)
- 2000-03-31: [2000-03-31, 2000-04-30)

If you look at the values in each group you get:

- [1, 1]
- [1, 1, 1, 1]

so, after the sum aggregation, you get:

- 2000-02-29: 2.0
- 2000-03-31: 4.0

As documented here, 'M' means "month end". So the choice of bins looks correct as well.
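
The grouping described above can be checked by hand. This is only a sketch: the month-end edges are typed in manually for this particular date range, and the offset object pd.offsets.MonthEnd() is used instead of the 'M' alias because the alias spelling may differ between pandas versions.

```python
import pandas as pd

dates = pd.date_range("2000-03-29", "2000-04-03", freq="D")
ts1 = pd.Series(1.0, index=dates)

# Month-end bin edges covering the data (written out by hand for this example).
edges = pd.to_datetime(["2000-02-29", "2000-03-31", "2000-04-30"])

# closed='left' means each bin is the half-open interval [left, right).
manual = {
    left.strftime("%Y-%m-%d"): float(
        ts1[(ts1.index >= left) & (ts1.index < right)].sum()
    )
    for left, right in zip(edges[:-1], edges[1:])
}

# The same aggregation through resample, labelled by the left edge.
resampled = ts1.resample(pd.offsets.MonthEnd(), closed="left", label="left").sum()

print(manual)     # {'2000-02-29': 2.0, '2000-03-31': 4.0}
print(resampled)
```

2000-03-31 lands in the second bin because it is the (closed) left edge of [2000-03-31, 2000-04-30), which is where the 4.0 comes from.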

I think the OP's phrase

"Instead pandas does not consider 01/04/2000 00:00 as the time the months change"

suggests they misunderstood what 'M' does, and that perhaps they meant 'MS', which gives

In [9]: ts1.resample('1MS', closed='left', label='left').sum()
Out[9]: 
2000-03-01    3.0
2000-04-01    3.0
Freq: MS, dtype: float64

Closing then - please do let me know if I've misunderstood and I'll reopen

Comment From: jorisvandenbossche

I think you are technically right that this is "correct", but it is also very confusing ...

I suppose the main confusing part is that the month end/start labels are expressed in full days, so a month-end label is the start of the last day of the month. When using closed="left", the whole last day of the previous month is therefore considered part of the next month, rather than the bins being split at the exact instant separating the two months (i.e. 2000-03-31 23:59:59.999... / 2000-04-01 00:00:00).

This contrasts with how, for example, the Day offset works, where the division between both periods is the exact instant. I suppose for months this was done to have it play nicer with daily frequencies and because we want to show the month value in the resulting DatetimeIndex as a date (i.e. "2000-03-31" and not "2000-03-31 23:59:59.999" as the month end value).
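
To make the intraday effect concrete, here is a small sketch with two made-up timestamps, one at noon on the last day of March and one at midnight on April 1st. The offset objects are used instead of string aliases to stay independent of alias naming across pandas versions.

```python
import pandas as pd

# Two illustrative timestamps on either side of the March/April boundary.
idx = pd.to_datetime(["2000-03-31 12:00", "2000-04-01 00:00"])
ts = pd.Series([1, 1], index=idx)

# Month-end bins, closed on the left: the edge sits at 2000-03-31 00:00,
# so noon on March 31st falls into the same bin as April 1st.
month_end_left = ts.resample(
    pd.offsets.MonthEnd(), closed="left", label="left"
).sum()

# Month-start bins, closed on the left: the edge sits at 2000-04-01 00:00,
# so the two timestamps are separated at the actual month change.
month_start_left = ts.resample(
    pd.offsets.MonthBegin(), closed="left", label="left"
).sum()

print(month_end_left)
print(month_start_left)
```

With month-end bins both values should end up under the single label 2000-03-31, while with month-start bins they should be split between 2000-03-01 and 2000-04-01.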

I think this is also basically what Stephan explains in https://github.com/pandas-dev/pandas/issues/9586

Given the way that the MonthEnd frequency is defined as the start of the last day, I doubt that it ever makes sense to use this with closed="left". To avoid this confusion, maybe we could raise a warning/error when the user does this, pointing them to use freq="MS" (MonthStart) instead of MonthEnd?
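
A rough sketch of what such a check could look like from the user side. The helper name warn_if_month_end_closed_left and the warning text are hypothetical and not part of pandas; this only illustrates the idea, not how it would be wired into resample.

```python
import warnings

import pandas as pd
from pandas.tseries.frequencies import to_offset


def warn_if_month_end_closed_left(freq, closed):
    """Hypothetical helper (not a pandas API): warn on MonthEnd + closed='left'.

    Returns True if a warning was emitted, False otherwise.
    """
    offset = to_offset(freq)
    if isinstance(offset, pd.offsets.MonthEnd) and closed == "left":
        warnings.warn(
            "closed='left' with a month-end frequency puts the last day of "
            "each month into the following bin; consider a month-start "
            "frequency (MonthBegin, 'MS') instead.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

For example, warn_if_month_end_closed_left(pd.offsets.MonthEnd(), "left") would warn, while the same call with pd.offsets.MonthBegin() or with closed="right" would stay silent.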

Comment From: MarcoGorelli

"I doubt that it ever makes sense to use this with closed='left'"

Yes, I was thinking that - agree on raising (or at least warning), I'll open a new issue

Comment From: jorisvandenbossche

I think we can use https://github.com/pandas-dev/pandas/issues/9586 for that, since it's basically addressing the confusion that Stephan describes there?

Comment From: MarcoGorelli

Such old issues tend not to get any attention/activity, and the example doesn't run anymore as it's Python 2 and a really old version of pandas.

Furthermore, that issue sounds like it's more about renaming a confusing freq?

I suppose we could add ME, etc., for month end, but changing the offset M from month-end to month-start seems like a non-starter. Ugh. So I guess we're left with a documentation issue (https://github.com/pandas-dev/pandas/issues/5023), unless we want to add a hack for resample.

I've opened https://github.com/pandas-dev/pandas/issues/51710 as warning/erroring is a different activity

I'll have a think, but I agree something should be done as this is really confusing