Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
i = pd.date_range('2018-04-09', periods=4, freq='2D')
ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
# Prints the times from the first 3 days (so first two rows)
print(ts.first('3D'))
# Prints empty dataframe
print(ts.iloc[::-1].first('3D'))
# Prints the times from the last 3 days (so last two rows)
print(ts.last('3D'))
# Prints empty dataframe
print(ts.iloc[::-1].last('3D'))
Issue Description
When I have ts ordered by increasing date, first('3D') selects the two rows that are within 3 days, inclusive, of the earliest date, 2018-04-09. If I call first('3D') on ts reversed, I get no results. The same applies, mutatis mutandis, to last.
Expected Behavior
first('3D') should give me the two rows that are within 3 days, inclusive, of the earliest date, 2018-04-09, so the rows with 2018-04-09 and 2018-04-11. The description of first() suggests that it gives the first dates relative to the temporally first date in the index. The same applies, mutatis mutandis, to last.
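For illustration, here is a minimal sketch of an order-independent selection that returns the rows I expected. The helper names first_by_time and last_by_time are hypothetical (they are not pandas APIs), and the strict comparisons at the cutoff are my assumption about how the boundary should be treated.

```python
import pandas as pd


def first_by_time(obj, offset):
    """Hypothetical helper: rows within `offset` of the earliest timestamp."""
    cutoff = obj.index.min() + pd.Timedelta(offset)
    # Boolean mask, so the row order of `obj` does not matter.
    return obj[obj.index < cutoff]


def last_by_time(obj, offset):
    """Hypothetical helper: rows within `offset` of the latest timestamp."""
    cutoff = obj.index.max() - pd.Timedelta(offset)
    return obj[obj.index > cutoff]


i = pd.date_range("2018-04-09", periods=4, freq="2D")
ts = pd.DataFrame({"A": [1, 2, 3, 4]}, index=i)
reversed_ts = ts.iloc[::-1]

# Rows keep the (reversed) order of the input frame.
early = first_by_time(reversed_ts, "3D")  # rows for 2018-04-11 and 2018-04-09
late = last_by_time(reversed_ts, "3D")    # rows for 2018-04-15 and 2018-04-13
```

Because the mask compares every index value against the cutoff, the result contains the same rows whether ts is ascending, descending, or unsorted.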
Installed Versions
Comment From: dicristina
df.first [df.last] takes the first [last] item of the index and adds [subtracts] the offset to get the end [start] timestamp. It then searches the index (with Index.searchsorted) for the position of the end [start] value and finally slices the DataFrame. This works only if the index is sorted in ascending order, but the documentation does not state this.
Comment From: dicristina
df.first gives different results for MonthBegin and MonthEnd:
In [1]: import pandas as pd
In [2]: s = pd.Series(1, index=pd.bdate_range("2010-03-31", periods=100))
In [3]: print(s.first("1M")) # As expected
2010-03-31 1
Freq: B, dtype: int64
In [4]: print(s.first("1MS"))
2010-03-31 1
2010-04-01 1
Freq: B, dtype: int64
I think it would be best to rewrite it so that the results always match the groupings given by df.resample.
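As a rough illustration of what matching the resample groupings could mean, the sketch below takes the first group produced by resampling the same series with "MS". This only shows the grouping and is not a proposed implementation; the variable names are hypothetical.

```python
import pandas as pd

s = pd.Series(1, index=pd.bdate_range("2010-03-31", periods=100))

# Iterating a resampler yields (bin label, group) pairs in chronological order.
label, first_group = next(iter(s.resample("MS")))
```

Here the first month-start bin is March 2010, which contains only 2010-03-31, so a first() that followed the resample groupings would return that single row for a month-based offset.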
Comment From: MarcoGorelli
Thanks for the report.
This function is being deprecated (see https://github.com/pandas-dev/pandas/issues/45908#issuecomment-1525822630), so let's close this for now.