Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

i = pd.date_range('2018-04-09', periods=4, freq='2D')
ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
# Prints the rows from the first 3 days (the first two rows)
print(ts.first('3D'))
# Prints an empty DataFrame (unexpected)
print(ts.iloc[::-1].first('3D'))

# Prints the rows from the last 3 days (the last two rows)
print(ts.last('3D'))
# Prints an empty DataFrame (unexpected)
print(ts.iloc[::-1].last('3D'))

Issue Description

When ts is ordered by increasing date, first('3D') selects the two rows that are within 3 days, inclusive, of the earliest date, 2018-04-09. If I call first('3D') on the reversed ts, I get an empty DataFrame. The same applies, mutatis mutandis, to last.

Expected Behavior

first('3D') should give me the two rows that are within 3 days, inclusive, of the earliest date, 2018-04-09, that is, the rows for 2018-04-09 and 2018-04-11, regardless of how the index is ordered. The description of first() suggests that it selects dates relative to the chronologically earliest date in the index. The same applies, mutatis mutandis, to last.

Installed Versions

/Users/maheshvashishtha/opt/anaconda3/envs/pandas-dev-alt/lib/python3.8/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit : 2e218d10984e9919f0296931d92ea851c6a6faf5
python : 3.8.16.final.0
python-bits : 64
OS : Darwin
OS-release : 21.5.0
Version : Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:22 PDT 2022; root:xnu-8020.121.3~4/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.5.3
numpy : 1.24.1
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 65.6.3
pip : 22.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

Comment From: dicristina

df.first [df.last] takes the first [last] item of the index and adds [subtracts] the offset to get the end [start] timestamp. It then searches the index (with Index.searchsorted) for the position of the end [start] timestamp and finally slices the DataFrame at that position. This works only if the index is sorted in ascending order, but the documentation does not state this requirement.
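
The mechanism described above can be sketched roughly as follows. This is a simplified illustration, not pandas' actual source: first_sketch is a hypothetical name, and it assumes a fixed-length offset such as '3D' that pd.Timedelta can parse. The binary search is only meaningful on an ascending index, which is why sorting the reversed frame first gives the expected rows.

```python
import pandas as pd


def first_sketch(df, offset):
    """Hypothetical, simplified stand-in for the logic described above."""
    # End timestamp: first item of the index plus the offset.
    end = df.index[0] + pd.Timedelta(offset)
    # Binary search for the position of `end`; assumes an ascending index.
    stop = df.index.searchsorted(end, side="left")
    # Slice everything before that position.
    return df.iloc[:stop]


i = pd.date_range("2018-04-09", periods=4, freq="2D")
ts = pd.DataFrame({"A": [1, 2, 3, 4]}, index=i)

# Ascending index: the precondition holds.
print(first_sketch(ts, "3D"))

# Reversed index: sort first to restore the precondition.
print(first_sketch(ts.iloc[::-1].sort_index(), "3D"))
```

Calling sort_index() before first or last is one possible workaround on affected versions, at the cost of losing the original row order.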

Comment From: dicristina

df.first gives different results for MonthEnd ("1M") and MonthBegin ("1MS"):

In [1]: import pandas as pd

In [2]: s = pd.Series(1, index=pd.bdate_range("2010-03-31", periods=100))

In [3]: print(s.first("1M"))   # As expected
2010-03-31    1
Freq: B, dtype: int64

In [4]: print(s.first("1MS"))
2010-03-31    1
2010-04-01    1
Freq: B, dtype: int64

I think it would be best to rewrite it so that the results always match the groupings given by df.resample.
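
One way to picture calendar-aligned behaviour for the monthly case is to select the rows that fall in the same calendar month as the earliest timestamp. The sketch below is an assumption about what "matching the resample groupings" could look like for monthly offsets, not a proposal for the actual implementation; it uses DatetimeIndex.to_period and does not depend on the index order.

```python
import pandas as pd

s = pd.Series(1, index=pd.bdate_range("2010-03-31", periods=100))

# Map every timestamp to its calendar month.
months = s.index.to_period("M")

# Keep only the rows in the earliest calendar month (March 2010 here).
first_month = s[months == months.min()]
print(first_month)
```

With this approach the result is the same whether one thinks in terms of month starts or month ends, because membership is decided by the calendar month alone.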

Comment From: MarcoGorelli

thanks for the report

this function is being deprecated, see https://github.com/pandas-dev/pandas/issues/45908#issuecomment-1525822630, so let's close for now
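
For anyone who needs the behaviour expected in this report, a boolean mask that does not depend on the index order is one option. The helper names first_by_time and last_by_time below are hypothetical, and the sketch assumes fixed-length offsets such as '3D'. The bounds are exclusive (rows strictly before earliest + offset, or strictly after latest - offset); switch to <= or >= if an inclusive bound is wanted.

```python
import pandas as pd


def first_by_time(df, offset):
    """Hypothetical helper: rows within `offset` of the earliest timestamp."""
    end = df.index.min() + pd.Timedelta(offset)
    return df.loc[df.index < end]


def last_by_time(df, offset):
    """Hypothetical helper: rows within `offset` of the latest timestamp."""
    start = df.index.max() - pd.Timedelta(offset)
    return df.loc[df.index > start]


i = pd.date_range("2018-04-09", periods=4, freq="2D")
ts = pd.DataFrame({"A": [1, 2, 3, 4]}, index=i)

# Same rows are selected for the ascending and the reversed frame;
# the original row order of each frame is preserved.
print(first_by_time(ts, "3D"))
print(first_by_time(ts.iloc[::-1], "3D"))
print(last_by_time(ts.iloc[::-1], "3D"))
```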