Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import datetime as dt
import pandas as pd

pd.Timestamp(2020, 1, 1, 0, 0, tzinfo=dt.timezone.utc, fold=0)

Issue Description

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/tslibs/timestamps.pyx", line 1584, in pandas._libs.tslibs.timestamps.Timestamp.__new__
ValueError: Cannot pass fold with possibly unambiguous input: int, float, numpy.datetime64, str, or timezone-aware datetime-like. Pass naive datetime-like or build Timestamp from components.

Code works with Pandas 1.5. Possible Pandas 2.0 regression?

Search yielded #44378 except I do build from components.

When passing components as kwargs, it works:

pd.Timestamp(year=2020, month=1, day=1, hour=0, minute=0, tzinfo=dt.timezone.utc, fold=0)
Timestamp('2020-01-01 00:00:00+0000', tz='UTC')

I figured maybe I'm not using the constructor correctly but I couldn't find a mention in the docs preventing this usage.

The only test I found regarding this is this one:

https://github.com/pandas-dev/pandas/blob/ee84ef2753f396b6b9edb514dc2d1a72f78bd1ed/pandas/tests/scalar/timestamp/test_constructors.py#L839-L847

It is this test that led me to the "pass all as kwargs" solution.

Is this a bug or a known limitation?

Expected Behavior

Timestamp('2020-01-01 00:00:00+0000', tz='UTC')

Installed Versions

INSTALLED VERSIONS ------------------ commit : c2a7f1ae753737e589617ebaaff673070036d653 python : 3.9.2.final.0 python-bits : 64 OS : Linux OS-release : 5.10.0-21-amd64 Version : #1 SMP Debian 5.10.162-1 (2023-01-21) machine : x86_64 processor : byteorder : little LC_ALL : None LANG : fr_FR.UTF-8 LOCALE : fr_FR.UTF-8 pandas : 2.0.0rc1 numpy : 1.24.2 pytz : 2022.7.1 dateutil : 2.8.2 setuptools : 65.3.0 pip : 23.0.1 Cython : None pytest : 7.2.1 hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : None lxml.etree : None html5lib : None pymysql : None psycopg2 : 2.9.5 jinja2 : None IPython : None pandas_datareader: None bs4 : None bottleneck : None brotli : None fastparquet : None fsspec : None gcsfs : None matplotlib : None numba : None numexpr : None odfpy : None openpyxl : None pandas_gbq : None pyarrow : None pyreadstat : None pyxlsb : None s3fs : None scipy : 1.10.1 snappy : None sqlalchemy : 2.0.4 tables : None tabulate : None xarray : None xlrd : None zstandard : None tzdata : None qtpy : None pyqt5 : None

Comment From: mroeschke

From the original implementation in https://github.com/pandas-dev/pandas/commit/6e04264250f27787075dc0cb1685e20e7a6af071, I think the error message could have been clarified as components means from keyword only arguments

Comment From: lafrech

Perhaps. This could be a won'tfix, then. Still it is a bit surprising.

# 1/ This works, following calls are equivalent AFAIU
Timestamp(2019, 10, 27, 1, 30, tz=tz)
Timestamp(year=2019, month=10, day=27, hour=1, minute=30, tz=tz)

# 2/ This works
Timestamp(year=2019, month=10, day=27, hour=1, minute=30, tz=tz, fold=fold)

# 3/ This does not work
Timestamp(2019, 10, 27, 1, 30, tz=tz, fold=fold)

Why? Pandas understands the positional args, so how come 3/ is not equivalent to 2/? I may lack time/skills to dig into the code and understand why, but it is not intuitive to me.

Besides, it worked with Pandas 1.5.

In any case, the error message is misleading. And probably the docs too.

Comment From: mroeschke

@AlexKirko do you happen to remember when adding fold, were other Timestamp components supposed to be added as keyword only arguments when specifying fold?

I'm trying to determine if originally in https://github.com/pandas-dev/pandas/pull/31563 if 3/ was never supposed to work even in 1.5

Comment From: AlexKirko

Hello, this is what I can remember or could dig up:

In the examples that we documented then, we instruct the user to build from a naive datetime or keywords:

   pd.Timestamp(datetime.datetime(2019, 10, 27, 1, 30, 0, 0),
                tz='dateutil/Europe/London', fold=0)
   pd.Timestamp(year=2019, month=10, day=27, hour=1, minute=30,
                tz='dateutil/Europe/London', fold=1)

We never say that the user can't build from positional arguments.

Now, the code responsible for the check is here:

        if fold is not None:
            if fold not in [0, 1]:
                raise ValueError(
                    "Valid values for the fold argument are None, 0, or 1."
                )

            if (ts_input is not _no_input and not (
                    PyDateTime_Check(ts_input) and
                    getattr(ts_input, "tzinfo", None) is None)):
                raise ValueError(
                    "Cannot pass fold with possibly unambiguous input: int, "
                    "float, numpy.datetime64, str, or timezone-aware "
                    "datetime-like. Pass naive datetime-like or build "
                    "Timestamp from components."
                )

From what I remember and see in the code, we intended to raise an exception here only when we are trying to build from a non-ambiguous datetime-like input: one that passes PyDateTime_Check and has tzinfo (or does not pass PyDateTime_Check, I suppose).

I would say, the behavior in 2.0 is a bug. If we support the positional argument convention to mimic datetime, then passing the equivalent of a naive datetime through positional arguments should behave the same as a naive datetime.

My guess would be that something in 2.0 makes us pass a ts_input that is not _no_input in @lafrech 's example? Could try to chase this down during the weekend. @mroeschke , what do you think?

Comment From: lafrech

If we support the positional argument convention to mimic datetime, then passing the equivalent of a naive datetime through positional arguments should behave the same as a naive datetime.

Spot on. That would be consistent, indeed.

Could try to chase this down during the weekend.

Thanks @AlexKirko.

Comment From: AlexKirko

Hm, on 1.5.3 fold is actually None before this check (on main it's 0):

        # Allow fold only for unambiguous input
        if fold is not None:
            if fold not in [0, 1]:
                raise ValueError(
                    "Valid values for the fold argument are None, 0, or 1."
                )

Also, on 1.5.3 ts_input is a datetime.datetime object, and on main it's an int (just has the year in it). We then try to get a timezone from an int, and this is where we error out.

I'll go through the constructor code and see what changed.

Comment From: AlexKirko

Somehow we used to go into this if in this situation:

        if tzinfo is not None:
            # GH#17690 tzinfo must be a datetime.tzinfo object, ensured
            #  by the cython annotation.
            if tz is not None:
                if (is_integer_object(tz)
                    and is_integer_object(ts_input)
                    and is_integer_object(freq)
                ):
                    # GH#31929 e.g. Timestamp(2019, 3, 4, 5, 6, tzinfo=foo)
                    # TODO(GH#45307): this will still be fragile to
                    #  mixed-and-matched positional/keyword arguments
                    ts_input = datetime(
                        ts_input,
                        freq,
                        tz,
                        unit or 0,
                        year or 0,
                        month or 0,
                        day or 0,
                        fold=fold or 0,
                    )
                    nanosecond = hour
                    tz = tzinfo
                    print("auxilary constructor")
                    print(ts_input)
                    return cls(ts_input, nanosecond=nanosecond, tz=tz)

                raise ValueError('Can provide at most one of tz, tzinfo')

Now we no longer do, which looks correct, but where before we would pass a timezone-aware datetime without the fold argument, we now keep going and pass positional arguments.

Comment From: AlexKirko

Looks like the culprit is the predicate. We currently have:

            if (ts_input is not _no_input and not (
                    PyDateTime_Check(ts_input) and
                    getattr(ts_input, "tzinfo", None) is None)):

Which is equivalent to:

            if (ts_input is not _no_input and (
                    not PyDateTime_Check(ts_input) or
                    getattr(ts_input, "tzinfo", None) is not None)):

I think we should have:

            if (ts_input is not _no_input and
                    PyDateTime_Check(ts_input) and
                    getattr(ts_input, "tzinfo", None) is not None):

This fixes the OP's problem, let me see is this passes the test suite.

Comment From: AlexKirko

Of course not. Anyway, I'll find a predicate variant that does what we want and passes the tests.

Comment From: AlexKirko

Also, this looks like a kludge to me:

        elif is_integer_object(year):
            # User passed positional arguments:
            # Timestamp(year, month, day[, hour[, minute[, second[,
            # microsecond[, tzinfo]]]]])
            ts_input = datetime(ts_input, year, month, day or 0,
                                hour or 0, minute or 0, second or 0, fold=fold or 0)
            unit = None

In fact, ts_input is the year (which is what the datetime constructor expects).

print(ts_input)
print(_date_attributes)
> 2020
> [1, 1, 0, 0, None, None, None, None]

This should probably be carefully fixed in a separate PR, if it's not just me finding weird that month is assigned to the variable named year.