Pandas API: require tz equality or just tzawareness-compat in setitem-like methods?

There are a handful of methods in DatetimeArray (and, indirectly, Series and DatetimeIndex) that do casting/validation of their inputs:

__setitem__, shift, insert, fillna, take, searchsorted, __cmp__

In all cases, we upcast datetime64/datetime to Timestamp, try to parse strings, and check for tzawareness compat (i.e. raise if (self.tz is None) ^ (other.tz is None)).
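A minimal stdlib sketch of the tzawareness-compat check, using plain datetime objects in place of pandas internals. The helper name check_tzawareness_compat and its error message are hypothetical, not pandas API; the only thing taken from the text is the XOR condition.

```python
from datetime import datetime, timezone


def check_tzawareness_compat(self_tz, other):
    """Hypothetical helper (not pandas API): raise if exactly one side is tz-naive."""
    # XOR is True only when one side is naive and the other is aware
    if (self_tz is None) ^ (other.tzinfo is None):
        raise TypeError("mixed tz-naive and tz-aware values")


aware = datetime(2020, 1, 1, tzinfo=timezone.utc)
naive = datetime(2020, 1, 1)

check_tzawareness_compat(timezone.utc, aware)  # both aware: passes
check_tzawareness_compat(None, naive)          # both naive: passes
```

Note that this check says nothing about *which* timezone the aware side uses; that is exactly the gap the strict/non-strict question is about.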

In the cases of searchsorted and cmp, we stop there and don't require the actual timezones to match. In the other cases, we are stricter and raise if not tz_compare(self.tz, other.tz).
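The two current behaviors can be sketched as one validator with a strict flag. This is an illustration under stated assumptions: validate_value is a hypothetical name, and plain tzinfo equality stands in for pandas' tz_compare.

```python
from datetime import datetime, timedelta, timezone


def validate_value(self_tz, value, strict):
    """Hypothetical validator: strict=True mimics the setitem-like methods,
    strict=False mimics searchsorted/comparisons (tzawareness compat only)."""
    if (self_tz is None) ^ (value.tzinfo is None):
        raise TypeError("tz-naive and tz-aware values are not compatible")
    if strict and self_tz != value.tzinfo:
        # stand-in for pandas' tz_compare: plain tzinfo equality here
        raise ValueError(f"timezones differ: {self_tz} vs {value.tzinfo}")
    return value


plus_one = timezone(timedelta(hours=1))
value = datetime(2020, 1, 1, tzinfo=plus_one)

validate_value(timezone.utc, value, strict=False)  # accepted: both tz-aware
```

With strict=True the same call raises ValueError, since UTC and +01:00 are different timezones even though both sides are tz-aware.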

In #37299 we discussed the idea of making these uniform. I am generally positive on this, as it would make things simpler both for users and in the code. cc @jorisvandenbossche @jreback

Side-notes: 1) __sub__ doesn't do the same casting, and it also requires a tz match, though it could get by with only tzawareness compat (xref #37329). BTW, the stdlib datetime only requires tzawareness compat.

2) In Series/DataFrame.__setitem__ we cast to object instead of raising on tz mismatch. For brevity I lump this in with "we raise on tz mismatch".
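For reference, a short demonstration of the stdlib behavior mentioned in side-note 1: subtracting two aware datetimes with different timezones works, while mixing aware and naive raises TypeError. The specific dates and offsets are illustrative.

```python
from datetime import datetime, timedelta, timezone

utc_midnight = datetime(2020, 1, 1, tzinfo=timezone.utc)
plus_one_midnight = datetime(2020, 1, 1, tzinfo=timezone(timedelta(hours=1)))

# Both aware, different zones: allowed; the result compares the underlying instants.
# 00:00+01:00 is 23:00 UTC the previous day, so the difference is one hour.
diff = utc_midnight - plus_one_midnight

# Mixing aware and naive is rejected.
try:
    utc_midnight - datetime(2020, 1, 1)
except TypeError as exc:
    print(exc)
```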

Comment From: jorisvandenbossche

Personally, I think I would go for the "tz-awareness compat" requirement in all cases. I don't think there are really "ambiguous" cases about what the result would be if all involved values are tz-aware (which is of course not the same with mixed tz-awareness).

Comment From: jbrockmendel

I'm on board with going all-tzawareness-compat. Assuming we get consensus on that, should we do a deprecation cycle or just do it?

Comment From: jreback

Yeah, I think we should deprecate now.

+1 on strict

Comment From: jbrockmendel

@jreback to be clear, joris and I are advocating for the non-strict version

Comment From: jbrockmendel

@TomAugspurger @simonjayhawkins @WillAyd @mroeschke any preferences here?

Comment From: mroeschke

If the non-strict route was chosen, what would be the resulting timezone after these operations?

Comment From: jbrockmendel

> If the non-strict route was chosen, what would be the resulting timezone after these operations?

self.tz would be retained.
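A toy sketch of what "self.tz would be retained" means for a non-strict __setitem__: an aware value in another timezone is converted to self.tz (same instant) rather than rejected. TzArray is a hypothetical stand-in for a tz-aware DatetimeArray, not pandas code, and it assumes the array itself is always tz-aware.

```python
from datetime import datetime, timedelta, timezone


class TzArray:
    """Hypothetical toy stand-in for a tz-aware DatetimeArray (not pandas)."""

    def __init__(self, values, tz):
        self.tz = tz
        self._values = [v.astimezone(tz) for v in values]

    def __getitem__(self, i):
        return self._values[i]

    def __setitem__(self, i, value):
        # tzawareness compat only: naive values are still rejected
        if value.tzinfo is None:
            raise TypeError("cannot set tz-naive value into tz-aware array")
        # non-strict: cast the incoming value to self.tz instead of raising
        self._values[i] = value.astimezone(self.tz)


arr = TzArray([datetime(2020, 1, 1, tzinfo=timezone.utc)], tz=timezone.utc)
# 09:00+09:00 is the same instant as 00:00 UTC
arr[0] = datetime(2020, 1, 1, 9, 0, tzinfo=timezone(timedelta(hours=9)))
```

Because an aware value pins down a single instant, the cast is lossless; this is why the all-aware case is unambiguous, whereas a naive value would need an assumed timezone.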

Comment From: mroeschke

I guess either choice is okay with me, with a slight preference for fully strict.

Comment From: AnjoMan

I'm +1 on this as well. If we can standardize on one function for checking compatibility in all these cases, it will be easier to maintain and will also get more test coverage.

Comment From: qiuxiaomu

Hi team, I recently bumped into this warning, and after reading the change notes and the relevant code changes (I'm not very familiar with this area), I'm not quite following. This line of mine triggers the warning:

    result.loc[:, 'timestamp'] = pd.to_datetime(result.timestamp).dt.tz_convert(desired_timezone)

Are we saying that, instead of this one-liner, I need to break it down into:
1. cast to the desired object type first (.dt is not going to do it), then
2. tz_convert to the desired timezone?

Thanks

Comment From: jbrockmendel

What warning? What dtype does result['timestamp'] have before you do tz_convert?

Comment From: qiuxiaomu

The warning comes from here:

    if not timezones.tz_compare(self.tz, other.tz):
        # TODO(2.0): remove this check. GH#37605
        warnings.warn(
            "Setitem-like behavior with mismatched timezones is deprecated "
            "and will change in a future version. Instead of raising "
            "(or for Index, Series, and DataFrame methods, coercing to "
            "object dtype), the value being set (or passed as a "
            "fill_value, or inserted) will be cast to the existing "
            "DatetimeArray/DatetimeIndex/Series/DataFrame column's "
            "timezone. To retain the old behavior, explicitly cast to "
            "object dtype before the operation.",
            FutureWarning,
            stacklevel=find_stack_level(),
        )

The dtype of result['timestamp'] was datetime64[ns, UTC].

Specifically, result['timestamp'].dtype is a DatetimeTZDtype, displayed as datetime64[ns, UTC].


Comment From: jbrockmendel

To keep the current behavior, in 2.0 you'll need to do df["timestamp"] = ... instead of df.loc[:, "timestamp"] = ....
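A minimal sketch of the whole-column assignment, assuming pandas is installed; the DataFrame contents and the Asia/Tokyo target timezone are illustrative stand-ins for result and desired_timezone.

```python
import pandas as pd

df = pd.DataFrame(
    {"timestamp": pd.to_datetime(["2021-01-01 00:00", "2021-06-01 12:00"], utc=True)}
)
converted = df["timestamp"].dt.tz_convert("Asia/Tokyo")

# Whole-column assignment replaces the column, so it takes converted's dtype
# (datetime64[ns, Asia/Tokyo]).
df["timestamp"] = converted

# By contrast, df.loc[:, "timestamp"] = converted sets values into the existing
# column; per this thread, from 2.0 those values would be cast back to UTC.
```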

Comment From: qiuxiaomu

> To keep the current behavior, in 2.0 you'll need to do df["timestamp"] = ... instead of df.loc[:, "timestamp"] = ....

Thanks a lot!