A handful of DatetimeArray methods (and, indirectly, the corresponding Series and DatetimeIndex methods) do casting/validation:
__setitem__
shift
insert
fillna
take
searchsorted
__cmp__
In all cases, we upcast datetime64/datetime to Timestamp, try to parse strings, and check for tzawareness compat (i.e. raise if (self.tz is None) ^ (other.tz is None)).
In the cases of searchsorted and __cmp__, we stop there, and don't require the actual timezones to match. In the other cases, we are stricter and raise if not tz_compare(self.tz, other.tz).
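The two levels of checking described above can be sketched with the stdlib alone. This is only an illustration, not the pandas implementation: the helper names is_tzawareness_compatible and is_tz_match are hypothetical, and plain equality of tzinfo objects stands in for tz_compare.

```python
from datetime import datetime, timedelta, timezone


def is_tzawareness_compatible(left_tz, right_tz):
    """Lenient check: both sides naive, or both sides aware."""
    return not ((left_tz is None) ^ (right_tz is None))


def is_tz_match(left_tz, right_tz):
    """Strict check: awareness agrees and the timezones are equal.

    Plain equality is an assumption standing in for pandas' tz_compare.
    """
    return is_tzawareness_compatible(left_tz, right_tz) and left_tz == right_tz


utc = timezone.utc
eastern = timezone(timedelta(hours=-5))  # fixed offset, for illustration only

naive = datetime(2020, 1, 1)
aware_utc = datetime(2020, 1, 1, tzinfo=utc)
aware_est = datetime(2020, 1, 1, tzinfo=eastern)

# Two aware values with different timezones pass the lenient check only.
lenient_ok = is_tzawareness_compatible(aware_utc.tzinfo, aware_est.tzinfo)
strict_ok = is_tz_match(aware_utc.tzinfo, aware_est.tzinfo)

# Mixing naive and aware fails both checks.
mixed_ok = is_tzawareness_compatible(naive.tzinfo, aware_utc.tzinfo)
```

Under this sketch, searchsorted and __cmp__ correspond to the lenient check, while the remaining methods correspond to the strict one.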
In #37299 we discussed the idea of making these uniform. I am generally positive on this, as it would make things simpler both for users and in the code. cc @jorisvandenbossche @jreback
Side-notes
1) __sub__ doesn't do the same casting, but it also requires a tz match, although it could get by with only tzawareness compat, xref #37329. BTW the stdlib datetime only requires tzawareness compat.
2) In Series/DataFrame.__setitem__ we cast to object instead of raising on tz mismatch. For brevity I lump this in with "we raise on tz mismatch".
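The stdlib behaviour mentioned in side-note 1 can be checked directly. A minimal sketch, assuming fixed-offset timezones purely for illustration: subtracting two aware datetimes with different timezones works, while mixing naive and aware raises TypeError.

```python
from datetime import datetime, timedelta, timezone

utc = timezone.utc
eastern = timezone(timedelta(hours=-5))  # fixed offset, for illustration only

aware_utc = datetime(2020, 1, 1, 12, 0, tzinfo=utc)
aware_est = datetime(2020, 1, 1, 12, 0, tzinfo=eastern)
naive = datetime(2020, 1, 1, 12, 0)

# Both operands are aware, so the differing timezones are not a problem:
# 12:00 at UTC-05:00 is 17:00 UTC, five hours after 12:00 UTC.
diff = aware_est - aware_utc

# Mixing naive and aware is rejected by the stdlib.
try:
    naive - aware_utc
    mixed_raises = False
except TypeError:
    mixed_raises = True
```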
Comment From: jorisvandenbossche
Personally, I would go for the "tz-awareness compat" requirement for all cases, I think. I don't think there are really "ambiguous" cases about what the result would be if all involved are tz-aware? (which is of course not the same with mixed tz-awareness)
Comment From: jbrockmendel
I'm on board with going all-tzawareness-compat. Assuming we get consensus on that, should we do a deprecation cycle or Just Do It?
Comment From: jreback
yeah I think we should deprecate now
+1 on strict
Comment From: jbrockmendel
@jreback to be clear, joris and I are advocating for the non-strict version
Comment From: jbrockmendel
@TomAugspurger @simonjayhawkins @WillAyd @mroeschke any preferences here?
Comment From: mroeschke
If the non-strict route was chosen, what would be the resulting timezone after these operations?
Comment From: jbrockmendel
If the non-strict route was chosen, what would be the resulting timezone after these operations?
self.tz would be retained
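As a rough stdlib analogy for "self.tz would be retained": the incoming value is re-expressed in the existing timezone, so the instant is unchanged but the timezone of the container wins. The helper name cast_to_existing_tz is hypothetical and is not a pandas function.

```python
from datetime import datetime, timedelta, timezone


def cast_to_existing_tz(value, existing_tz):
    """Hypothetical helper: re-express `value` in the container's timezone."""
    # Still enforce tz-awareness compatibility (the non-strict requirement).
    if (value.tzinfo is None) ^ (existing_tz is None):
        raise TypeError("Cannot mix tz-naive and tz-aware values")
    if existing_tz is None:
        return value
    # Same instant, expressed in the existing timezone.
    return value.astimezone(existing_tz)


utc = timezone.utc
eastern = timezone(timedelta(hours=-5))  # fixed offset, for illustration only

incoming = datetime(2020, 1, 1, 7, 0, tzinfo=eastern)
stored = cast_to_existing_tz(incoming, utc)
```

Here stored represents the same moment as incoming, but carries UTC, the timezone of the (hypothetical) container.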
Comment From: mroeschke
I guess either choice is okay with me with slight preference for full strict
Comment From: AnjoMan
I'm +1 on this as well. If we can standardize on one function for checking compatibility in all these cases, it will be easier to maintain and also have more test coverage.
Comment From: qiuxiaomu
Hi team, I recently bumped into this warning and after reading the change notes and relevant code changes (not very familiar with this area), I am not quite following.
result.loc[:, 'timestamp'] = pd.to_datetime(result.timestamp).dt.tz_convert(desired_timezone)
This line of code gives me a warning.
Are we saying that, instead of this one-liner, I need to break it down into:
1. cast to the desired object type first (.dt is not going to do it)
2. then tz_convert to the desired timezone.
Thanks
Comment From: jbrockmendel
What warning? What dtype does result['timestamp'] have before you do tz_convert?
Comment From: qiuxiaomu
The warning comes from here:
if not timezones.tz_compare(self.tz, other.tz):
    # TODO(2.0): remove this check. GH#37605
    warnings.warn(
        "Setitem-like behavior with mismatched timezones is deprecated "
        "and will change in a future version. Instead of raising "
        "(or for Index, Series, and DataFrame methods, coercing to "
        "object dtype), the value being set (or passed as a "
        "fill_value, or inserted) will be cast to the existing "
        "DatetimeArray/DatetimeIndex/Series/DataFrame column's "
        "timezone. To retain the old behavior, explicitly cast to "
        "object dtype before the operation.",
        FutureWarning,
        stacklevel=find_stack_level(),
    )
The dtype of result['timestamp'] was datetime64[ns, UTC].
Specifically, result['timestamp'].dtype yields {DatetimeTZDtype: ()} datetime64[ns, UTC].
Comment From: jbrockmendel
To keep the current behavior, in 2.0 you'll need to do df["timestamp"] = ... instead of df.loc[:, "timestamp"] = ... .
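A minimal sketch of the suggested form, assuming a small made-up frame with a UTC column and a fixed-offset target timezone chosen only for illustration. Assigning with df["timestamp"] = ... replaces the column, so the converted timezone is kept.

```python
from datetime import timedelta, timezone

import pandas as pd

# Made-up data: a single tz-aware (UTC) timestamp column.
df = pd.DataFrame(
    {"timestamp": pd.to_datetime(["2021-01-01 12:00"], utc=True)}
)

# Hypothetical target timezone (fixed UTC-05:00 offset).
desired_timezone = timezone(timedelta(hours=-5))

# Replace the whole column instead of setting into it with .loc[:, ...].
df["timestamp"] = df["timestamp"].dt.tz_convert(desired_timezone)

first = df["timestamp"].iloc[0]
```

After the assignment, first refers to the same instant as before, now expressed with the UTC-05:00 offset.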
Comment From: qiuxiaomu
To keep the current behavior, in 2.0 you'll need to do df["timestamp"] = ... instead of df.loc[:, "timestamp"] = ... .
Thanks a lot !