values = [pd.Timestamp.now(), 1]
pd.Series(values, dtype="M8[ns]") # <- casts
pd.DatetimeIndex(values, dtype="M8[ns]") # <- raises
pd.to_datetime(values) # <- raises
These should either all cast or all raise. I'd be OK either way, leaning towards all-raise.
Comment From: mroeschke
I would be +1 to raising (given the exception message notes how to cast)
Comment From: jbrockmendel
given the exception message notes how to cast
not sure what you have in mind. if all of our functions raise, do we tell them to do it manually?
Comment From: mroeschke
Namely I could see users wanting "I am providing mixed datetimes & ints and I want the ints to be interpreted as ns and the output reso to be ns"
But it's a fair point that the resolution may be hard to always suggest if this is raised internally
Comment From: jbrockmendel
cc @jreback
Comment From: jbrockmendel
I'm thinking that if we allow this at all- which I'm of mixed mind on- it should be in to_datetime, which I would expect to be the most aggressive about "cast to datetimes no matter what"
Comment From: jorisvandenbossche
I personally don't like the behaviour of casting such mixed type values much (I was actually going to say on the PR that I would rather disallow it), but there might also be some consistency reasons to allow it (or at least other cases to consider as well).
A mixture of timestamps and strings casts as well, and in this case also DatetimeIndex allows it:
>>> pd.Series([pd.Timestamp("2012-01-01"), "2012-01-01"], dtype=object).astype("datetime64[ns]")
0 2012-01-01
1 2012-01-01
dtype: datetime64[ns]
>>> pd.DatetimeIndex([pd.Timestamp("2012-01-01"), "2012-01-01"], dtype="datetime64[ns]")
DatetimeIndex(['2012-01-01', '2012-01-01'], dtype='datetime64[ns]', freq=None)
We should probably also be talking about casting (astype
), since we want the Series constructor with specified dtype to be consistent with this.
We do allow to cast integers to datetime64, we do allow strings to be cast to datetime64, so it would probably be strange to disallow casting those if they happen to be in an object dtype series?
I mean, we allow pd.Series([1, 2], dtype="int64").astype("datetime64[ns]")
, so probably we would also allow pd.Series([1, 2], dtype="object").astype("datetime64[ns]")
?
And if we allow casting object dtype ints, and object dtype strings, we should probably also casting a mixture (I would say that casting object dtype basically is a cast element-by-element, since you basically don't have a specified dtype but just individual "objects")
Unless we would say we disallow object -> datetime64
cast in general (regardless of the values inside), but I suppose that's not something we want to do.
I'm thinking that if we allow this at all- which I'm of mixed mind on- it should be in to_datetime, which I would expect to be the most aggressive about "cast to datetimes no matter what"
Yes, agreed that if we would disallow, pd.to_datetime
can be the alternative to refer to as the most generic/flexible "convert to datetime" function.