Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
In [15]: to_datetime(['2020-01-01 00:00:00+01:00', '2020-01-01 00:00:00+02:00', None], format='%Y-%m-%d %H:%M:%S%z')
Out[15]: Index([2020-01-01 00:00:00+01:00, 2020-01-01 00:00:00+02:00, None], dtype='object')
In [16]: to_datetime(['2020-01-01 00:00:00+01:00', '2020-01-01 00:00:00+02:00', None], format='%Y-%d-%m %H:%M:%S%z')
Out[16]: Index([2020-01-01 00:00:00+01:00, 2020-01-01 00:00:00+02:00, NaT], dtype='object')
Issue Description
In the first (ISO) case, None stays None. But in the second (non-ISO) case, None becomes NaT.
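The non-null entries come out the same in both calls because, for a January 1 date, the year-month-day and year-day-month formats read the string identically. A minimal sketch of that, using the standard library's `datetime.strptime` as a stand-in for pandas' parser (the variable names are illustrative only):

```python
from datetime import datetime

value = "2020-01-01 00:00:00+01:00"
iso_format = "%Y-%m-%d %H:%M:%S%z"      # ISO 8601 ordering: year-month-day
non_iso_format = "%Y-%d-%m %H:%M:%S%z"  # year-day-month, not ISO 8601

parsed_iso = datetime.strptime(value, iso_format)
parsed_non_iso = datetime.strptime(value, non_iso_format)

# Day and month are both 1, so the two formats yield the same timestamp.
print(parsed_iso == parsed_non_iso)
```

So the only observable difference between the two outputs is how the trailing None is handled.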
Expected Behavior
@jbrockmendel any thoughts on what should be correct?
I think I'd prefer the latter (just change to NaT)
Installed Versions
Comment From: MarcoGorelli
@mroeschke do you have thoughts on which is correct?
Comment From: jbrockmendel
I think I'd expect both to be NaT and both to be DatetimeIndex
Comment From: MarcoGorelli
I don't think DatetimeIndex is possible with mixed offsets, is it? (note how the strings have different offsets)
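A rough illustration of the mixed-offset point, using only the standard library rather than pandas: the two strings carry different fixed offsets, so no single timezone describes both values unless they are first converted to a common one such as UTC.

```python
from datetime import datetime, timezone

fmt = "%Y-%m-%d %H:%M:%S%z"
first = datetime.strptime("2020-01-01 00:00:00+01:00", fmt)
second = datetime.strptime("2020-01-01 00:00:00+02:00", fmt)

# The offsets differ, so the values do not share one timezone.
print(first.utcoffset() == second.utcoffset())  # False

# Converting both to UTC gives them a common timezone.
first_utc = first.astimezone(timezone.utc)
second_utc = second.astimezone(timezone.utc)
print(first_utc.tzinfo == second_utc.tzinfo)  # True
```

This is only an analogy for why the result in the example is an object-dtype Index; it does not reproduce pandas' internal logic.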
Comment From: jbrockmendel
Didn't notice that, you're right
Comment From: mroeschke
Should the errors keyword be applicable here?
Since there's an implicit to_datetime(..., format=..., errors="raise"), shouldn't this raise, since format can't be applied to None? Not sure if we have testing for None being special.
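For comparison, the standard library's own `datetime.strptime` (used here only as an analogy, not as pandas' code path) rejects a None input outright, which is the strict reading of "format can't be applied to None":

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S%z"

try:
    datetime.strptime(None, fmt)
    outcome = "parsed"
except TypeError:
    # strptime only accepts a str as its first argument.
    outcome = "TypeError"

print(outcome)
```

Whether to_datetime should be equally strict, or treat None as a missing value, is the open question in this comment.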
Comment From: MarcoGorelli
yeah it's tested here
https://github.com/pandas-dev/pandas/blob/8da8743729118139910b9eeb27e27a1ceceb2c4f/pandas/tests/tools/test_to_datetime.py#L1343-L1363
I just think expected is wrong on two counts:
- the first "NaT" should be NaT
- the None should be NaT
Comment From: mroeschke
I guess in that particular example format is not provided, though.
I haven't looked at mixed-argument to_datetime testing in a while, but this leads me to 2 questions:
- If format is provided, should all arguments be strings (or datetimes, which were recently added?), or can other arguments be parsed with non-formatting paths?
- Should, in theory, a None.strftime(format) -> NaT parsing be supported? Since NaT.strptime/strftime are designed to raise errors, my initial thought is no, but I could be convinced otherwise.
Comment From: mroeschke
> I just think expected is wrong on two counts:
Agreed with your conclusions
Comment From: MarcoGorelli
Not sure it should depend on whether or not format was passed - what I'm leaning towards is:
- skip over any NaT-y (None, NaT, nan, 'nan', etc.) -> convert all to NaT
- skip (or rather, preserve) pydatetime objects
- if format is provided, parse according to that (respecting exact). Else, guess/infer
Note that with PDEP4, it'll always behave as if format was passed (unless pandas can't guess the format)
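The three guidelines above can be sketched in plain Python. This is a hypothetical, stdlib-only illustration and not pandas' implementation: `NAT`, `NULL_STRINGS`, `is_null_like`, and `parse_values` are invented names, the set of null-like strings is an assumption, and the `exact` option and format inference are left out.

```python
import math
from datetime import datetime


class _MissingTimestamp:
    """Hypothetical stand-in for pandas.NaT, so the sketch is stdlib-only."""

    def __repr__(self):
        return "NaT"


NAT = _MissingTimestamp()
NULL_STRINGS = {"nan", "nat"}  # assumed set of null-like strings


def is_null_like(value):
    """Return True for None, the NAT sentinel, float nan, and null-like strings."""
    if value is None or value is NAT:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip().lower() in NULL_STRINGS


def parse_values(values, fmt):
    """Apply the three guidelines to each element in turn."""
    result = []
    for value in values:
        if is_null_like(value):
            result.append(NAT)    # 1. null-like values all become NaT
        elif isinstance(value, datetime):
            result.append(value)  # 2. datetime objects are preserved as-is
        else:
            result.append(datetime.strptime(value, fmt))  # 3. parse with format
    return result


parsed = parse_values(
    ["2020-01-01 00:00:00+01:00", "2020-01-01 00:00:00+02:00", None],
    "%Y-%d-%m %H:%M:%S%z",
)
print(parsed[-1])  # NaT
```

Under these rules the null check runs before any format-specific parsing, so the trailing None maps to the same missing-value marker whether or not the format is ISO.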
Comment From: mroeschke
Ah okay I like those guidelines then.
Comment From: jbrockmendel
looks like the example you used is the same as from #40111, so a PR that addresses this may close that
Comment From: jorisvandenbossche
Resolved by https://github.com/pandas-dev/pandas/pull/50242