I have some data that is roughly like "%H:%M:%S %P" (where %P is AM/PM). I'd like to store it as a Timedelta
representing seconds since midnight.
Currently parsing that "succeeds", ignoring the AM / PM
In [3]: import pandas as pd
In [6]: raw = ['3:25:00 AM', '3:25:00 PM', '12:30:00 AM', '12:30:00 PM']
...: base = pd.to_timedelta(raw)
...: base
...:
Out[6]: TimedeltaIndex(['03:25:00', '03:25:00', '12:30:00', '12:30:00'], dtype='timedelta64[ns]', freq=None)
Here's my current workaround.
In [7]: split = pd.Series(raw).str.extract(r"(?P<base>\S+) (?P<am_pm>\w{2})", expand=True)
...: base = pd.to_timedelta(split['base'])
...: hour = base.dt.total_seconds() // (60 * 60)
...:
...: move_ahead = (split.am_pm == "PM") & (hour < 12)
...: move_back = (split.am_pm == "AM") & (hour == 12)
...:
...: base[move_ahead] += pd.Timedelta("12H")
...: base[move_back] -= pd.Timedelta("12H")
...: base
...:
Out[7]:
0 03:25:00
1 15:25:00
2 00:30:00
3 12:30:00
Name: base, dtype: timedelta64[ns]
That works, but is a bit tricky to get right (assuming I have gotten it right). I'd propose that we either
- raise when we see am/pm in the data, and add this as cookbook recipe
- Try to support parsing this kind of data directly
Comment From: chris-b1
The 'today'
part of this is a hack, but here's a slightly simpler recipe
In [23]: raw = ['3:25:00 AM', '3:25:00 PM', '12:30:00 AM', '12:30:00 PM']
In [24]: pd.to_datetime(raw)
Out[24]:
DatetimeIndex(['2017-12-15 03:25:00', '2017-12-15 15:25:00',
'2017-12-15 00:30:00', '2017-12-15 12:30:00'],
dtype='datetime64[ns]', freq=None)
In [25]: pd.to_datetime(raw) - pd.Timestamp('today').normalize()
Out[25]: TimedeltaIndex(['03:25:00', '15:25:00', '00:30:00', '12:30:00'], dtype='timedelta64[ns]', freq=None)
Comment From: TomAugspurger
Oh, that's clever :)
Comment From: WillAyd
If we wanted to do option 2 I don't think it would be all that difficult. I think we would just need to update parse_timedelta_string
to detect AM/PM and add 1000000000L * 3600 * 12
to the result
in case of the latter. The only exception would be if the hour is 12 - then we'd remove that amount from the result if the AM is detected and leave PM untouched.
Happy to give it a go in a PR
https://github.com/pandas-dev/pandas/blob/b5f1e716d89487423bb12d6cc4e6da2b39184531/pandas/_libs/tslibs/timedeltas.pyx#L238
Comment From: pganssle
I would think that it's inadvisable to include AM
and PM
in any sort of Timedelta
parser. I would think parsing what is obviously a time to Timedelta
is just the wrong behavior. I would probably throw an error on AM/PM and instead use @chris-b1's solution. The today
thing can be made more explicit by specifying any default date (unless the "today" behavior is 100% known to be stable and not likely to be affected by DST changes or leap seconds or whatever) for the parse and then subtracting that off (or by adding first-class Time
support that doesn't require a date portion).
Comment From: mihaicezar
I agree with @pganssle for the fact that Timedelta
should not be used for parsing a time. Probably timedeltas.pyx
should have always thrown an error on AM/PM. This solution means it's then on the external developer to deal with the problem (and @chris-b1's workaround is a bit risky).
However, if support for AM/PM is decided, @WillAyd's idea is a bit incomplete in the negative domain: While pandas.Timedelta('-1 days 12:59:59 AM')
is covered, pandas.Timedelta('- 12:59:59 AM')
should raise an error (it has literally no logical meaning).
So which way should this go?
Comment From: WillAyd
I think an error here makes sense so unless @TomAugspurger disagrees would certainly take a PR!
Comment From: TomAugspurger
Raising sounds fine.
Comment From: mihaicezar
@WillAyd @TomAugspurger Just to be sure, you mean raising error on any AM/PM, right?
Comment From: WillAyd
Yep!
Comment From: jbrockmendel
Looks like pd.core.tools.times.to_time does not handle the OP example.