Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
# Raises an error for >2 digit hours or minutes
pd.Timedelta('PT100H')
pd.Timedelta('PT100M')
# Incorrectly parses any decimal component apart from seconds
pd.Timedelta('PT3.5H') == pd.Timedelta('PT5H3S')
# Raises an error for comma as decimal separator
pd.Timedelta('PT3,5H')
# Does not raise an error for plusses in numbers
pd.Timedelta('P+1S') == pd.Timedelta('PT1S')
# Does not raise an error for unsupported month component, instead parses it as minutes
pd.Timedelta('P1M') == pd.Timedelta('PT1M')
# Does not raise an error for hour and second components before the 'T' separator
pd.Timedelta('P1H') == pd.Timedelta('PT1H')
pd.Timedelta('P1S') == pd.Timedelta('PT1S')
# Does not raise an error for day components after the 'T' separator
pd.Timedelta('PT1D') == pd.Timedelta('P1D')
# Does not raise an error for duplicate components (and produces unexpected result)
pd.Timedelta('PT1S1S1S1S') == pd.Timedelta('PT20M34S')
# Does not raise an error for components in the incorrect order
pd.Timedelta('PT1M1H') == pd.Timedelta('PT1H1M')
# Does not raise an error for trailing 'T' separator
pd.Timedelta('P1DT') == pd.Timedelta('P1D')
# Does not raise an error for duplicate 'T' separators
pd.Timedelta('PTTT1H') == pd.Timedelta('PT1H')
# Does not raise an error for 'P' or 'T' in any position
pd.Timedelta('PTP1TPTH') == pd.Timedelta('PT1H')
Issue Description
Numbers
- Raises a ValueError when the hours or minutes part is longer than 2 digits. Nothing in the standard requires this; the parser explicitly checks the length of the hour and minute parts. #51820 https://github.com/pandas-dev/pandas/blob/f9be5d90a6bb72f2d11f47046f483efbcd5a787e/pandas/_libs/tslibs/timedeltas.pyx#L883
- Incorrectly parses decimal values in every component apart from seconds (for example, PT3.5H is parsed as PT5H3S).
- Does not allow the comma (,) as a decimal separator.
- Allows a plus sign (+) to prefix numbers, which is outside the standard. https://github.com/pandas-dev/pandas/blob/f9be5d90a6bb72f2d11f47046f483efbcd5a787e/pandas/_libs/tslibs/timedeltas.pyx#L880
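The number rules above can be sketched in a few lines of standard-library Python. This is only an illustration of the behavior being asked for, not the pandas implementation; `parse_iso_number` and `_NUMBER` are hypothetical names.

```python
import re
from fractions import Fraction

# Hypothetical helper, not part of pandas.
# Accepts any number of ASCII digits, optionally followed by '.' or ','
# and more digits. A sign or a bare leading separator is rejected.
_NUMBER = re.compile(r"[0-9]+(?:[.,][0-9]+)?")


def parse_iso_number(text: str) -> Fraction:
    """Return the exact value of a duration number such as '100' or '3,5'."""
    if not _NUMBER.fullmatch(text):
        raise ValueError(f"invalid duration number: {text!r}")
    # Normalize the comma so Fraction can read the decimal string.
    return Fraction(text.replace(",", "."))
```

Using `Fraction` keeps the value exact, so a fractional hour or minute can be converted to smaller units without floating-point drift.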
Components
- Parses month (M) components as minutes. https://github.com/pandas-dev/pandas/blob/f9be5d90a6bb72f2d11f47046f483efbcd5a787e/pandas/_libs/tslibs/timedeltas.pyx#L885
- Parses hour (H) or second (S) components before the T separator.
- Parses day components after the T separator.
- Parses duplicate components with weird results.
- Parses components in the incorrect order.
P and T symbols
- Ignores P and T symbols anywhere in the string. https://github.com/pandas-dev/pandas/blob/f9be5d90a6bb72f2d11f47046f483efbcd5a787e/pandas/_libs/tslibs/timedeltas.pyx#L873
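For comparison, here is a rough sketch of a strict parser that enforces the component and P/T rules listed above using a single anchored regular expression. It is not how pandas parses durations; `parse_duration`, `_NUM` and `_DURATION` are hypothetical names. As a simplifying assumption it only handles days, hours, minutes and seconds, and it allows a decimal fraction on any component rather than only the smallest one.

```python
import re
from datetime import timedelta

# Hypothetical sketch, not the pandas implementation.
_NUM = r"[0-9]+(?:[.,][0-9]+)?"  # unsigned, '.' or ',' as decimal separator

_DURATION = re.compile(
    rf"P(?!$)"                        # exactly one leading 'P', not alone
    rf"(?:(?P<days>{_NUM})D)?"        # days may only appear before 'T'
    rf"(?:T(?=[0-9])"                 # 'T' must introduce a time component
    rf"(?:(?P<hours>{_NUM})H)?"       # each component at most once,
    rf"(?:(?P<minutes>{_NUM})M)?"     # and only in H, M, S order
    rf"(?:(?P<seconds>{_NUM})S)?"
    rf")?"
)


def parse_duration(text: str) -> timedelta:
    """Parse a strict ISO 8601 duration or raise ValueError."""
    match = _DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid ISO 8601 duration: {text!r}")
    # Group names match timedelta keyword arguments.
    parts = {
        name: float(value.replace(",", "."))
        for name, value in match.groupdict().items()
        if value is not None
    }
    return timedelta(**parts)
```

Because the pattern is matched against the whole string, a misplaced P or T, a month component, a duplicate, or an out-of-order component leaves unmatched input and is rejected instead of being silently skipped.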
Expected Behavior
The following should all parse without errors.
pd.Timedelta("PT100H") == pd.Timedelta(hours=100)
pd.Timedelta("PT100M") == pd.Timedelta(minutes=100)
pd.Timedelta("PT3,5H") == pd.Timedelta(hours=3.5)
The following should all raise errors.
# Bad numbers
pd.Timedelta("PT.1S")
pd.Timedelta("PT+1S")
# Bad components
pd.Timedelta("P1M")
pd.Timedelta("P1H")
pd.Timedelta("P1S")
pd.Timedelta("PT1D")
pd.Timedelta("PT1M1H")
pd.Timedelta("P1S1S")
# Misplaced P or T
pd.Timedelta("P1PD")
pd.Timedelta("P1TD")
pd.Timedelta("P1DT")
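One way to track progress on the cases above is a small helper that reports which invalid strings a parser still accepts. This is a sketch under the assumption that the parser signals failure with ValueError; `wrongly_accepted` and `SHOULD_RAISE` are hypothetical names, and the parser is passed in as a callable so the helper itself does not depend on pandas.

```python
from typing import Callable, Iterable, List

# The strings listed above that are expected to raise.
SHOULD_RAISE = [
    "PT.1S", "PT+1S",                                   # bad numbers
    "P1M", "P1H", "P1S", "PT1D", "PT1M1H", "P1S1S",     # bad components
    "P1PD", "P1TD", "P1DT",                             # misplaced P or T
]


def wrongly_accepted(
    parse: Callable[[str], object], candidates: Iterable[str]
) -> List[str]:
    """Return the candidates that parse() accepts without a ValueError."""
    accepted = []
    for text in candidates:
        try:
            parse(text)
        except ValueError:
            continue  # rejected, as expected
        accepted.append(text)
    return accepted


# Example usage (requires pandas):
#     import pandas as pd
#     print(wrongly_accepted(pd.Timedelta, SHOULD_RAISE))
```

An empty list would mean every string in the list is rejected.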
Installed Versions
INSTALLED VERSIONS
------------------
commit : 2e218d10984e9919f0296931d92ea851c6a6faf5
python : 3.11.1.final.0
python-bits : 64
pandas : 1.5.3
numpy : 1.24.2
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 65.5.0
pip : 22.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None
Comment From: rosstitmarsh
Also covers #48122