When converting a fractional year to a Timedelta
object the float is truncated:
print pd.to_timedelta(1., unit='Y')
365 days 05:49:12
print pd.to_timedelta(1.5, unit='Y')
365 days 05:49:12
print pd.to_timedelta(0.5, unit='Y')
0 days 00:00:00
print pd.to_timedelta(0.9, unit='Y')
0 days 00:00:00
Using a float with the hour unit is OK:
print pd.to_timedelta(0.9, unit='H')
0 days 00:54:00
print pd.to_timedelta(1., unit='H')
0 days 01:00:00
Pandas version:
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 21.2.1
Cython: 0.20.1
numpy: 1.11.0
scipy: 0.16.0
statsmodels: 0.7.0.dev-51faa1a
xarray: None
IPython: 4.2.0
sphinx: 1.2.2
patsy: 0.3.0
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.4.4
matplotlib: 1.5.1
openpyxl: 2.0.2
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.28.0
pandas_datareader: None
Comment From: jreback
So we pass to numpy for this conversion (then pandas converts to the proper units). It doesn't allow floats, so its converted to integer. So should either:
- raise that this conversion is not allowed
- do the conversion entirely in pandas (e.g. Y
== 365 1/4 days I think), though numpy is more exact. This is a fishy calculation anyhow, as a Timedelta is a fixed unit (its not w/respect to a calendar). Someone would have to look at exactly what they are doing
- remove support for these non-fixed conversions (e.g. Y
, M
)
In [1]: np.timedelta64(1,unit='Y')
Out[1]: numpy.timedelta64(1)
In [2]: np.timedelta64(1.5,unit='Y')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-9d5b43137138> in <module>()
----> 1 np.timedelta64(1.5,unit='Y')
ValueError: Could not convert object to NumPy timedelta
Comment From: jreback
see around here. I think look into the 2nd point, see docs here and/or source code and see what we can do.
Comment From: jreback
cc @jorisvandenbossche
Comment From: agartland
Thanks @jreback
I can see why conversion of a non-fixed unit is a little weird.
I would propose that pandas
should maintain consistency with numpy
which means raising a ValueError
for non int
arguments with units Y
or M
.
I could see other arguments though.
Comment From: jorisvandenbossche
I actually was of the impression we disallowed this, but apparantly, it's in to_datetime
that such units are not allowed, but in to_timedelta
they are.
In [10]: pd.to_datetime(1, unit='M')
...
ValueError: cannot cast unit M
I personally would remove support for those units (above 'D'), as it is rather garbage that is produced (if people expect that it gives exactly eg the month they want, but not any month is 30 days and 10 hours ...).
Not sure how much that would break code (how many people are relying on this behaviour?)
Comment From: jreback
hmm ok let's deprecate it then? and maybe raise on floats
Comment From: jorisvandenbossche
Yes, deprecating first is a good idea. If we deprecate, I would also just deprecate the floats, it will be removed later on anyhow (and for other units floats work correctly)
Comment From: agartland
I think raising on floats for Y
and M
is important since the current version is quietly truncating values, which is potentially dangerous behavior.
Comment From: jreback
ok let's deprecate Y
, M
. W
(even though weeks are 7 days, easier to just have D
as the highest values). For both floats & integers.
Comment From: jorisvandenbossche
Closing this in favor of https://github.com/pandas-dev/pandas/issues/16344