Pandas version checks
- [X] I have checked that the issue still exists on the latest versions of the docs on
main
here
Location of the documentation
https://pandas.pydata.org/docs/dev/reference/api/pandas.Timedelta.html?highlight=timedelta#pandas.Timedelta
Documentation problem
I think the documentation for the seconds
attribute of a Timedelta object should have a clear warning to tell people that they probably want to use the total_seconds()
method instead.
Apparently, the seconds
attribute of a Timedelta object takes hours, minutes, and seconds into account, but not days which I wouldn't even imagine would be necessary in the example below where I try to find the difference in seconds between two datetimes that are a minute apart:
import pandas as pd
a = pd.to_datetime("2023-01-25 10:59")
b = pd.to_datetime("2023-01-25 10:58")
print(a) # 2023-01-25 10:59:00
print(b) # 2023-01-25 10:58:00
c = b - a
print(c) # -1 days +23:59:00
print(c.seconds) # 86340
This is really surprising to me. Only later did I realise that the total_seconds()
method exists. Apparently it does what I expected from the seconds
attribute in the first place.
print(c.total_seconds()) # -60.0
Suggested fix for documentation
In my case, of course, the ideal would have been if pandas had displayed a warning when I use the seconds
attribute. Or even if it just did what I expected.
Assuming that it is for better or for worse the intended behaviour, shouldn't it be better documented? Is there a way that I could have known this? Is it described in some place that I didn't know to look?
Comment From: MarcoGorelli
Thanks @Styrke for the report
Isn't this correct? It says
seconds | Return the total hours, minutes, and seconds of the timedelta as seconds.
and that's what's been returned:
In [5]: c
Out[5]: Timedelta('-1 days +23:59:00')
In [6]: 59 * 60 + 23*60*60
Out[6]: 86340
the ideal would have been if pandas had displayed a warning
This isn't really possible as it would end up displaying a warning each time some calls it on purpose
I think this just needs an example. Do you want to open a pull request to add this an example?
Comment From: Styrke
Thanks for your reply @MarcoGorelli.
Isn't this correct? It says
seconds | Return the total hours, minutes, and seconds of the timedelta as seconds.
It is technically correct but the pitfall is in what it doesn't say and the apparent interaction with negative Timedeltas.
First, you have to be quite alert when reading that description to consciously notice that it excludes days, months, and years.
Second, even if I had read and fully realised the implications of that description, I would have disregarded it because I am working with timedeltas of up to a few hours at most. I did not realise that -1 minute would be represented as -1 days +23:59:00, thus putting me in a situation where I am affected by the limits of the seconds
attribute.
the ideal would have been if pandas had displayed a warning
This isn't really possible as it would end up displaying a warning each time some calls it on purpose
I may display some ignorance in the following because I haven't looked at the source code. Please just let me know if I have any substantial misunderstandings.
I can't imagine what need downstream users of pandas would have that they should prefer the seconds
attribute for over the total_seconds()
method. I am sure there are a some use cases for it but I expect them to be somewhat advanced niches. I would be curious to see if you know of any. My hypothesis is that total_seconds()
it the sane default for most users.
My guess is that the seconds
attribute exposes the internal representation of the Timedelta and would be really hard or impossible to change for compatibility and stability reasons. Thus, I expect that if a warning was displayed every time the seconds
attribute is used, it would also appear when using the object for other things (such as calling the total_seconds()
method).
I think that the appropriate action here would be to make it very clear to everyone that they probably want to use the total_seconds()
method rather than the seconds
attribute. Practically, this warning should at the very least be put directly in the documentation for the seconds
attribute. That is also why I tagged this issue with the 'docs' tag.
Ideally there would also be a warning that is displayed every time the seconds
attribute is used externally. Realistically, I understand that this would probably be very hard or impossible to do.
I think this just needs an example. Do you want to open a pull request to add this an example?
If you agree, I will take a stab at both updating the description of the seconds
attribute with a clarification and warning and probably adding an example as well.
Comment From: MarcoGorelli
Sure, would take a doc clarification, thanks!
Comment From: rmhowe425
take
Comment From: rmhowe425
@MarcoGorelli I've worked on this issue over the past 2 weeks and I can't figure out which file I need up update the docstrings in.
After I build my dev environment I see that a .c file is generated that contains all of the docstrings for the Timedelta class. However, that doesn't appear to be the file that I should actually be updating.
Could you provide some guidance on this issue? Thanks!
Comment From: MarcoGorelli
Hey, what do you mean by
However, that doesn't appear to be the file that I should actually be updating
Comment From: rmhowe425
@MarcoGorelli I initially thought that the C file was the correct file to update. However, I don't see any .c files in the pandas GitHub repo. Git also warned me that I probably shouldn't push the .c file to Github. I'm assuming that I'm probably updating the wrong file.
Sorry for the confusion. This is my first time contributing to pandas!
Comment From: MarcoGorelli
which .c file did you try editing?
Comment From: rmhowe425
It was timedeltas.c
under the pandas.libs._tslibs folder.
I was updating the comments in that file and then I would re-package the pandas lib and run help() in a python interactive interpreter to confirm my changes
Comment From: MarcoGorelli
ah you probably want the corresponding .pyx
file then, try searching for timedeltas.pyx
Comment From: rmhowe425
So, I thought so too. The pyx file is linked to the C file via the "cythonization" that takes place when I create the build environment right?
My problem (or lack of understanding) is that whenever I update the c file, re-build by build environment and then run git status no files appear to have been modified. I also don't see any doc-strings in the .pyx file either.
Comment From: MarcoGorelli
You don't update the c file, you update the pyx file
Here's the docstring
https://github.com/pandas-dev/pandas/blob/d0fbeb49cb486bd06bca959649ea497f504c90e6/pandas/_libs/tslibs/timedeltas.pyx#L1061-L1092
Comment From: rmhowe425
ah okay! Thank you for the clarity! Much appreciated! I'll take another stab at this issue later tonight!
Comment From: rmhowe425
take