Pandas version checks

  • [X] I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/dev/reference/api/pandas.Timedelta.html?highlight=timedelta#pandas.Timedelta

Documentation problem

I think the documentation for the seconds attribute of a Timedelta object should have a clear warning to tell people that they probably want to use the total_seconds() method instead.

Apparently, the seconds attribute of a Timedelta object takes hours, minutes, and seconds into account, but not days which I wouldn't even imagine would be necessary in the example below where I try to find the difference in seconds between two datetimes that are a minute apart:

import pandas as pd

a = pd.to_datetime("2023-01-25 10:59")
b = pd.to_datetime("2023-01-25 10:58")
print(a)  # 2023-01-25 10:59:00
print(b)  # 2023-01-25 10:58:00

c = b - a
print(c)  # -1 days +23:59:00
print(c.seconds)  # 86340

This is really surprising to me. Only later did I realise that the total_seconds() method exists. Apparently it does what I expected from the seconds attribute in the first place.

print(c.total_seconds())  # -60.0

Suggested fix for documentation

In my case, of course, the ideal would have been if pandas had displayed a warning when I use the seconds attribute. Or even if it just did what I expected.

Assuming that it is for better or for worse the intended behaviour, shouldn't it be better documented? Is there a way that I could have known this? Is it described in some place that I didn't know to look?

Comment From: MarcoGorelli

Thanks @Styrke for the report

Isn't this correct? It says

seconds | Return the total hours, minutes, and seconds of the timedelta as seconds.

and that's what's been returned:

In [5]: c
Out[5]: Timedelta('-1 days +23:59:00')

In [6]: 59 * 60 + 23*60*60
Out[6]: 86340

the ideal would have been if pandas had displayed a warning

This isn't really possible as it would end up displaying a warning each time some calls it on purpose


I think this just needs an example. Do you want to open a pull request to add this an example?

Comment From: Styrke

Thanks for your reply @MarcoGorelli.

Isn't this correct? It says

seconds | Return the total hours, minutes, and seconds of the timedelta as seconds.

It is technically correct but the pitfall is in what it doesn't say and the apparent interaction with negative Timedeltas.

First, you have to be quite alert when reading that description to consciously notice that it excludes days, months, and years.

Second, even if I had read and fully realised the implications of that description, I would have disregarded it because I am working with timedeltas of up to a few hours at most. I did not realise that -1 minute would be represented as -1 days +23:59:00, thus putting me in a situation where I am affected by the limits of the seconds attribute.

the ideal would have been if pandas had displayed a warning

This isn't really possible as it would end up displaying a warning each time some calls it on purpose

I may display some ignorance in the following because I haven't looked at the source code. Please just let me know if I have any substantial misunderstandings.

I can't imagine what need downstream users of pandas would have that they should prefer the seconds attribute for over the total_seconds() method. I am sure there are a some use cases for it but I expect them to be somewhat advanced niches. I would be curious to see if you know of any. My hypothesis is that total_seconds() it the sane default for most users.

My guess is that the seconds attribute exposes the internal representation of the Timedelta and would be really hard or impossible to change for compatibility and stability reasons. Thus, I expect that if a warning was displayed every time the seconds attribute is used, it would also appear when using the object for other things (such as calling the total_seconds() method).

I think that the appropriate action here would be to make it very clear to everyone that they probably want to use the total_seconds() method rather than the seconds attribute. Practically, this warning should at the very least be put directly in the documentation for the seconds attribute. That is also why I tagged this issue with the 'docs' tag.

Ideally there would also be a warning that is displayed every time the seconds attribute is used externally. Realistically, I understand that this would probably be very hard or impossible to do.

I think this just needs an example. Do you want to open a pull request to add this an example?

If you agree, I will take a stab at both updating the description of the seconds attribute with a clarification and warning and probably adding an example as well.

Comment From: MarcoGorelli

Sure, would take a doc clarification, thanks!

Comment From: rmhowe425

take

Comment From: rmhowe425

@MarcoGorelli I've worked on this issue over the past 2 weeks and I can't figure out which file I need up update the docstrings in.

After I build my dev environment I see that a .c file is generated that contains all of the docstrings for the Timedelta class. However, that doesn't appear to be the file that I should actually be updating.

Could you provide some guidance on this issue? Thanks!

Comment From: MarcoGorelli

Hey, what do you mean by

However, that doesn't appear to be the file that I should actually be updating

Comment From: rmhowe425

@MarcoGorelli I initially thought that the C file was the correct file to update. However, I don't see any .c files in the pandas GitHub repo. Git also warned me that I probably shouldn't push the .c file to Github. I'm assuming that I'm probably updating the wrong file.

Sorry for the confusion. This is my first time contributing to pandas!

Comment From: MarcoGorelli

which .c file did you try editing?

Comment From: rmhowe425

It was timedeltas.c under the pandas.libs._tslibs folder.

I was updating the comments in that file and then I would re-package the pandas lib and run help() in a python interactive interpreter to confirm my changes

Comment From: MarcoGorelli

ah you probably want the corresponding .pyx file then, try searching for timedeltas.pyx

Comment From: rmhowe425

So, I thought so too. The pyx file is linked to the C file via the "cythonization" that takes place when I create the build environment right?

My problem (or lack of understanding) is that whenever I update the c file, re-build by build environment and then run git status no files appear to have been modified. I also don't see any doc-strings in the .pyx file either.

Comment From: MarcoGorelli

You don't update the c file, you update the pyx file

Here's the docstring

https://github.com/pandas-dev/pandas/blob/d0fbeb49cb486bd06bca959649ea497f504c90e6/pandas/_libs/tslibs/timedeltas.pyx#L1061-L1092

Comment From: rmhowe425

ah okay! Thank you for the clarity! Much appreciated! I'll take another stab at this issue later tonight!

Comment From: rmhowe425

take