Comment From: jbrockmendel
Also round
Comment From: jorisvandenbossche
In addition to numeric indices, this would also be useful for datetimelike indices (and would avoid a .to_series() call just to calculate the diff).
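For context, a minimal sketch of the workaround being described, assuming a small hand-made DatetimeIndex (the dates are illustrative only). Without a diff method on the index itself, the differences have to be computed via a Series:

```python
import pandas as pd

# Illustrative, irregularly spaced dates (not taken from the issue).
dates = pd.DatetimeIndex(["2023-01-01", "2023-01-02", "2023-01-05"])

# Round trip through a Series just to get consecutive differences.
gaps = dates.to_series().diff()

# The first entry has no predecessor, so it is NaT; the rest are Timedeltas.
print(gaps.tolist())
```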
Comment From: vamsi231297
Hi @jorisvandenbossche, can you briefly explain what this issue is about?
Comment From: jppcr
I'm thinking of contributing to this. It would be my first contribution to Pandas :)
Is the correct place to implement this, in pandas/core/base.py inside the IndexOpsMixin class? Should this use the diff function of pandas/core/algorithms.py?
Thanks in advance.
Comment From: jbrockmendel
> Is the correct place to implement this, in pandas/core/base.py inside the IndexOpsMixin class?
Probably on Index.
> Should this use the diff function of pandas/core/algorithms.py?
Either that or do something like return Index(self.to_series().diff(n))
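A rough standalone sketch of that second option, written as a free function rather than a method so it runs against any installed pandas. The name index_diff and the sample values are hypothetical and are not part of pandas or of the eventual patch:

```python
import pandas as pd


def index_diff(idx: pd.Index, periods: int = 1) -> pd.Index:
    """Hypothetical helper: diff an Index by delegating to Series.diff."""
    # to_series() copies the index values into a Series; Series.diff does
    # the actual work, and the result is wrapped back into an Index.
    return pd.Index(idx.to_series().diff(periods))


idx = pd.Index([10, 20, 50, 55])
result = index_diff(idx)

# The first element is NaN because it has no predecessor, which also
# means an integer index comes back with a float dtype.
print(result)
```

Delegating to Series.diff keeps the behavior identical to the existing Series implementation, at the cost of building an intermediate Series.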
Comment From: jppcr
take
Comment From: jppcr
I have a WIP version of this here: https://github.com/pandas-dev/pandas/compare/main...jppcr:issue-19708-indexdiff
I ended up using the simpler code that @jbrockmendel proposed, Index(self.to_series().diff(n)).
May I ask for guidance on two things?
1) I added a single simple test that calculates the difference for two integers. Does it make sense to test for more cases (such as for different Index types), since that would just duplicate the original Series.diff tests in a different place?
2) For some of the Index types I think it does not make sense at all to implement the diff() function (MultiIndex, IntervalIndex), and for others (TimedeltaIndex, PeriodIndex), more work would be needed than just using Series.diff in the background. Would it be OK if I just added a check for the type of self, and raised a TypeError if a certain Index type did not support the diff function?
Below are the Index types I tested the function with:
- RangeIndex, Int64Index, UInt64Index, Float64Index - works OK
- CategoricalIndex - works OK if the categories are numerical, but gives out an error if the categories are of mixed types or strings
- IntervalIndex - does not work
- MultiIndex - does not work
- DatetimeIndex - returns Timedeltas, which makes sense
- TimedeltaIndex - does not work as expected. I think for this to work as expected you would need to be able to pass to the periods argument a value such as "1 day", or something like that
- PeriodIndex - also does not work as expected
Thanks
Comment From: jppcr
Me: Would it be OK if I just added a check for the type of self, and raised a TypeError if a certain Index type did not support the diff function?
Specifically, I meant checking if self is of type Range/Int64/UInt64/Float64 or DatetimeIndex. If it isn't, I would raise a TypeError exception saying that using diff with that Index type isn't supported.
Comment From: jbrockmendel
> Me: Would it be OK if I just added a check for the type of self, and raised a TypeError if a certain Index type did not support the diff function?
> Specifically, I meant checking if self is of type Range/Int64/UInt64/Float64 or DatetimeIndex. If it isn't, I would raise a TypeError exception saying that using diff with that Index type isn't supported.
Shouldn't be necessary; you should get an appropriate exception from the Series.diff method. Could special-case RangeIndex/MultiIndex to fast-path.
> Does it make sense to test for more cases
Yes. You can use the index fixture.
> TimedeltaIndex - does not work as expected. I think for this to work as expected you would need to be able to pass to the periods argument a value such as "1 day", or something like that
pd.timedelta_range("1 Day", periods=3).to_series().diff() looks fine to me.
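A quick way to check that last expression, as a sketch; the start value is written in lowercase here, and the variable names are illustrative:

```python
import pandas as pd

# Three timedeltas, one day apart (daily frequency is the default here).
tdi = pd.timedelta_range("1 day", periods=3)

# periods stays an integer offset; the values themselves are timedeltas.
diffs = tdi.to_series().diff()

# The first entry is NaT; each later entry is the gap to its predecessor.
print(diffs.tolist())
```

The periods argument counts positions, not time, so no "1 day" style value needs to be passed for a TimedeltaIndex.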
Comment From: brunocoutinhoo
take