@spencerkclark is working on a custom pandas.Index subclass for xarray (see https://github.com/pydata/xarray/issues/1084), like pandas.DatetimeIndex, to handle arrays of netcdftime.datetime objects. This index is primarily intended for use with xarray, but ideally we'd like it to work in pandas Series and DataFrame objects, too.
The subclass will include implementations of at least get_loc, get_slice_bound and get_value (this one should probably be unnecessary, but it's needed for pandas.Series). To minimize fragility, it will not subclass DatetimeIndex but will instead copy some of the relevant code (thank you open source!).
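For readers unfamiliar with these methods, here is a minimal sketch of what get_loc and get_slice_bound return on an ordinary pandas.Index. It is not the proposed subclass; the labels and values are made up for illustration, and the keyword-only use of `side` assumes a reasonably recent pandas. get_value is left out because it is not available in all pandas versions.

```python
import pandas as pd

# A plain, sorted Index with illustrative integer labels.
idx = pd.Index([10, 20, 30, 40])

# get_loc maps a label to its integer position.
loc = idx.get_loc(20)

# get_slice_bound maps a label to the position where a label-based
# slice should start (side="left") or stop (side="right").
left = idx.get_slice_bound(20, side="left")
right = idx.get_slice_bound(30, side="right")

# Series label slicing relies on this kind of lookup; label slices
# include both endpoints.
s = pd.Series(["a", "b", "c", "d"], index=idx)
picked = s.loc[20:30]

print(loc, left, right, list(picked))
```

A custom index would presumably need to give the same kind of answers for its own label type, so that label-based selection on a Series behaves as users expect.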
Two questions for other pandas devs:
- Is there any fundamental reason why a custom pandas.Index subclass won't work on a Series or DataFrame?
- Does this seem like a reasonable thing to do, or are we setting ourselves up for suffering in the future? I'll update this issue when we have a concrete PR to look at.
At a bare minimum, we should probably add some tests to pandas to ensure that a basic subclass works.
Comment From: jreback
> At a bare minimum, we should probably add some tests to pandas to ensure that a basic subclass works.

There are already quite a few subclasses of Index (internally). The API is not publicly exposed. I think it would take a bit of work to make it 'simpler'.
You have to define a fair bit of machinery (lots of methods) to make it work properly, including construction, inference, equality, testing, and various indexing routines.
It is thus straightforward, but not trivial to sub-class. (remember IntervalIndex!)
> Is there any fundamental reason why a custom pandas.Index subclass won't work on a Series or DataFrame?

It will work, though there may be some API leakage (IOW, some methods are 'internal', others are 'public' from the main pandas API).

> Does this seem like a reasonable thing to do, or are we setting ourselves up for suffering in the future? I'll update this issue when we have a concrete PR to look at.

Why do you think you need a custom Index?
Comment From: shoyer
> It is thus straightforward, but not trivial to sub-class. (remember IntervalIndex!)
Indeed, I do :)
> why do you think you need a custom Index?
The climate science community wants the convenient indexing of DatetimeIndex, but datetime64[ns] won't suffice for them. Not only do they need to handle dates outside the range representable with ns resolution (roughly 1677 to 2262), but they also use all sorts of funny calendar conventions, e.g., pretending leap years never exist, or that every month has exactly 30 days.
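Both limitations can be checked with the standard library alone. The sketch below assumes only that datetime64[ns] stores a signed 64-bit count of nanoseconds since 1970-01-01; because Python's datetime has microsecond resolution, the bounds are truncated approximations. The helper name is_valid_gregorian is hypothetical.

```python
from datetime import date, datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1  # largest nanosecond offset an int64 can hold

# Convert the nanosecond limit to whole microseconds (truncating),
# since datetime cannot represent nanoseconds.
span = timedelta(microseconds=INT64_MAX // 1000)
earliest = EPOCH - span  # approximate lower bound of datetime64[ns]
latest = EPOCH + span    # approximate upper bound of datetime64[ns]


def is_valid_gregorian(year, month, day):
    """Return True if the date exists in the standard calendar."""
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


# A 360-day calendar has a 30 February; the standard calendar does not.
feb_30_ok = is_valid_gregorian(2000, 2, 30)

print(earliest.year, latest.year, feb_30_ok)
```

Any date before the lower bound, and any date such as 30 February, has no datetime64[ns] representation, which is why a separate datetime type and index are under discussion.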
Comment From: jreback
@shoyer ok, in that case I would directly subclass DatetimeIndex or PeriodIndex. If you do this you get all kinds of things for free (e.g. resample, NaT, accessors, partial string indexing, etc). So it would be much simpler.
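As a rough illustration of the features listed above, the following sketch uses a plain DatetimeIndex with made-up data. Whether a subclass would inherit all of this unchanged is exactly the open question in this thread, so treat it as a picture of the target behaviour rather than a guarantee.

```python
import pandas as pd

# 60 daily points starting 2000-01-01 (covers January and February 2000).
idx = pd.date_range("2000-01-01", periods=60, freq="D")
s = pd.Series(range(60), index=idx)

# Partial string indexing: select a whole month with a string label.
january = s.loc["2000-01"]

# Resampling: aggregate the daily values into month-start bins.
monthly = s.resample("MS").sum()

# NaT: missing timestamps are represented natively.
with_gap = pd.DatetimeIndex(["2000-01-01", None])

# Field accessors: components such as the month are available directly.
months = idx.month

print(len(january), list(monthly), with_gap[1], sorted(set(months)))
```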
As I said above, you might have some API leakage (IOW, we have a notion of pandas functions calling Index methods which are not 'public' per se). But nothing insurmountable.
So it comes down to what you need: points in time, or spans.
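The distinction can be sketched with the existing pandas types, where a Timestamp is a single instant and a Period covers an interval. The dates are illustrative only.

```python
import pandas as pd

# A point in time: one instant.
point = pd.Timestamp("2000-01-15")

# A span: the whole of January 2000, with a start and an end.
span = pd.Period("2000-01", freq="M")
contains = span.start_time <= point <= span.end_time

# The corresponding index types.
points = pd.date_range("2000-01-01", periods=3, freq="D")  # DatetimeIndex
spans = pd.period_range("2000-01", periods=3, freq="M")    # PeriodIndex

print(contains, type(points).__name__, type(spans).__name__)
```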