I've created a slightly modified version of PeriodIndex for some projects I'm working on that you may find useful for pandas. I rewrote the date-math code from scratch in Cython. It supports a wide range of frequencies, is very easy to extend to new frequencies, and is quite compact.
Code is at https://github.com/abielr/pandasreg. There isn't much documentation yet, but all the key date-math code is in pandasreg/src/rfreq.pyx. Most of the rest of the code is a straight copy of the pandas code for Period and PeriodIndex and some other utility functions to work with the custom index.
The basic approach for handling frequencies is as follows:

- There is a base class RFrequency that defines an interface to convert datetime-like objects to an ordinal period at a given frequency (just like PeriodIndex), and to do the reverse.
- Periods at one frequency can be converted to periods at another frequency.
- Classes that inherit from RFrequency define the date-math logic that is particular to a given frequency. However, a single child class can also define multiple frequencies. For example, the monthly class defines the monthly, bimonthly, quarterly, semiannual, and annual frequencies without custom code for each variant.
- You can force a frequency that is a multiple of a base frequency to pass through a particular period. For example, the semiannual frequency is created from the monthly frequency with a 6-month stride and the requirement that it pass through Jun 1970, which ensures that 1H=Jan-Jun and 2H=Jul-Dec always. (See this stackoverflow post for more background.)
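To make the stride-plus-anchor idea concrete, here is a minimal pure-Python sketch. It is not the code in pandasreg/src/rfreq.pyx; the function names (`month_ordinal`, `strided_ordinal`, `period_bounds`) are hypothetical, and it assumes that "passing through" a month means a period ends in that month, with that period given ordinal 0.

```python
from datetime import date

EPOCH_YEAR = 1970  # assumption: monthly ordinals count from Jan 1970 = 0


def month_ordinal(d):
    """Number of whole months between Jan 1970 and the month of d."""
    return (d.year - EPOCH_YEAR) * 12 + (d.month - 1)


def strided_ordinal(d, stride, anchor):
    """Ordinal of the stride-month period containing d.

    Assumption: the period that ends in the anchor month has ordinal 0.
    Python's floor division keeps dates before the anchor consistent.
    """
    return -((month_ordinal(anchor) - month_ordinal(d)) // stride)


def period_bounds(ordinal, stride, anchor):
    """(year, month) of the first and last month of a strided period."""
    end = month_ordinal(anchor) + ordinal * stride
    start = end - stride + 1
    to_ym = lambda m: (EPOCH_YEAR + m // 12, m % 12 + 1)
    return to_ym(start), to_ym(end)


# Semiannual: 6-month stride on the monthly base, anchored at Jun 1970.
ANCHOR = date(1970, 6, 1)
first_half = strided_ordinal(date(1970, 3, 15), 6, ANCHOR)
second_half = strided_ordinal(date(1970, 9, 15), 6, ANCHOR)
print(first_half, period_bounds(first_half, 6, ANCHOR))
print(second_half, period_bounds(second_half, 6, ANCHOR))
```

With the anchor at Jun 1970, every date from January through June maps to one period and every date from July through December to the next, so the half-year boundaries never drift. Under the same assumptions, a different stride or anchor month would give bimonthly, quarterly, or annual variants from the same three functions.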
Looking at the pandas code, it appears that much of the current PeriodIndex functionality is based on the old scikits.timeseries project. I haven't done a detailed performance comparison, but relative to the scikits.timeseries codebase my code could be of benefit in pandas since there is no pure-C code to maintain, there is much less date-math code, and it's easier to extend to new frequencies.
Comment From: nehalecky
Hey @abielr,
I just came across this issue as I am also looking to expand the functionality of PeriodIndex to support multiples of DateOffsets, but it seems you've already tackled this in an elegant way.
Before I dive into cloning your repo, I wanted to ask about the status of this project. Do you still use it with the current pandas build, 0.12? Did this ever gain any traction with the pandas devs? It seems like a natural evolution of functionality for handling regularly spaced time series (especially those whose frequencies happen to be multiples of date offsets).
Thank you. :)
Comment From: wesm
Closing, but would you like to get involved with the pandas 2.0 effort? Come on over to https://github.com/pydata/pandas-design