• [x] I have checked that this issue has not already been reported.

• [x] I have confirmed this bug exists on the latest version of pandas.

• [ ] (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas


class CustomIndex(pandas.Index):

    def __new__(cls, data, **kwargs):
        return super().__new__(cls, data, **kwargs).view(cls)

    def __init__(self, data, **kwargs):
        super().__init__(self, data, **kwargs)


if __name__ == '__main__':
    index = CustomIndex([1,2,3])
    print(type(index), index)

The result is: <class 'pandas.core.indexes.numeric.Int64Index'> Int64Index([1, 2, 3], dtype='int64')

Problem description

In the pandas docs: "The motivation for having an Index class in the first place was to enable different implementations of indexing. This means that it’s possible for you, the user, to implement a custom Index subclass that may be better suited to a particular application than the ones provided in pandas."

Expected Output

<class 'CustomIndex'> CustomIndex([1, 2, 3], dtype='int64')

Output of pd.show_versions()

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : 67a3d4241ab84419856b84fc3ebc9abcbe66c6b3
python           : 3.8.5.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 20.1.0
Version          : Darwin Kernel Version 20.1.0: Sat Oct 31 00:07:11 PDT 2020; root:xnu-7195.50.7~2/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.1.4
numpy            : 1.19.4
pytz             : 2020.4
dateutil         : 2.8.1
pip              : 20.2.4
setuptools       : 50.3.2
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : 7.19.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 2.0.0
pytables         : None
pyxlsb           : None
s3fs             : None
scipy            : 1.5.4
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None

Comment From: ivanovmg

I confirm the issue on master.

Even overriding __repr__ and other methods does not make a difference.

from pandas.core.indexes.base import Index


class CustomIndex(Index):
    def __repr__(self):
        return "CustomIndex"

    def take(self, indices):
        return 777


if __name__ == '__main__':
    index = CustomIndex([1,2,3])
    print(index)
    print(type(index))
    print(index.take([2]))

Output:

Int64Index([1, 2, 3], dtype='int64')
<class 'pandas.core.indexes.numeric.Int64Index'>
Int64Index([3], dtype='int64')

Comment From: jreback

Index is not trivially subclassable and needs a fair number of hooks

a fully working example would be welcome, but you would have to dive into the current implementation

not sure subclassing makes a whole lot of sense generally

Comment From: acturner

@jreback Would love an example of how to do this if possible!

Comment From: jreback

well someone would need to write the docs after understanding how it could work, and of course put testing in place for doing this arbitrarily

so pull requests are welcome

Comment From: jorisvandenbossche

The xarray project actually has an Index subclass (see https://github.com/pydata/xarray/blob/23dc2fc9f2785c348ff821bf2da61dfa2206d283/xarray/coding/cftimeindex.py), which I think is actively being used

Now, the main issue shown above by the small examples is that you can't subclass Index without overriding the __new__ method. That's because the base class Index constructor infers which Index subclass to use based on the passed data, so if you want your subclass type to be returned, you need to implement __new__ yourself
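To make the mechanism concrete, here is a toy model in plain Python. It does not use pandas and is not pandas' actual implementation: ToyIndex, ToyIntIndex, ToyObjectIndex, NaiveIndex and the _build helper are all hypothetical names invented for this sketch. It only illustrates why a subclass that inherits an inferring __new__ never gets its own type back, and why overriding __new__ changes that.

```python
class ToyIndex:
    """Toy stand-in for a base class whose constructor infers the subclass."""

    def __new__(cls, data):
        # The base constructor ignores ``cls`` and picks a concrete
        # subclass from the data, which is the behaviour described above.
        if all(isinstance(x, int) for x in data):
            return ToyIntIndex(data)
        return ToyObjectIndex(data)

    @classmethod
    def _build(cls, data):
        # Hypothetical helper: create an instance of exactly ``cls``
        # without going through the inferring constructor.
        obj = object.__new__(cls)
        obj.values = list(data)
        return obj

    def __repr__(self):
        return f"{type(self).__name__}({self.values})"


class ToyIntIndex(ToyIndex):
    def __new__(cls, data):
        return cls._build(data)


class ToyObjectIndex(ToyIndex):
    def __new__(cls, data):
        return cls._build(data)


class NaiveIndex(ToyIndex):
    """Inherits the inferring __new__, so its own type is never returned."""


class CustomIndex(ToyIndex):
    """Overrides __new__, so the constructor returns the subclass itself."""

    def __new__(cls, data):
        return cls._build(data)


print(type(NaiveIndex([1, 2, 3])).__name__)   # ToyIntIndex
print(type(CustomIndex([1, 2, 3])).__name__)  # CustomIndex
```

NaiveIndex mirrors the examples in this issue: the call is routed through the base constructor, which hands back a different class. A real pandas subclass needs considerably more than this (see the xarray CFTimeIndex source linked above for a working case).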

Comment From: acturner

@jorisvandenbossche Thanks, this makes sense!