A small, complete example of the issue
>>> import pandas as pd
>>> ix = pd.Index([0, 0, 1, 2])
>>> sr = pd.Series([0, 0, 1, 2])
>>> ix.unique()
Int64Index([0, 1, 2], dtype='int64')
>>> sr.unique()
array([0, 1, 2], dtype=int64)
Expected Output
I'd expect unique to return either the original data type or always an array. I'd prefer the original data type, since arrays do not provide a to_series function but indexes do. When working with pipelines, this is much more convenient.
Output of pd.show_versions()
Comment From: jorisvandenbossche
This has changed in 0.19 after some discussion. See the notice in the whatsnew here: http://pandas.pydata.org/pandas-docs/version/0.19.0/whatsnew.html#index-unique-consistently-returns-index and the discussion here: https://github.com/pandas-dev/pandas/issues/13395
Basically, before 0.19 Index.unique() returned either an Index or an array depending on the data type. We therefore chose to always return an Index for consistency. The problem with the return type of Series.unique() is that a Series has a default index that is meaningless for the unique values. Therefore we chose to stick with an array as the return type in this case.
Those choices indeed cause an inconsistency between Index and Series.
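For anyone who wants to stay inside a Series for method chaining, here is a minimal sketch of the behaviour described above plus two possible workarounds. It assumes plain integer data as in the original example. The variable names are illustrative only, and the class name printed for the Index result may differ between pandas versions, so the sketch checks types with isinstance rather than relying on a printed repr.

```python
import numpy as np
import pandas as pd

ix = pd.Index([0, 0, 1, 2])
sr = pd.Series([0, 0, 1, 2])

# Index.unique() returns an Index (consistently so since 0.19).
ix_unique = ix.unique()
print(isinstance(ix_unique, pd.Index))

# Series.unique() returns a NumPy array for plain integer data.
sr_unique = sr.unique()
print(isinstance(sr_unique, np.ndarray))

# Workaround 1: drop_duplicates() keeps a Series, along with the
# index labels of the first occurrence of each value.
as_series = sr.drop_duplicates()

# Workaround 2: re-wrap the array in a Series with a fresh default index.
rewrapped = pd.Series(sr.unique())
```

Which workaround fits depends on whether the original index labels matter: drop_duplicates() preserves them, while re-wrapping the array discards them.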
Comment From: datajanko
Oh, I overlooked this in the release notes and didn't find it in the GitHub issues. Shame on me. Thanks for the clarification.
Comment From: jorisvandenbossche
@Jan-Ko No problem!