Pandas version checks
- [x] I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.set_index.html
Documentation problem
set_index(), when applied to a DataFrame that already has a (non-default) data column assigned as its index, deletes that column from the DataFrame when another data column is assigned as the index.
While I find this behaviour inappropriate, I understand that reset_index() should be used before set_index(), in which case the original index column may be preserved.
The problem is that the documentation for set_index() does not mention this at all, so the user is left to discover the problem and then the way to avoid it.
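A minimal sketch of the behaviour described above, using a made-up three-column frame (the column names `a`, `b`, `c` and the variable names are illustrative only, not taken from the report):

```python
import pandas as pd

# Hypothetical example data; any frame with two or more columns would do.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"], "c": [0.1, 0.2, 0.3]})

# First call: column "a" becomes the index and leaves the columns.
indexed = df.set_index("a")

# Second call: "b" replaces the existing index, so the values of "a"
# are no longer present anywhere in the result.
replaced = indexed.set_index("b")
print(list(replaced.columns))  # ['c']

# Workaround from the report: move "a" back into the columns first.
kept = indexed.reset_index().set_index("b")
print(list(kept.columns))  # ['a', 'c']
```

With the default arguments, the second `set_index` call discards the old index rather than moving it back into the columns, which is the step the report asks the documentation to spell out.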
Suggested fix for documentation
Add a comment in the set_index documentation to clarify that setting a data column as index, when there is already a different data column serving as index, will delete that data column, unless reset_index is performed first.
Comment From: SaraInCode
take
Comment From: rhshadrach
Thanks for the report!
> The problem is that the documentation for set_index() does not mention this at all, so the user is left to discover the problem and then the way to avoid it.

The documentation states:

> The index can replace the existing index or expand on it.
Doesn't this count as a mention?
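
For reference, a small sketch of what "replace" versus "expand" looks like in practice, assuming the same kind of made-up frame as in the report (column and variable names here are illustrative only). The `append` parameter of `set_index` controls which of the two happens:

```python
import pandas as pd

# Hypothetical frame whose index was already set from column "a".
frame = pd.DataFrame(
    {"a": [1, 2, 3], "b": ["x", "y", "z"], "c": [0.1, 0.2, 0.3]}
).set_index("a")

# Default (append=False): "b" replaces the existing index; "a" is discarded.
replaced_idx = frame.set_index("b")
print(list(replaced_idx.index.names))  # ['b']

# append=True: "b" is added to the existing index, giving a MultiIndex (a, b).
expanded_idx = frame.set_index("b", append=True)
print(list(expanded_idx.index.names))  # ['a', 'b']
```

In the `append=True` case the values of `a` survive as an index level and can be recovered with `reset_index()`.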