Pandas Default value of df.set_index()'s argument 'inplace' should be True.

Code demonstrating my issue with the 'inplace' argument of df.set_index().


import pandas as pd

#start from a 2-dim list
alist = [[1,2], [4,5]]

#create a dataframe from the list
df = pd.DataFrame(alist, columns=['time', 'temp'])
print(df.head())

#change the index of the dataframe to be the time column
df.set_index('time')

#print the "changed" dataframe
print(df.head())

#....but it's the same

Problem description

df.set_index() has an 'inplace' argument with a default value of False. In the above example, the dataframe returned by set_index() is never assigned, so the change is thrown away. This choice of convention is at the root of many pandas beginners' problems. When you call .set_index(), your mental model is that you are modifying your existing dataframe, not generating a new one. Use of this function is outrageously counterintuitive.
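
For reference, a minimal sketch of the two ways to keep the change in the example above, assuming the pandas 0.21.1 behaviour reported in this issue. The names `reindexed`, `df_inplace`, and `result` are made up for illustration.

```python
import pandas as pd

# Same data as in the example above
df = pd.DataFrame([[1, 2], [4, 5]], columns=['time', 'temp'])

# Option 1: set_index() returns a new dataframe, so keep the return value
reindexed = df.set_index('time')

# Option 2: request in-place modification explicitly.
# With inplace=True the call returns None and changes the dataframe itself.
df_inplace = df.copy()  # hypothetical copy, so df stays untouched for comparison
result = df_inplace.set_index('time', inplace=True)

print(reindexed)   # 'time' is now the index, 'temp' the only column
print(result)      # None
print(df)          # the original still has 'time' as an ordinary column
```

Either way the call has to be explicit about where the result goes; the bare `df.set_index('time')` in the example does neither.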

Expected Output

      temp
time
1        2
4        5

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Darwin
OS-release: 17.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.1
pytest: None
pip: 9.0.1
setuptools: 38.2.4
Cython: None
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0b10
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Comment From: TomAugspurger

inplace=False is the default in every other pandas method I believe, and this would be a large API break.

Personally I don't find it unintuitive, though I'm curious to hear why you say it is. Did something in the documentation lead you astray?
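
A quick sketch of the convention referred to here, assuming a small made-up dataframe: a few other DataFrame methods that accept `inplace` also default to returning a new object and leave the caller unchanged. The variable names are illustrative only.

```python
import pandas as pd

# Hypothetical data, deliberately unsorted
df = pd.DataFrame({'time': [4, 1], 'temp': [5, 2]})

# Each of these accepts inplace=..., and each defaults to inplace=False
sorted_df = df.sort_values('time')
renamed_df = df.rename(columns={'temp': 'temperature'})
dropped_df = df.drop(columns=['temp'])

# df itself is unchanged by all three calls
print(df)
print(list(sorted_df['time']))   # [1, 4]
print(list(renamed_df.columns))  # ['time', 'temperature']
print(list(dropped_df.columns))  # ['time']
```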

Comment From: jreback

yep, this is counter to pandas philosophy.

Comment From: fhabermacher

> inplace=False is the default in every other pandas method I believe, and this would be a large API break.
>
> Personally I don't find it unintuitive, though I'm curious to hear why you say it is. Did something in the documentation lead you astray?

It is not true that every pandas method leaves the calling object unchanged by default. At least pandas.DataFrame.insert modifies the calling df in place.

One can debate intuitions, but it would probably seem at least as intuitive for a 'set' method such as set_index() to work in place as it is for insert().
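
To make the comparison concrete, here is a small sketch that puts the two calls side by side. The column name 'station' and its values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5]], columns=['time', 'temp'])

# insert() has no inplace argument: it modifies df and returns None
ret = df.insert(0, 'station', ['a', 'b'])
print(ret)                # None
print(list(df.columns))   # ['station', 'time', 'temp']

# set_index() leaves df alone unless inplace=True is passed
new_df = df.set_index('time')
print(df.index.name)      # None
print(new_df.index.name)  # time
```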
