Code demonstrating my issue with the 'inplace' argument of df.set_index().
import pandas as pd
#start from a 2-dim list
alist = [[1,2], [4,5]]
#create a dataframe from the list
df = pd.DataFrame(alist, columns=['time', 'temp'])
print(df.head())
#change the index of the dataframe to be the time column
df.set_index('time')
#print the "changed" dataframe
print(df.head())
#....but it's the same
Problem description
df.set_index() has an 'inplace' argument, with a default value of 'False'. In the above example, set_index() returns a new dataframe, and because that return value is never assigned, the change is thrown away. This choice of convention is at the root of many pandas beginners' problems. When you call .set_index(), the natural mental model is that you are modifying your existing dataframe, not generating a new one. Use of this function is outrageously counterintuitive.
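For reference, here is a minimal sketch of the two usual ways to make the change stick, reusing the list from the example above: either assign the returned dataframe back to a name, or pass inplace=True. The names df_assigned, df_inplace and result are made up for illustration.

```python
import pandas as pd

alist = [[1, 2], [4, 5]]

# Option 1: set_index() returns a new dataframe, so keep the return value
df_assigned = pd.DataFrame(alist, columns=['time', 'temp'])
df_assigned = df_assigned.set_index('time')

# Option 2: ask pandas to modify the existing dataframe instead
df_inplace = pd.DataFrame(alist, columns=['time', 'temp'])
result = df_inplace.set_index('time', inplace=True)  # the call itself returns None

print(df_assigned)
print(df_inplace)
```

Both dataframes end up indexed by 'time'. Note that with inplace=True the call returns None, so writing df = df.set_index('time', inplace=True) would replace the dataframe with None.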
Expected Output
      temp
time
1        2
4        5
Output of pd.show_versions()
Comment From: TomAugspurger
inplace=False is the default in every other pandas method I believe, and this would be a large API break.
Personally I don't find it unintuitive, though I'm curious to hear why you say it is. Did something in the documentation lead you astray?
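As a rough illustration of that convention (not an exhaustive check of the API), the sketch below calls three DataFrame methods that take an inplace argument defaulting to False: sort_values, rename and drop. The sample data and the variable names are assumptions chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({'time': [4, 1], 'temp': [5, 2]})

# Each call returns a new dataframe and leaves df untouched
sorted_df = df.sort_values('time')
renamed_df = df.rename(columns={'temp': 'temperature'})
dropped_df = df.drop(columns='temp')

# The original still has its original row order and both columns
print(df)
```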
Comment From: jreback
yep this is counter to pandas philosophy.
Comment From: fhabermacher
> inplace=False is the default in every other pandas method I believe, and this would be a large API break. Personally I don't find it unintuitive, though I'm curious to hear why you say it is. Did something in the documentation lead you astray?
It is not true that every pandas method avoids in-place changes. At least pandas.DataFrame.insert modifies the calling dataframe.
One can debate about intuitions, but it would probably seem at least as intuitive for a 'set' method such as set_index() to be inplace as an insert() method.
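A short sketch of the contrast, assuming a small dataframe like the one in the original report. The 'humidity' column and the names ret and new_df are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5]], columns=['time', 'temp'])

# insert() has no 'inplace' argument: it modifies df directly and returns None
ret = df.insert(1, 'humidity', [30, 40])
print(ret)
print(list(df.columns))

# set_index() without inplace=True leaves df alone and returns a new dataframe
new_df = df.set_index('time')
print(df.index.name)
print(new_df.index.name)
```

After insert(), df itself has the new column, while after set_index() only the returned new_df carries the 'time' index.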