Something like this is a pretty good example. I know something about animals, cats, dogs, and hair, so I can sort of keep the structure of the data in my head and follow along with the transformations without having to scroll up and down to check against the original dataframe.
But a lot of examples just use meaningless column names like A, B, C, D... or foo, bar, baz..., which makes it a lot harder to gain an intuition about what's going on. For example, if you don't know what groupby does, this example:
In [13]: df2 = pd.DataFrame({'X' : ['B', 'B', 'A', 'A'], 'Y' : [1, 2, 3, 4]})
In [14]: df2.groupby(['X']).sum()
Out[14]:
   Y
X
A  7
B  3
might be less useful than this version:
In [13]: pets = pd.DataFrame({'animal' : ['dog', 'dog', 'cat', 'cat'], 'weight' : [10, 20, 8, 9]})
In [14]: pets.groupby(['animal']).mean()
Out[14]:
        weight
animal
cat        8.5
dog       15.0
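To make the comparison easy to try, here is a minimal, self-contained sketch of both versions. It assumes pandas is installed; the names abstract_sums and mean_weight are mine, introduced only for illustration, and I select the numeric column explicitly so each result is a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Abstract version: the column names carry no meaning, so the reader
# has to keep checking the frame to see what is being summed.
df2 = pd.DataFrame({"X": ["B", "B", "A", "A"], "Y": [1, 2, 3, 4]})
abstract_sums = df2.groupby("X")["Y"].sum()

# Descriptive version: group by the categorical column ("animal")
# and aggregate the numeric one ("weight").
pets = pd.DataFrame(
    {"animal": ["dog", "dog", "cat", "cat"], "weight": [10, 20, 8, 9]}
)
mean_weight = pets.groupby("animal")["weight"].mean()

print(abstract_sums)
print(mean_weight)
```

With the descriptive names, the second result reads as "mean weight per animal" without needing to look back at how the frame was built. Note that groupby sorts the group keys by default, so cat is listed before dog.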
I realize redoing all the examples like this would be a significant amount of work, but if there's agreement that this is a desirable thing, I'd be happy to kick things off with a small PR.
Also, I think it would be good to add this as a guideline to the documentation section of the contributing doc. (Again, if people agree this is worthwhile and not misguided.)
Comment From: jreback
I am -0 on this. I don't think this adds much value, it would be a lot of work to change until it's consistent, and the current examples are a bit shorter.
Comment From: TomAugspurger
https://github.com/pandas-dev/pandas/pull/16520 is starting on something like this.
Comment From: jorisvandenbossche
Closing this in favor of https://github.com/pandas-dev/pandas/issues/19710. @colinmorris I think this is a good idea for certain cases like the groupby example above (not for all docstrings), feel free to comment on https://github.com/pandas-dev/pandas/issues/19710.