I am trying to boxplot my values. Pandas describes the DataFrame correctly, but as soon as I draw a boxplot, the values stick to the major ticks (in this case only integer values).
How can I prevent this behaviour and get the real values boxplotted?
import pandas as pd
import matplotlib.pyplot as plt

x_infos = [[2, 4, 4, 4, 2, 2, 4, 4, 3, 4, 1, 1, 4, 2, 4, 3, 1],
[4, 5, 2, 1, 1, 1, 4, 3, 5, 3, 1, 1, 2, 2, 4, 4, 4],
[2, 5, 5, 5, 5, 4, 4, 5, 4, 5, 5, 2, 4, 5, 5, 1, 5],
[5, 4, 3, 5, 4, 5, 5, 5, 4, 4, 5, 3, 4, 5, 5, 3, 5],
[5, 5, 5, 3, 4, 4, 5, 4, 5, 1, 1, 3, 3, 4, 4, 4, 2],
[4, 5, 3, 4, 4, 5, 3, 4, 5, 1, 1, 1, 3, 5, 5, 5, 5],
[1, 2, 1, 1, 2, 2, 2, 1, 2, 4, 1, 1, 1, 1, 1, 1, 1]]
df_infos = pd.DataFrame(x_infos)
df_infos.T.boxplot(vert=False, grid=True)
plt.show()
Results:
As you can see here, mean and std are float values. Why does pandas stick them to the major ticks as soon as it comes to the boxplot?
>>> df_infos.T.describe()
0 1 2 3 4 5 \
count 17.000000 17.000000 17.000000 17.000000 17.000000 17.000000
mean 1.470588 3.705882 3.647059 4.352941 4.176471 2.764706
std 0.799816 1.490164 1.320094 0.785905 1.286239 1.480262
min 1.000000 1.000000 1.000000 3.000000 1.000000 1.000000
25% 1.000000 3.000000 3.000000 4.000000 4.000000 1.000000
50% 1.000000 4.000000 4.000000 5.000000 5.000000 3.000000
75% 2.000000 5.000000 5.000000 5.000000 5.000000 4.000000
max 4.000000 5.000000 5.000000 5.000000 5.000000 5.000000
6
count 17.000000
mean 2.882353
std 1.218726
min 1.000000
25% 2.000000
50% 3.000000
75% 4.000000
max 4.000000
Comment From: TomAugspurger
Maybe take a look at the matplotlib docs: http://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.boxplot.html
boxplot doesn't plot the mean or standard deviation. It plots the median (red) and the IQR (the 25th to 75th percentiles) in blue.
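To make the point above concrete, here is a minimal sketch using only the standard library and the first row of x_infos. It assumes the quartiles are obtained by linear interpolation between sorted data points (method="inclusive" in the statistics module), which is how I understand the 25%/50%/75% rows of describe() to be computed. With 17 integer values, the quartile positions fall exactly on data points, so the box edges and the median are whole numbers even though the mean and std are not.

```python
import statistics

# First row of x_infos from the question (17 integer ratings).
row = [2, 4, 4, 4, 2, 2, 4, 4, 3, 4, 1, 1, 4, 2, 4, 3, 1]

# Quartiles via linear interpolation between sorted data points.
# Assumption: this matches the 25%/50%/75% rows of describe().
q1, median, q3 = statistics.quantiles(row, n=4, method="inclusive")

# These are the float values from describe(); a boxplot does not draw them.
mean = statistics.mean(row)
std = statistics.stdev(row)  # sample standard deviation

print(f"25%={q1} 50%={median} 75%={q3}")
print(f"mean={mean:.6f} std={std:.6f}")
```

The quartiles come out as 2, 3 and 4, which agrees with the column of the describe() output whose mean is 2.882353. If the mean should also appear on the plot, matplotlib's boxplot accepts showmeans=True, and pandas forwards extra keyword arguments to it, so df_infos.T.boxplot(vert=False, grid=True, showmeans=True) should add a marker for the mean.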