From the mailing list: https://groups.google.com/forum/?fromgroups#!topic/pydata/fpYfwkZvdxI
In [37]: tips = pd.read_csv('https://raw.github.com/pydata/pandas/master/pandas/tests/data/tips.csv')
In [38]: pd.crosstab(tips.day, tips.size)
Out[38]:
col_0 1708
day
Fri 19
Sat 87
Sun 76
Thur 62
In [39]: pd.__version__
Out[39]: '0.16.2'
while it should be:
In [3]: pd.crosstab(tips.day, tips.size)
Out[3]:
size 1 2 3 4 5 6
day
Fri 1 16 1 1 0 0
Sat 2 53 18 13 1 0
Sun 0 39 15 18 3 1
Thur 1 48 4 5 1 3
In [4]: pd.__version__
Out[4]: '0.14.1'
The strange thing is that I can't reproduce it with a simple example:
In [43]: df = pd.DataFrame({'a':list('abcabcabcab'), 'b':[1,1,1,1,2,3,2,2,3,3,1]})
In [44]: pd.crosstab(df['a'], df['b'])
Out[44]:
b 1 2 3
a
a 2 1 1
b 2 2 0
c 1 0 2
Comment From: rosnfeld
I think this is the problem:
In [22]: tips.size
Out[22]: 1708
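tips.size is not accessing the 'size' column but the DataFrame.size attribute, which is the total number of elements in tips (244 rows x 7 columns = 1708). Attribute access only reaches a column when the name does not clash with an existing DataFrame attribute or method, so the column has to be selected as tips['size'].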
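A minimal sketch of the difference, using a small made-up frame (`tips_small` and its values are hypothetical stand-ins, not the real tips data) so that it runs without downloading the CSV:

```python
import pandas as pd

# Hypothetical stand-in for the tips data: 6 rows, 3 columns.
tips_small = pd.DataFrame({
    "day": ["Fri", "Sat", "Sat", "Sun", "Sun", "Sun"],
    "size": [2, 2, 3, 2, 4, 4],
    "tip": [1.0, 2.5, 3.0, 2.0, 4.0, 5.0],
})

# Attribute access hits DataFrame.size (rows * columns), not the column.
n_elements = tips_small.size

# Bracket access always returns the column named 'size'.
size_col = tips_small["size"]

# Passing the column explicitly gives the expected contingency table.
table = pd.crosstab(tips_small["day"], tips_small["size"])
print(n_elements)
print(table)
```

The same bracket form, pd.crosstab(tips['day'], tips['size']), should give the per-size breakdown on the real data. The simple example with columns 'a' and 'b' worked because neither name collides with a DataFrame attribute.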
Comment From: jorisvandenbossche
Indeed!