Start with a data frame -- large-ish number of rows
df = DataFrame({'a':['a1']*100+['a2']*100, 'b':['b']*200})
dfg = df.groupby(['a'])
dfg.aggregate(len) ## this gives right answer, every entry is of length 100
dfg.aggregate(lambda x: ', '.join(x.values)) ## this is wrong, the output string is somehow truncated?
It works fine if the string is not very long, e.g. start with df = DataFrame({'a':['a1']*10+['a2']*10, 'b':['b']*20})
Comment From: lodagro
This is not a bug, but a matter of how repr works; the values inside are as expected. pandas truncates columns in repr to 50 characters. This can be controlled with pandas.set_printoptions(), or you can force a print by using to_string()
>>> print dfg.aggregate(lambda x: ', '.join(x.values)).to_string()
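For readers on a newer pandas, here is a minimal sketch of the same check. It assumes a release where `pd.option_context` and the `display.max_colwidth` option are available and where that option accepts `None`; neither name appears in this thread, so treat them as assumptions about your installed version. The sketch aggregates the single column `b` directly, a slight variation on the code above, and shows that the stored value is intact even when the repr shortens it.

```python
import pandas as pd

# Same data as in the report: two groups of 100 rows each.
df = pd.DataFrame({'a': ['a1'] * 100 + ['a2'] * 100, 'b': ['b'] * 200})

# Join the strings of column 'b' within each group, then go back to a frame.
result = df.groupby('a')['b'].agg(lambda s: ', '.join(s)).to_frame()

# The stored value is complete: 100 items plus 99 two-character separators.
full = result.loc['a1', 'b']
print(len(full))

# With a 50-character column limit, the repr shortens the cell.
with pd.option_context('display.max_colwidth', 50):
    short_repr = repr(result)
print(full in short_repr)

# Assumption: setting display.max_colwidth to None lifts the limit,
# so the rendered text contains the whole value.
with pd.option_context('display.max_colwidth', None):
    full_text = result.to_string()
print(full in full_text)
```

Using `option_context` rather than a global setting keeps the change local to the `with` block, which is convenient when only one print needs the full width.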
Comment From: wesm
Indeed, pandas tries to prevent flooding the screen in the console. I wonder if adding something like ... at the end of truncated strings would be useful to indicate truncation?
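To make the suggestion concrete, here is a rough pure-Python sketch of what marking truncated strings could look like. The helper `truncate_for_display` and its parameters are hypothetical names chosen for illustration; this is not pandas' formatting code. The default width of 50 is taken from the limit mentioned earlier in the thread.

```python
def truncate_for_display(value, max_width=50, marker='...'):
    """Return the text unchanged if it fits, else cut it and append a marker."""
    text = str(value)
    if len(text) <= max_width:
        return text
    # Reserve room for the marker so the result is exactly max_width long.
    return text[:max_width - len(marker)] + marker


# The long cell from the report: 100 copies of 'b' joined by ', '.
cell = ', '.join(['b'] * 100)
shown = truncate_for_display(cell)

print(shown)
print(len(cell), len(shown))
```

The underlying value `cell` is never modified; only the displayed copy is shortened, which is the distinction this thread turns on.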
Comment From: httassadar
OK, yes, the value is there.
But df.to_string() does not retrieve it, it stops at the truncation -- I guess it calls repr and goes from there?
Same for df.to_html()
Comment From: lodagro
Only __repr__ truncates, not to_string(), nor to_html().
In [16]: print dfg.aggregate(lambda x: ', '.join(x.values)).to_string()
b
a
a1 b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b
a2 b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b
In [19]: print dfg.aggregate(lambda x: ', '.join(x.values)).to_html()
b | |
---|---|
a | |
a1 | b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b |
a2 | b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b |
Comment From: httassadar
Hmm, I'm on version 0.8.1 and it does truncate for to_string() and to_html()
But anyway, Lodagro's pandas.set_printoptions works, so problem solved :)
Thanks all.
Comment From: wesm
@lodagro I'm going to reopen this and change the title so we add a test on the to_string and to_html behavior
Comment From: lodagro
I assume you want to check that to_string() and to_html() never truncate, irrespective of the printoptions settings?
Comment From: wesm
I guess that makes the most sense
Comment From: jreback
@lodagro keep this open?
Comment From: lodagro
@jreback this issue moved away from the original question. The idea was to keep it open and make sure that to_string() and to_html() do not truncate. Just gave 0.12.0-651-gc8ab2dd a spin with the code above; both still truncate. If the idea of no truncation still stands, this issue cannot be closed yet.
In [4]: from pandas import *
In [5]: cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:df = DataFrame({'a':['a1']*100+['a2']*100, 'b':['b']*200})
:dfg = df.groupby(['a'])
:dfg.aggregate(len) ## this gives right answer, every entry is of length 100
:dfg.aggregate(lambda x: ', '.join(x.values))
:--
Out[5]:
b
a
a1 b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...
a2 b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...
In [6]: print dfg.aggregate(lambda x: ', '.join(x.values)).to_string()
b
a
a1 b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...
a2 b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...
In [7]: print dfg.aggregate(lambda x: ', '.join(x.values)).to_html()
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>b</th>
</tr>
<tr>
<th>a</th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<th>a1</th>
<td> b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...</td>
</tr>
<tr>
<th>a2</th>
<td> b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...</td>
</tr>
</tbody>
</table>
In [8]:
Comment From: jreback
I think https://github.com/pydata/pandas/pull/5012 will go toward solving this (it may need some additional options). Reopening and moving to 0.14.
thanks....always welcome a PR!
Comment From: jorisvandenbossche
Closing as this is working as expected (and a matter of the repr, as noted above)