Pandas Test long string formatting in to_string and to_html

Start with a data frame -- large-ish number of rows

df = DataFrame({'a':['a1']*100+['a2']*100, 'b':['b']*200})
dfg = df.groupby(['a'])
dfg.aggregate(len) ## this gives right answer, every entry is of length 100
dfg.aggregate(lambda x: ', '.join(x.values)) ## this is wrong, the output string is somehow truncated?

It works fine if the string is not very long, e.g. starting with df = DataFrame({'a':['a1']*10+['a2']*10, 'b':['b']*20})

Comment From: lodagro

This is not a bug but a consequence of how repr works; the values inside are as expected. pandas truncates columns in repr to 50 characters. This can be controlled with pandas.set_printoptions(), or you can force a full print by using to_string():

>>> print dfg.aggregate(lambda x: ', '.join(x.values)).to_string()
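For readers on a newer pandas, here is a minimal sketch of the same idea. It assumes a release (1.0 or later, as far as I know) where the column-width limit is exposed as the `display.max_colwidth` option and `None` means "no limit"; the names `joined`, `full_value` and `untruncated` are only illustrative.

```python
import pandas as pd

# Same data as in the original report: two groups of 100 rows each.
df = pd.DataFrame({"a": ["a1"] * 100 + ["a2"] * 100, "b": ["b"] * 200})

# Join the 100 values of each group into one long string per group.
joined = df.groupby("a")["b"].agg(lambda s: ", ".join(s)).to_frame()

# The value we expect to find for each group, built independently of pandas.
full_value = ", ".join(["b"] * 100)

# Lift the column-width limit only inside this block.
# Assumption: None is accepted as "no limit" (pandas >= 1.0).
with pd.option_context("display.max_colwidth", None):
    untruncated = joined.to_string()

print(len(full_value))
print(full_value in untruncated)
```

Using `option_context` rather than a global setting keeps the console behaviour unchanged elsewhere in the session.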

Comment From: wesm

Indeed, pandas tries to prevent flooding the screen in the console. I wonder if adding something like ... at the end of truncated strings would be useful to indicate truncation?
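A rough sketch of what such a marker could look like, in plain Python. This is not pandas' actual implementation; `truncate_cell`, `max_width` and `marker` are hypothetical names, and the 50-character default is taken from the limit mentioned above.

```python
def truncate_cell(text, max_width=50, marker="..."):
    """Shorten text to at most max_width characters, ending with marker.

    Hypothetical helper for illustration only; not part of pandas.
    """
    if len(text) <= max_width:
        # Short values are shown unchanged, with no marker.
        return text
    # Reserve room for the marker so the result never exceeds max_width.
    keep = max(max_width - len(marker), 0)
    return text[:keep] + marker


long_value = ", ".join(["b"] * 100)
shown = truncate_cell(long_value)

print(len(shown))
print(shown.endswith("..."))
print(truncate_cell("b, b, b"))
```

Only the displayed string is shortened here; the underlying value (`long_value`) is left untouched, which matches the point that the data itself is intact.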

Comment From: httassadar

OK, yes, the value is there.

But df.to_string() does not retrieve it, it stops at the truncation -- I guess it calls repr and goes from there?

Same for df.to_html()

Comment From: lodagro

Only __repr__ truncates; neither to_string() nor to_html() does.

In [16]: print dfg.aggregate(lambda x: ', '.join(x.values)).to_string()
                                                                                                                                                                                                                                                                                                             b
a                                                                                                                                                                                                                                                                                                             
a1  b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b
a2  b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b

In [19]: print dfg.aggregate(lambda x: ', '.join(x.values)).to_html()

	b
a
a1	b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b
a2	b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b

Comment From: httassadar

Hmm, I'm on version 0.8.1 and it does truncate for to_string() and to_html()

But anyway, Lodagro's pandas.set_printoptions works, so problem solved :)

Thanks all.

Comment From: wesm

@lodagro I'm going to reopen this and change the title so we add a test on the to_string and to_html behavior

Comment From: lodagro

I assume you want to check that to_string() and to_html() never truncate, irrespective of the printoptions settings?

Comment From: wesm

I guess that makes the most sense

Comment From: jreback

@lodagro keep this open?

Comment From: lodagro

@jreback this issue moved away from the original question. The idea was to keep it open and make sure that to_string() and to_html() do not truncate. I just gave 0.12.0-651-gc8ab2dd a spin with the code above, and both still truncate. If the idea of no truncation still stands, this issue cannot be closed yet.

In [4]: from pandas import *

In [5]: cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:df = DataFrame({'a':['a1']*100+['a2']*100, 'b':['b']*200})
:dfg = df.groupby(['a'])
:dfg.aggregate(len) ## this gives right answer, every entry is of length 100
:dfg.aggregate(lambda x: ', '.join(x.values))
:--
Out[5]: 
                                                    b
a                                                    
a1  b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...
a2  b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...

In [6]: print dfg.aggregate(lambda x: ', '.join(x.values)).to_string()
                                                    b
a                                                    
a1  b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...
a2  b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...

In [7]: print dfg.aggregate(lambda x: ', '.join(x.values)).to_html()                                                                                                                                                                                                      
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>b</th>
    </tr>
    <tr>
      <th>a</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>a1</th>
      <td> b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...</td>
    </tr>
    <tr>
      <th>a2</th>
      <td> b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...</td>
    </tr>
  </tbody>
</table>

In [8]:

Comment From: jreback

I think https://github.com/pydata/pandas/pull/5012 will go toward solving this (it may need some additional options). Reopening and moving to 0.14.

thanks....always welcome a PR!

Comment From: jorisvandenbossche

Closing as this is working as expected (and a matter of the repr, as noted above)