Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas
df = pandas.DataFrame(data=[[1234567890123456789]], columns=['test'])
df.style.to_html()
df.style.to_latex()
Issue Description
If the DataFrame contains integers too large to be represented exactly as a double-precision float, the default Styler (DataFrame.style) displays the wrong value:
html:
<style type="text/css">\n</style>\n<table id="T_2d9c5">\n <thead>\n <tr>\n <th class="blank level0" > </th>\n <th id="T_2d9c5_level0_col0" class="col_heading level0 col0" >test</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th id="T_2d9c5_level0_row0" class="row_heading level0 row0" >0</th>\n <td id="T_2d9c5_row0_col0" class="data row0 col0" >1234567890123456768</td>\n </tr>\n </tbody>\n</table>\n
latex:
\\begin{tabular}{lr}\n & test \\\\\n0 & 1234567890123456768 \\\\\n\\end{tabular}\n
In both cases, the displayed value is 1234567890123456768 instead of 1234567890123456789. I suspect that there is an internal lossy conversion to floating point.
It does not seem to matter whether the dtype is int64, uint64, Int64, UInt64, or object.
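To see why this particular value is affected: a double has a 53-bit significand, so every integer up to 2**53 in magnitude round-trips exactly, but beyond that some integers do not. A minimal pure-Python check (no pandas involved, so it only illustrates the suspected float conversion rather than proving where it happens inside pandas):

```python
# A double has a 53-bit significand: integers up to 2**53 are stored exactly.
LIMIT = 2**53  # 9007199254740992

assert float(LIMIT) == LIMIT
# 2**53 + 1 is the first positive integer that a double cannot hold exactly;
# it rounds back down to 2**53.
assert float(LIMIT + 1) == LIMIT

value = 1234567890123456789  # the value from the reproducible example
as_float = float(value)      # nearest double; spacing at this magnitude is 256

print(value > LIMIT)   # True
print(int(as_float))   # 1234567890123456768, i.e. 21 less than the input
```

The printed integer matches the wrong value in the HTML output above, which is consistent with a round trip through float.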
Expected Behavior
The integers should be displayed correctly. In pandas 1.1.4, this does work as expected:
<style type="text/css" >\n</style><table id="T_100b2b29_ce31_11ed_9d71_00505692da1d" ><thead> <tr> <th class="blank level0" ></th> <th class="col_heading level0 col0" >test</th> </tr></thead><tbody>\n <tr>\n <th id="T_100b2b29_ce31_11ed_9d71_00505692da1dlevel0_row0" class="row_heading level0 row0" >0</th>\n <td id="T_100b2b29_ce31_11ed_9d71_00505692da1drow0_col0" class="data row0 col0" >1234567890123456789</td>\n </tr>\n </tbody></table>
Installed Versions
tested versions (broken): pandas 1.5.3, pandas 2.1.0.dev0+337.g8c7b8a4f3e
tested versions (fine): pandas 1.1.4
Comment From: hxy450
take
Comment From: ZackOudeif
Take
Comment From: attack68
The suspicion of lossy conversion to float is correct.
The current _default_formatter in pandas\io\formats\style_render.py (approx. line 1720) has the following:
elif is_integer(x):
return f"{x:,.0f}" if thousands else f"{x:.0f}"
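So the work is done by a Python f-string format spec, and the .0f spec converts the integer to a float before formatting it, which is where the precision is lost. A fix would be to change it to: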
elif is_integer(x):
return f"{x:,.0f}" if thousands else str(x)
The str function does not lose precision.
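A minimal sketch of the integer branch before and after the proposed change, written as standalone functions. The names format_int_old and format_int_new are hypothetical and are not pandas code; they mirror only the is_integer branch quoted above and accept plain Python ints.

```python
def format_int_old(x: int, thousands: bool) -> str:
    # Current behaviour: the ".0f" spec converts x to a float before formatting.
    return f"{x:,.0f}" if thousands else f"{x:.0f}"


def format_int_new(x: int, thousands: bool) -> str:
    # Proposed fix: str(x) keeps full precision. The thousands branch is
    # unchanged and therefore still goes through a float.
    return f"{x:,.0f}" if thousands else str(x)


value = 1234567890123456789
print(format_int_old(value, thousands=False))  # 1234567890123456768
print(format_int_new(value, thousands=False))  # 1234567890123456789
print(format_int_new(value, thousands=True))   # 1,234,567,890,123,456,768
```

The last line shows that this change alone leaves the thousands-separator case lossy.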
Comment From: attack68
(And an even better fix would be to correct the thousands-separator case without losing precision, but this is much more complicated and a separate PR for it would be recommended.)
Comment From: asishm-wk
> (And an even better fix would be to correct the thousands-separator case without losing precision, but this is much more complicated and a separate PR for it would be recommended.)
In [18]: a = 1234567890123456789
In [19]: f"{a:,}"
Out[19]: '1,234,567,890,123,456,789'
Wouldn't this work?
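The "," option of Python's format spec works on ints directly, with no float conversion, so both branches could stay exact. A minimal sketch, assuming plain Python ints (numpy scalar types are not checked here); format_int_exact is a hypothetical name, not pandas code:

```python
def format_int_exact(x: int, thousands: bool) -> str:
    # "," on an int inserts separators using integer formatting only,
    # so no float conversion (and no precision loss) takes place.
    return f"{x:,}" if thousands else str(x)


# The reported value plus the int64 minimum and uint64 maximum.
for value in (1234567890123456789, -(2**63), 2**64 - 1):
    text = format_int_exact(value, thousands=True)
    # Stripping the separators recovers the original integer exactly.
    assert int(text.replace(",", "")) == value
    print(text)
```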
Comment From: attack68
Apparently so.
:-) Just testing.
Comment From: lusolorz
take