Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

from io import StringIO
import pandas as pd
bug = """13-04-2023     16.2       16.7      16.8      3.9  4.5        6:25"""

dtypes = {
    'Datum': "string",
    'Temp_max': 'float32',
    'Temp_min': 'float32',
    'Temp_gem': 'float32',
    'Neerslag': 'float32',
    'Wind': 'float32',
    'Zonneschijn': "string",
}

data = pd.read_csv(
    StringIO(bug),
    names=dtypes.keys(),
    decimal=".",
    delimiter=r"\s+",
    dtype=dtypes,
)
print(data)  # or display(data) in a Jupyter notebook

Returns:

        Datum   Temp_max   Temp_min   Temp_gem  Neerslag  Wind Zonneschijn
0  13-04-2023  16.200001  16.700001  16.799999       3.9   4.5        6:25

Issue Description

Why is 16.2 converted to 16.200001, or 16.7 to 16.700001?

Confirmed behavior:

  • bugged for "float32" and "Float32"
  • fine for "Float64" and "float64"

The bug occurs whether:

  • we pass it under dtype argument to pd.read_csv
  • or use pd.DataFrame.astype
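A minimal sketch of the astype path, assuming only that pandas is installed. The column name is reused from the example above and the three values are the ones from the reported row. Widening the float32 column back to float64 shows the values that are actually stored.

```python
import pandas as pd

# Same three temperatures as in the reported row, held as float64 first.
df = pd.DataFrame({"Temp_max": [16.2, 16.7, 16.8]})

# Casting to float32 rounds each value to the nearest representable float32.
df32 = df.astype("float32")

# Widening back to float64 is exact, so it reveals the stored float32 values.
widened = df32["Temp_max"].astype("float64").tolist()

for original, stored in zip(df["Temp_max"], widened):
    print(f"{original} -> {stored:.6f}")
```

Formatted to six decimals, the stored values match the digits shown in the output above.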

Expected Behavior

No extra decimal digits should be added, other than trailing zeros.

I noticed the same problem for other numbers in the range 16 to 17.

Installed Versions

INSTALLED VERSIONS
------------------
commit : 478d340667831908b5b4bf09a2787a11a14560c9
python : 3.10.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : AMD64 Family 23 Model 24 Stepping 1, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : fr_FR.ISO8859-1

pandas : 2.0.0
numpy : 1.23.5
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.6.1
pip : 23.0.1
Cython : None
pytest : 7.3.0
hypothesis : None
sphinx : 6.1.3
blosc : None
feather : None
xlsxwriter : 3.0.9
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.12.0
pandas_datareader: None
bs4 : 4.12.2
bottleneck : None
brotli :
fastparquet : None
fsspec : 2023.4.0
gcsfs : None
matplotlib : 3.7.1
numba : 0.56.4
numexpr : None
odfpy : None
openpyxl : 3.1.0
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : 1.0.10
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : 2.0.9
tables : None
tabulate : 0.9.0
xarray : 2022.11.0
xlrd : 2.0.1
zstandard : 0.19.0
tzdata : 2023.3
qtpy : 2.3.1
pyqt5 : None

Comment From: atharvaagate

take

Comment From: ljmc-github

@MCRE-BE this looks like a generic floating-point arithmetic problem.

See https://0.30000000000000004.com/

Comment From: MCRE-BE

Hmmm indeed, never noticed it before. I don't know if the maintainers want/can fix it, but it's confusing anyway. 🫣

Comment From: rhshadrach

Agreed @ljmc-github - for numbers in [16, 32), adjacent float32 values are 2^-19 (about 0.0000019) apart, which is greater than 0.000001, so the stored value can differ from the input in the sixth decimal place. If you want more precision you need to use float64.
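The spacing argument can be checked with the standard library alone. This is an illustrative sketch, not pandas internals: `to_float32` and `float32_spacing` are hypothetical helper names, and the round trip through `struct` stands in for the float32 cast.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (float64) to the nearest float32, returned as float64."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def float32_spacing(value: float) -> float:
    """Gap between float32(value) and the next larger float32.

    Only intended for positive, finite values below the float32 maximum.
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    upper = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return upper - to_float32(value)


stored = to_float32(16.2)
spacing = float32_spacing(16.2)

print(f"{stored:.6f}")                    # 16.200001, as in the report
print(spacing == 2.0 ** -19)              # True: the gap in [16, 32)
print(abs(stored - 16.2) <= spacing / 2)  # True: error is at most half a gap
```

The rounding error is bounded by half the spacing, so 16.2 is stored as the nearest float32 and the difference only becomes visible once six decimal places are printed.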