Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd


data = [0, 10.25, 11.11, 5.555, 50.55, 70.75, 98.99, 99.98, 99.99, 100.00]

df = pd.DataFrame(data)
f16_df = df.astype("float16").round(2)
f32_df = df.astype("float32").round(2)
f64_df = df.astype("float64").round(2)
f_df = df.astype(float).round(2)
f16t32_df = f16_df.astype("float32").round(2)

print(f"df:\n{df}\n")
print(f"f16:\n{f16_df}\n")
print(f"f32:\n{f32_df}\n")
print(f"f64:\n{f64_df}\n")
print(f"f:\n{f_df}\n")
print(f"f16t32:\n{f16t32_df}\n")

Output:
df:
         0
0    0.000
1   10.250
2   11.110
3    5.555
4   50.550
5   70.750
6   98.990
7   99.980
8   99.990
9  100.000

f16:
            0
0    0.000000
1   10.250000
2   11.109375
3    5.558594
4   50.562500
5   70.750000
6   99.062500
7  100.000000
8  100.000000
9  100.000000

f32:
            0
0    0.000000
1   10.250000
2   11.110000
3    5.560000
4   50.549999
5   70.750000
6   98.989998
7   99.980003
8   99.989998
9  100.000000

f64:
        0
0    0.00
1   10.25
2   11.11
3    5.56
4   50.55
5   70.75
6   98.99
7   99.98
8   99.99
9  100.00

f:
        0
0    0.00
1   10.25
2   11.11
3    5.56
4   50.55
5   70.75
6   98.99
7   99.98
8   99.99
9  100.00

f16t32:
            0
0    0.000000
1   10.250000
2   11.110000
3    5.560000
4   50.560001
5   70.750000
6   99.059998
7  100.000000
8  100.000000
9  100.000000
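
For context, each dtype can only store values on a fixed grid, and that grid is coarsest for float16. The minimal sketch below uses NumPy's np.spacing, which gives the distance from a value to the next representable one, to show the grid size near 100, the top of this data's range:

```python
import numpy as np

# Distance from 100 to the next representable value in each dtype.
for dtype in (np.float16, np.float32, np.float64):
    print(dtype.__name__, np.spacing(dtype(100)))

# float16 cannot hold 99.99 at all; the nearest float16 is 100.0.
print(np.float16(99.99))
```

Near 100, float16 values are 0.0625 apart. Two-decimal values cannot be stored there, so round(2) can only return the nearest float16.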

Issue Description

pd.DataFrame.round() does not seem to take effect when using NumPy floating-point dtypes other than float64.

Notice that it only works for float64, which I assume is because that's the default, making it akin to Python's default float type (please correct my understanding if it is wrong).
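
The assumption about the default can be checked directly. A DataFrame built from Python floats gets the float64 dtype, and NumPy maps Python's built-in float to float64. That is why the f64 and f outputs above are identical. A small sketch:

```python
import numpy as np
import pandas as pd

data = [0, 10.25, 11.11, 5.555, 50.55, 70.75, 98.99, 99.98, 99.99, 100.00]
df = pd.DataFrame(data)

print(df[0].dtype)      # float64 by default
print(np.dtype(float))  # Python's float maps to float64

# Same dtype, same data.
print(df.astype(float).equals(df.astype("float64")))
```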

What's worse, when this happens, boolean indexing does not behave as intended. This is how I came across this issue:

f16_range_df = f16_df[f16_df <= 99.99]

print(f"f16:\n{f16_df}\n")
print(f"f16_range_df:\n{f16_range_df}\n")

Output:
f16:
            0
0    0.000000
1   10.250000
2   11.109375
3    5.558594
4   50.562500
5   70.750000
6   99.062500
7  100.000000
8  100.000000
9  100.000000

f16_range_df:
            0
0    0.000000
1   10.250000
2   11.109375
3    5.558594
4   50.562500
5   70.750000
6   99.062500
7  100.000000
8  100.000000
9  100.000000

Oddly, this behavior occurs when comparing against 99.99, 99.98, and 99.97, but the comparison starts behaving correctly at 99.96.
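
This suggests a plausible explanation. Assume the comparison is carried out in float16, because NumPy's casting rules keep a Python float scalar at the float16 array's dtype. Then each threshold is first rounded to the nearest float16. The sketch below checks this for the four thresholds mentioned:

```python
import numpy as np

# Assumption: the comparison runs in float16, so each threshold is
# rounded to the nearest float16 before comparing.
stored = np.float16(100.0)
for threshold in (99.99, 99.98, 99.97, 99.96):
    t16 = np.float16(threshold)
    print(f"{threshold} -> {float(t16)}; 100 <= threshold: {stored <= t16}")
```

Under that assumption, 99.99 through 99.97 all round up to 100.0. In contrast, 99.96 rounds down to 99.9375, which matches where the behavior changes.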

Expected Behavior

I would expect everything to be rounded to 2 decimal places, the way the float64 (f64) and float (f) versions are in the example above.
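
Float16 can miss two decimals by more than its grid alone would suggest, for example when 98.99 becomes 99.0625. The sketch below assumes NumPy rounds by scaling by 10**decimals, applying rint, and scaling back, with every intermediate kept in the array's dtype. Done by hand in float16, those steps reproduce the f16 values reported above:

```python
import numpy as np

data = [0, 10.25, 11.11, 5.555, 50.55, 70.75, 98.99, 99.98, 99.99, 100.00]
arr = np.array(data, dtype=np.float16)

# Scale, round to the nearest integer, scale back; each step stays float16.
# 98.99 is stored as 99.0, and 99.0 * 100 = 9900 rounds to 9904 in float16.
scaled = arr * 100
manual = np.rint(scaled) / 100

print(manual)
print(np.array_equal(manual, np.round(arr, 2)))
```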

I would also expect boolean indexing to respect the specified range, based on the values the DataFrame shows when printed.
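
One possible workaround, which is not something proposed in this thread, is to keep the rounding and the range filter in float64. The f64 output above already matches the expected values. The sketch assumes float64's memory cost is acceptable; the column label 0 comes from the example DataFrame.

```python
import pandas as pd

data = [0, 10.25, 11.11, 5.555, 50.55, 70.75, 98.99, 99.98, 99.99, 100.00]
df = pd.DataFrame(data)

# Round and compare in float64 instead of float16.
f64_df = df.astype("float64").round(2)

# Filter rows on column 0; 100.0 is excluded by <= 99.99.
in_range = f64_df[f64_df[0] <= 99.99]
print(in_range)
```

Filtering on a column drops non-matching rows. Masking with a whole DataFrame, as in f16_df[f16_df <= 99.99], keeps every row and fills non-matching cells with NaN.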

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 8dab54d6573f7186ff0c3b6364d5e4dd635ff3e7
python           : 3.10.8.final.0
python-bits      : 64
OS               : Linux
OS-release       : 6.0.12-arch1-1
Version          : #1 SMP PREEMPT_DYNAMIC Thu, 08 Dec 2022 11:03:38 +0000
machine          : x86_64
processor        :
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.5.2
numpy            : 1.23.5
pytz             : 2021.3
dateutil         : 2.8.2
setuptools       : 58.1.0
pip              : 22.3.1
Cython           : 0.29.32
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.9.1
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : 8.4.0
pandas_datareader: None
bs4              : 4.11.1
bottleneck       : None
brotli           : None
fastparquet      : None
fsspec           : None
gcsfs            : None
matplotlib       : 3.5.2
numba            : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : 1.8.1
snappy           : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
zstandard        : None
tzdata           : None

Comment From: phofl

This is basically a duplicate of #50125.

If you do df.iloc[4, 0], you'll get the correct value, because it's not cast to a Python float. This is an I/O issue, not a numerical problem.
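
The sketch below illustrates the comment, assuming it refers to the f32_df DataFrame from the example. Positional access returns a NumPy float32 scalar, which prints as the shortest string that identifies it among float32 values. Converting it to a 64-bit Python float exposes the binary value underneath.

```python
import numpy as np
import pandas as pd

data = [0, 10.25, 11.11, 5.555, 50.55, 70.75, 98.99, 99.98, 99.99, 100.00]
f32_df = pd.DataFrame(data).astype("float32").round(2)

value = f32_df.iloc[4, 0]
print(type(value))    # <class 'numpy.float32'>
print(value)          # shortest float32 representation
print(float(value))   # widened to a Python float (float64)
```

The last two lines describe the same stored number, the float32 nearest to 50.55. Only the formatting differs.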

Comment From: mcp292

Thank you for pointing me to that issue. It didn't come up because my search was limited to open issues.

If I understand it correctly, though, that issue concludes that even though the printed values are not rounded, the underlying data points are rounded correctly.

That does not explain why the range comparison I showed includes 100 when checking <= 99.99.

Comment From: phofl

Float16 is not supported; this shouldn’t work at all. Only float32 and float64 are.

Comment From: mcp292

Interesting. I did not find that in the docs. They don't mention float types at all in this list. Where can I find the information you are giving me?

Comment From: phofl

See to_numeric, for example.
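
The to_numeric documentation describes downcast="float" as choosing the smallest float dtype, with np.float32 as the minimum, so float16 is never produced. A short sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, 2.25, 100.0])  # float64 by default

# downcast="float" stops at float32, never float16.
out = pd.to_numeric(s, downcast="float")
print(out.dtype)  # float32
```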

Comment From: mcp292

That's pretty obscure, wouldn't you agree?

Comment From: mcp292

I would expect that up front in the dtypes section.

Comment From: phofl

PRs to improve the docs are welcome.

Comment From: mcp292

I would rather leave it to someone who can do a more thorough job than you or I.

It's apparent you are in a rush. Why not leave this issue to someone willing to take their time? It's obvious you did not read this issue or https://github.com/pandas-dev/pandas/issues/50214, as I had to step through them both with you after you closed them at a glance.

I do appreciate the quick responses, but I feel we both spent more time on these issues than we wanted as a result.

Comment From: phofl

You can either rephrase those issues or open a new one stating clearly what’s not documented and what to add. The issues you opened are not helpful for documentation improvements.

Comment From: mcp292

That is true, and what I ultimately did, but the point is it took a lot more effort than necessary to reach that conclusion, and the cause was haste.