Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
data = [0, 10.25, 11.11, 5.555, 50.55, 70.75, 98.99, 99.98, 99.99, 100.00]
df = pd.DataFrame(data)
f16_df = df.astype("float16").round(2)
f32_df = df.astype("float32").round(2)
f64_df = df.astype("float64").round(2)
f_df = df.astype(float).round(2)
f16t32_df = f16_df.astype("float32").round(2)
print(f"df:\n{df}\n")
print(f"f16:\n{f16_df}\n")
print(f"f32:\n{f32_df}\n")
print(f"f64:\n{f64_df}\n")
print(f"f:\n{f_df}\n")
print(f"f16t32:\n{f16t32_df}\n")
df:
0
0 0.000
1 10.250
2 11.110
3 5.555
4 50.550
5 70.750
6 98.990
7 99.980
8 99.990
9 100.000
f16:
0
0 0.000000
1 10.250000
2 11.109375
3 5.558594
4 50.562500
5 70.750000
6 99.062500
7 100.000000
8 100.000000
9 100.000000
f32:
0
0 0.000000
1 10.250000
2 11.110000
3 5.560000
4 50.549999
5 70.750000
6 98.989998
7 99.980003
8 99.989998
9 100.000000
f64:
0
0 0.00
1 10.25
2 11.11
3 5.56
4 50.55
5 70.75
6 98.99
7 99.98
8 99.99
9 100.00
f:
0
0 0.00
1 10.25
2 11.11
3 5.56
4 50.55
5 70.75
6 98.99
7 99.98
8 99.99
9 100.00
f16t32:
0
0 0.000000
1 10.250000
2 11.110000
3 5.560000
4 50.560001
5 70.750000
6 99.059998
7 100.000000
8 100.000000
9 100.000000
Issue Description
pd.DataFrame.round() does not seem to take effect when using NumPy floating point datatypes.
Notice that it only works for float64, which I assume is because that's the default, making it akin to Python's default float type (please correct my understanding if it is wrong).
What's worse is that when this happens, boolean indexing does not behave as intended. This is how I came across this issue:
f16_range_df = f16_df[f16_df <= 99.99]
print(f"f16:\n{f16_df}\n")
print(f"f16_range_df:\n{f16_range_df}\n")
f16:
0
0 0.000000
1 10.250000
2 11.109375
3 5.558594
4 50.562500
5 70.750000
6 99.062500
7 100.000000
8 100.000000
9 100.000000
f16_range_df:
0
0 0.000000
1 10.250000
2 11.109375
3 5.558594
4 50.562500
5 70.750000
6 99.062500
7 100.000000
8 100.000000
9 100.000000
Oddly, this behavior is observed when checking the inequality against 99.99, 99.98, and 99.97, but it starts behaving correctly at 99.96.
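A possible explanation for the 99.96 threshold, sketched with only the standard library: near 100, adjacent float16 values are 0.0625 apart, so both the data and the comparison value snap to that grid. The sketch assumes the scalar on the right-hand side is converted to float16 before the comparison, which is consistent with the output above but is not confirmed in this thread. `to_float16` is a hypothetical helper, and the `"e"` format of `struct` stands in for NumPy's float16.

```python
import struct

def to_float16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (hypothetical helper)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Assumption: the threshold is converted to float16 before comparing.
thresholds = [99.99, 99.98, 99.97, 99.96]
as_stored = {t: to_float16(t) for t in thresholds}

for t, stored in as_stored.items():
    # 100.0 is what rows 7-9 hold once the data has been cast to float16.
    print(f"{t} -> {stored!r}; 100.0 <= threshold: {100.0 <= stored}")
```

Under that assumption, 99.99, 99.98, and 99.97 all become 100.0, so the rows holding 100.0 pass the `<=` check. 99.96 becomes 99.9375, the next float16 value down, which is where the filter starts excluding them.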
Expected Behavior
I would expect everything to be rounded to 2 decimal places, the way the float64 (f64) and float (f) datatypes are in the example above.
I would also expect that boolean indexing works within the range specified, according to what that dataframe shows as its values when printed.
Installed Versions
Comment From: phofl
this is basically a duplicate of #50125
If you do df.iloc[4, 0] you'll get the correct value, because it's not cast to a Python float. This is an I/O issue and not a numerical problem.
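The display point can be sketched without pandas, assuming the extra printed digits come from widening a float32 value to Python's 64-bit float. `as_float32` is a hypothetical helper built on `struct`, standing in for a NumPy float32 scalar.

```python
import struct

def as_float32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value,
    returned as a (64-bit) Python float (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

stored = as_float32(50.55)

# Widened to 64 bits, the stored value shows digits beyond the second decimal.
print(repr(stored))

# Formatted to two decimals, it still reads as 50.55.
print(f"{stored:.2f}")

# Rounding again and storing as float32 gives back the same value,
# i.e. the stored number is already the closest float32 to 50.55.
print(as_float32(round(stored, 2)) == stored)
```

The last line prints `True`: the rounded data is as close to 50.55 as float32 allows, and only the widened display reveals the gap.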
Comment From: mcp292
Thank you for pointing me to that issue. It didn't come up because my search was limited to open issues.
If I understand it correctly though, that issue concludes that despite the values being printed not rounded, the underlying data points are rounded correctly.
That does not explain why the range inequality I show includes 100 when checking <= 99.99.
Comment From: phofl
Float16 is not supported, this shouldn’t work at all, only float32 and 64
Comment From: mcp292
Interesting. I did not find that in the docs. They don't mention float types at all in this list. Where can I find the information you are giving me?
Comment From: phofl
to_numeric for example
Comment From: mcp292
That's pretty obscure, wouldn't you agree?
Comment From: mcp292
I would expect that up front in the dtypes section.
Comment From: phofl
PRs to improve the docs are welcome
Comment From: mcp292
I would rather leave it to someone who can do a more thorough job than you or I.
It's apparent you are in a rush. Why not leave this issue to someone willing to take their time? It's obvious you did not read this issue nor https://github.com/pandas-dev/pandas/issues/50214 as I had to step through them both with you, after you closed them at a glance.
I do appreciate the quick responses, but I feel we both spent more time on these issues than we wanted as a result.
Comment From: phofl
You can either rephrase those issues or open a new one stating clearly what’s not documented and what to add. The issues you opened are not helpful for documentation improvements
Comment From: mcp292
That is true, and what I ultimately did, but the point is it took a lot more effort than necessary to reach that conclusion, and the cause was haste.