Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
data = [0, 10.25, 11.11, 5.555, 50.55, 70.75, 98.99, 99.98, 99.99, 100.00]
df = pd.DataFrame(data)
f16_df = df.astype("float16").round(2)
f32_df = df.astype("float32").round(2)
f64_df = df.astype("float64").round(2)
f_df = df.astype(float).round(2)
f16t32_df = f16_df.astype("float32").round(2)
print(f"df:\n{df}\n")
print(f"f16:\n{f16_df}\n")
print(f"f32:\n{f32_df}\n")
print(f"f64:\n{f64_df}\n")
print(f"f:\n{f_df}\n")
print(f"f16t32:\n{f16t32_df}\n")
df:
0
0 0.000
1 10.250
2 11.110
3 5.555
4 50.550
5 70.750
6 98.990
7 99.980
8 99.990
9 100.000
f16:
0
0 0.000000
1 10.250000
2 11.109375
3 5.558594
4 50.562500
5 70.750000
6 99.062500
7 100.000000
8 100.000000
9 100.000000
f32:
0
0 0.000000
1 10.250000
2 11.110000
3 5.560000
4 50.549999
5 70.750000
6 98.989998
7 99.980003
8 99.989998
9 100.000000
f64:
0
0 0.00
1 10.25
2 11.11
3 5.56
4 50.55
5 70.75
6 98.99
7 99.98
8 99.99
9 100.00
f:
0
0 0.00
1 10.25
2 11.11
3 5.56
4 50.55
5 70.75
6 98.99
7 99.98
8 99.99
9 100.00
f16t32:
0
0 0.000000
1 10.250000
2 11.110000
3 5.560000
4 50.560001
5 70.750000
6 99.059998
7 100.000000
8 100.000000
9 100.000000
Issue Description
Downcasting is neither warned against nor discussed in the documentation. Upcasting has a section under dtypes, and downcasting is mentioned only within the scope of pd.to_numeric().
Further, I'm not sure if the behavior is undefined because for float16 the mantissa is 10 bits which means it should max out at 1023, yet 10.25 (1025e-2) and up are preserved.
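One way to probe the float16 question is to ask NumPy about the format directly. The sketch below assumes only NumPy and reuses values from the example above. It suggests that the 10 stored mantissa bits set the relative precision rather than capping values at 1023, because the exponent scales the significand.

```python
import numpy as np

info = np.finfo(np.float16)
nmant = info.nmant            # number of stored mantissa bits
largest = float(info.max)     # largest finite float16 value

# Gap between neighbouring float16 values near 10.25.
gap_near_10 = float(np.spacing(np.float16(10.25)))

# 10.25 is a whole multiple of that gap, so it survives the cast unchanged.
exact = float(np.float16(10.25))

# 11.11 is not, so it is replaced by the nearest representable value.
nearest = float(np.float16(11.11))

print(nmant, largest, gap_near_10, exact, nearest)
```

Under this reading, 10.25 is preserved because it is a sum of powers of two, not because of anything specific to pandas.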
pd.DataFrame.round() does not seem to take effect when using NumPy datatypes.
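A possible explanation for the f32 column, offered as a sketch and not as a statement about pandas internals: the rounding may well be applied, but 50.55 and 99.98 have no exact float32 representation, so the stored value is the nearest float32 and the difference shows up at six decimal places. The snippet assumes only NumPy.

```python
import numpy as np

# Nearest float32 to each decimal, widened to a Python float so the
# stored digits are visible.
stored_a = float(np.float32(50.55))
stored_b = float(np.float32(99.98))

# Six-decimal formatting of the stored float32 values.
shown_a = f"{stored_a:.6f}"
shown_b = f"{stored_b:.6f}"

# float64 is also inexact for 50.55, but the error sits far below the
# sixth decimal place, so it is not visible at this width.
shown_a64 = f"{float(np.float64(50.55)):.6f}"

print(shown_a, shown_b, shown_a64)
```

The formatted float32 strings match the digits in the f32 output above, which is consistent with a representation limit and not a skipped round().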
Expected Behavior
- Receive warning.
- Truncate values.
- Discussed in documentation.
Installed Versions
Comment From: mcp292
This is not discussed in the documentation, yet the issue is closed.
Comment From: phofl
PRs are welcome I guess. Not sure where to add this though. It should be more or less clear that we return python dtypes based on the examples in the docs
Comment From: mcp292
In the dtypes section (linked above), they use float32 and int8 in the first example. Are you suggesting this is neither good practice nor supported?
Comment From: mcp292
Related to https://github.com/pandas-dev/pandas/issues/50213.
Comment From: phofl
This is perfectly fine, but they get converted to python floats instead of np.float32() objects when writing them out, like printing or csv.
Comment From: mcp292
Okay, I'll take it.
This is still unanswered though:
Further, I'm not sure if the behavior is undefined because for float16 the mantissa is 10 bits which means it should max out at 1023, yet 10.25 (1025e-2) and up are preserved.
Comment From: phofl
Please comment only in one issue
Comment From: mcp292
I just realized I didn't name this issue. These issues are separate, but use the same example.
Comment From: phofl
Not sure what you mean by downcasting, but you can cast whatever you want; we don’t warn about loss of data right now.
Comment From: mcp292
Opposite of upcasting.
There would only need to be a warning if risking data loss, which you say is not implemented.
This is fine to not implement, so long as it is mentioned in the docs. I feel a downcasting section could be added to the link in this comment.
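Until something like this is documented or implemented, a user-side check is possible. The following is a minimal sketch, not a pandas feature: `changed_by_cast` is a hypothetical helper name, and it assumes the data has no NaN values (NaN never compares equal, so it would always be flagged).

```python
import warnings

import pandas as pd


def changed_by_cast(series: pd.Series, dtype: str) -> pd.Series:
    """Hypothetical helper: flag values that change after a round trip through dtype."""
    original = series.astype("float64")
    round_trip = series.astype(dtype).astype("float64")
    return original != round_trip


values = pd.Series([0, 10.25, 11.11, 99.99, 100.00])
mask = changed_by_cast(values, "float16")

# Emit a warning only when the narrower dtype actually altered something.
if mask.any():
    warnings.warn(f"{int(mask.sum())} value(s) changed when cast to float16")
```

The comparison is exact on purpose. A tolerance-based check (for example with numpy.isclose) would be a different design choice for cases where small representation error is acceptable.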