Feature Type
- [x] Adding new functionality to pandas
- [ ] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
I wish I could use Pandas to easily convert numbers formatted in the Brazilian style (`1.234,56`) into numeric types.
Currently, `pd.to_numeric()` does not support this format, and users have to manually apply `.str.replace(".", "").str.replace(",", ".")`, which is not intuitive.
This feature would simplify data handling for users in Brazil and other countries with similar numerical formats.
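To make the problem concrete, here is a small illustration (the sample values are made up for this example) of what `pd.to_numeric()` does with such strings today. Values containing a decimal comma are rejected, and a value that only contains a thousands dot is silently read as a decimal:

```python
import pandas as pd

raw = pd.Series(["1.234,56", "100,50", "1.234"])

# Strings with a decimal comma are not recognised and become NaN under "coerce".
# "1.234" (meaning 1234 in Brazilian notation) does parse, but as 1.234.
parsed = pd.to_numeric(raw, errors="coerce")
print(parsed)
```

The last case is the more dangerous one, since no error or NaN signals that the value was misread.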
Feature Description
Add a new function to_numeric_br() to automatically convert strings with the Brazilian numeric format into floats.
Proposed Implementation (Pseudocode)
def to_numeric_br(series, errors="raise"):
    """
    Converts Brazilian-style numeric strings (1.234,56) into float.

    Parameters:
    ----------
    series : pandas.Series
        Data to be converted.
    errors : str, default 'raise'
        - 'raise' : Throws an error for invalid values.
        - 'coerce' : Converts invalid values to NaN.
        - 'ignore' : Returns the original data in case of error.

    Returns:
    -------
    pandas.Series with numeric values.
    """
Expected Behavior
import pandas as pd
df = pd.DataFrame({"values": ["1.234,56", "5.600,75", "100,50"]})
df["converted_values"] = to_numeric_br(df["values"], errors="coerce")
print(df)
Expected Output:
     values  converted_values
0  1.234,56           1234.56
1  5.600,75           5600.75
2    100,50            100.50
Alternatively, instead of a standalone function, this could be implemented as an enhancement to `pd.to_numeric()`, adding a `locale="br"` parameter.
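For illustration, one way the proposed function could be sketched on top of existing pandas calls is shown below. This is only a minimal sketch, not an agreed design: `to_numeric_br` is the hypothetical name from this proposal, and the `errors` argument is simply forwarded to `pd.to_numeric()`, so it behaves however that function does in the installed pandas version.

```python
import pandas as pd


def to_numeric_br(series, errors="raise"):
    """Hypothetical helper: parse strings such as '1.234,56' into floats."""
    cleaned = (
        series.str.replace(".", "", regex=False)   # drop thousands separators
        .str.replace(",", ".", regex=False)        # decimal comma -> decimal point
    )
    # Delegate validation and error handling to the existing converter.
    return pd.to_numeric(cleaned, errors=errors)


values = pd.Series(["1.234,56", "5.600,75", "100,50", "invalid"])
converted = to_numeric_br(values, errors="coerce")
print(converted)
```

Because the string operations are vectorized, this avoids a Python-level loop over the elements. It assumes the input Series holds strings.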
Alternative Solutions
Currently, users must manually apply string replacements before using `pd.to_numeric()`, like this:
df["values"] = df["values"].str.replace(".", "", regex=False).str.replace(",", ".", regex=False)
df["values"] = pd.to_numeric(df["values"], errors="coerce")
While this works, it is not user-friendly, especially for beginners.
Another alternative is using third-party packages like babel, but this requires additional dependencies and is not built into Pandas.
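When the data is loaded from delimited text rather than converted after the fact, `pd.read_csv()` already accepts `decimal` and `thousands` arguments, so the conversion can happen at read time. A small sketch, with made-up column names and sample data:

```python
import io

import pandas as pd

# Semicolon-delimited text, as commonly exported with Brazilian number formatting.
csv_text = "item;value\na;1.234,56\nb;5.600,75\nc;100,50\n"

df_csv = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",", thousands=".")
print(df_csv["value"])
```

This does not help for strings that are already in a DataFrame, which is the case this request targets.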
Additional Context
- Similar requests have been made by users handling locale-specific number formats.
- Would the maintainers prefer a standalone function (`to_numeric_br()`) or a `locale` parameter in `pd.to_numeric()`?
- Happy to implement this if maintainers approve!
Comment From: Liam3851
See also #4674, and #56934 which would have added this support.
Comment From: itayg2341
I've encountered a similar need to handle Brazilian number formatting and created a function that might be helpful. It addresses the different `errors` parameter options as well.
```python
import pandas as pd
import numpy as np


def to_numeric_br(series, errors="raise"):
    """
    Converts Brazilian-style numeric strings (1.234,56) into float.

    Parameters:
    ----------
    series : pandas.Series
        Data to be converted.
    errors : str, default 'raise'
        - 'raise' : Throws an error for invalid values.
        - 'coerce' : Converts invalid values to NaN.
        - 'ignore' : Returns the original data in case of error.

    Returns:
    -------
    pandas.Series with numeric values.
    """
    def converter(x):
        if pd.isna(x):
            return x
        try:
            return float(x.replace(".", "").replace(",", "."))
        except ValueError:
            if errors == "raise":
                raise
            elif errors == "coerce":
                return np.nan
            elif errors == "ignore":
                return x
            else:
                raise ValueError("Invalid error value")

    return series.apply(converter)


# Example usage:
df = pd.DataFrame({"values": ["1.234,56", "5.600,75", "100,50", "invalid"]})

df["converted_coerce"] = to_numeric_br(df["values"], errors="coerce")
df["converted_ignore"] = to_numeric_br(df["values"], errors="ignore")

print(df)

try:
    df["converted_raise"] = to_numeric_br(df["values"], errors="raise")
except ValueError as e:
    print(f"Caught exception as expected: {e}")
```
This function handles NaN values gracefully and provides flexibility in how errors are managed. While integrating this directly into pd.to_numeric with a locale option would be ideal, this standalone function could be a useful workaround in the meantime. I hope this contributes to the discussion!
Comment From: Veras-D
Thanks for pointing that out, @Liam3851. If #4674 and #56934 already added support for specifying decimal and thousands separators in `pd.to_numeric()`, then my proposal might be redundant.
Could you confirm if this feature is already fully implemented in the latest Pandas release? If so, users in Brazil could simply use `pd.to_numeric(..., decimal=',', thousands='.')` instead of a separate function.
If there are any remaining gaps, I'd be happy to adjust my proposal accordingly.
Comment From: Veras-D
Thank you, @itayg2341, that looks like a great solution.
Comment From: Liam3851
@Veras-D I'd suggest re-opening this, as I don't believe #56934 was ever merged. cc: @mroeschke
Comment From: Veras-D
Thanks for the clarification @Liam3851! I've reopened the issue. Let me know if there's anything I can do to help move this forward.