Feature Type

  • [x] Adding new functionality to pandas

  • [ ] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

I wish I could use Pandas to easily convert numbers formatted in the Brazilian style (1.234,56) into numeric types.

Currently, pd.to_numeric() does not support this format, so users have to chain string replacements by hand, e.g. .str.replace(".", "", regex=False).str.replace(",", ".", regex=False), before converting. This is not intuitive.

This feature would simplify data handling for users in Brazil and other countries with similar numerical formats.

Feature Description

Add a new function to_numeric_br() to automatically convert strings with the Brazilian numeric format into floats.

Proposed Implementation (Pseudocode)

def to_numeric_br(series, errors="raise"):
    """
    Converts Brazilian-style numeric strings (1.234,56) into float.

    Parameters:
    ----------
    series : pandas.Series
        Data to be converted.
    errors : str, default 'raise'
        - 'raise' : Throws an error for invalid values.
        - 'coerce' : Converts invalid values to NaN.
        - 'ignore' : Returns the original data in case of error.

    Returns:
    -------
    pandas.Series with numeric values.
    """

Expected Behavior

import pandas as pd

df = pd.DataFrame({"values": ["1.234,56", "5.600,75", "100,50"]})
df["converted_values"] = to_numeric_br(df["values"], errors="coerce")

print(df)

Expected Output:

      values  converted_values
0  1.234,56          1234.56
1  5.600,75          5600.75
2    100,50           100.50
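
A minimal sketch of a function that would produce the output above, assuming the input is a Series of strings and that simply stripping every "." before swapping "," for "." is acceptable (digit grouping is not validated, so "1.2.3,4" would parse as 123.4). The name `to_numeric_br_sketch` is illustrative only and is not part of pandas.

```python
import pandas as pd


def to_numeric_br_sketch(series, errors="raise"):
    """Illustrative only: parse strings like '1.234,56' into floats."""
    if errors not in ("raise", "coerce", "ignore"):
        raise ValueError(f"invalid errors value: {errors!r}")

    # Drop thousands separators, then turn the decimal comma into a point.
    cleaned = (
        series.str.replace(".", "", regex=False)
        .str.replace(",", ".", regex=False)
    )

    if errors == "ignore":
        # Handled here instead of being passed through, because recent
        # pandas releases deprecate errors="ignore" in pd.to_numeric.
        try:
            return pd.to_numeric(cleaned, errors="raise")
        except (ValueError, TypeError):
            return series

    return pd.to_numeric(cleaned, errors=errors)


df = pd.DataFrame({"values": ["1.234,56", "5.600,75", "100,50"]})
df["converted_values"] = to_numeric_br_sketch(df["values"], errors="coerce")
print(df)
```

Delegating to pd.to_numeric() keeps the conversion vectorized and reuses its existing error handling for "raise" and "coerce".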

Alternatively, instead of a standalone function, this could be implemented as an enhancement to pd.to_numeric(), adding a locale="br" parameter.
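
To make the parameter-based alternative concrete, here is a rough sketch of a wrapper that looks up separators by a locale key. Everything in it is hypothetical: pd.to_numeric() has no `locale` parameter today, and the `SEPARATORS` table, the `"br"`/`"us"` keys, and the name `to_numeric_locale` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical lookup table: locale key -> (thousands separator, decimal separator).
SEPARATORS = {
    "br": (".", ","),
    "us": (",", "."),
}


def to_numeric_locale(series, locale="us", errors="raise"):
    """Illustrative wrapper; pd.to_numeric itself has no locale parameter."""
    if locale not in SEPARATORS:
        raise ValueError(f"unsupported locale: {locale!r}")
    thousands, decimal = SEPARATORS[locale]

    # Normalize to the plain "1234.56" form that pd.to_numeric understands.
    cleaned = (
        series.str.replace(thousands, "", regex=False)
        .str.replace(decimal, ".", regex=False)
    )
    # `errors` is passed through unchanged, so it must be a value that the
    # installed pandas version accepts (for example "raise" or "coerce").
    return pd.to_numeric(cleaned, errors=errors)


br_values = pd.Series(["1.234,56", "100,50"])
us_values = pd.Series(["1,234.56", "100.50"])
print(to_numeric_locale(br_values, locale="br").tolist())
print(to_numeric_locale(us_values, locale="us").tolist())
```

A pair of explicit separator arguments would cover the same cases without a lookup table; the locale key is only a shorthand for such a pair.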

Alternative Solutions

Currently, users must manually apply string replacements before using pd.to_numeric(), like this:

df["values"] = df["values"].str.replace(".", "", regex=False).str.replace(",", ".", regex=False)
df["values"] = pd.to_numeric(df["values"], errors="coerce")

While this works, it is not user-friendly, especially for beginners.

Another alternative is using third-party packages like babel, but this requires additional dependencies and is not built into Pandas.
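
When the data is read from a delimited text file rather than already sitting in a DataFrame, pd.read_csv() can do the conversion at load time through its existing `decimal` and `thousands` parameters. A small sketch, assuming a semicolon-delimited file; the column names and the in-memory `raw` string are made up for the example.

```python
import io

import pandas as pd

# Stand-in for a semicolon-delimited file exported with Brazilian formatting.
raw = "item;valor\nA;1.234,56\nB;5.600,75\nC;100,50\n"

# decimal and thousands tell the parser how to read numeric columns.
df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",", thousands=".")
print(df.dtypes)
print(df)
```

This only helps at parsing time; strings that are already in a Series still need the manual replacement shown above.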

Additional Context

  • Similar requests have been made by users handling locale-specific number formats.
  • Would the maintainers prefer a standalone function (to_numeric_br()) or a locale parameter in pd.to_numeric()?
  • Happy to implement this if maintainers approve!

Comment From: Liam3851

See also #4674, and #56934 which would have added this support.

Comment From: itayg2341

I've encountered a similar need to handle Brazilian number formatting and created a function that might be helpful. It addresses the different errors parameter options as well.

```python
import pandas as pd
import numpy as np


def to_numeric_br(series, errors="raise"):
    """
    Converts Brazilian-style numeric strings (1.234,56) into float.

    Parameters:
    ----------
    series : pandas.Series
        Data to be converted.
    errors : str, default 'raise'
        - 'raise' : Throws an error for invalid values.
        - 'coerce' : Converts invalid values to NaN.
        - 'ignore' : Returns the original data in case of error.

    Returns:
    -------
    pandas.Series with numeric values.
    """

    def converter(x):
        # Leave missing values untouched.
        if pd.isna(x):
            return x
        try:
            # Drop thousands separators, then use "." as the decimal mark.
            return float(x.replace(".", "").replace(",", "."))
        except ValueError:
            if errors == "raise":
                raise
            elif errors == "coerce":
                return np.nan
            elif errors == "ignore":
                return x
            else:
                raise ValueError("Invalid error value")

    return series.apply(converter)


# Example usage:
df = pd.DataFrame({"values": ["1.234,56", "5.600,75", "100,50", "invalid"]})

df["converted_coerce"] = to_numeric_br(df["values"], errors="coerce")
df["converted_ignore"] = to_numeric_br(df["values"], errors="ignore")

print(df)

try:
    df["converted_raise"] = to_numeric_br(df["values"], errors="raise")
except ValueError as e:
    print(f"Caught exception as expected: {e}")
```

This function handles NaN values gracefully and provides flexibility in how errors are managed. While integrating this directly into pd.to_numeric with a locale option would be ideal, this standalone function could be a useful workaround in the meantime. I hope this contributes to the discussion!

Comment From: Veras-D

Thanks for pointing that out, @Liam3851. If #4674 and #56934 already added support for specifying decimal and thousands separators in pd.to_numeric(), then my proposal might be redundant.

Could you confirm if this feature is already fully implemented in the latest Pandas release? If so, users in Brazil could simply use pd.to_numeric(..., decimal=',', thousands='.') instead of a separate function.

If there are any remaining gaps, I'd be happy to adjust my proposal accordingly.

Comment From: Veras-D

Thank you, @itayg2341, that looks like a great solution.

Comment From: Liam3851

@Veras-D I'd suggest re-opening this, as I don't believe #56934 was ever merged. cc: @mroeschke

Comment From: Veras-D

Thanks for the clarification @Liam3851! I've reopened the issue. Let me know if there's anything I can do to help move this forward.