Pandas version checks

  • [X] I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_string.html

Documentation problem

The DataFrame.to_string docstring says float_format should be a "one-parameter function, optional, default None", but passing a format string (e.g. '%.3e') also works, as it does in .to_csv, for columns/series of NumPy float type.

However, unlike .to_csv, .to_string with a float_format string fails with TypeError: 'str' object is not callable for columns/series of float extension type.

import pandas as pd

s = pd.Series([1.1, 2.2, 3.3])

# works
s.to_csv(float_format="%.3e")
s.convert_dtypes().to_csv(float_format="%.3e")
s.to_string(float_format="%.3e")

# fails
s.convert_dtypes().to_string(float_format="%.3e")
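For context, a printf-style format string and the documented one-parameter callable form carry the same information: applying "%.3e" with the % operator gives the same text as calling a function that does so. A minimal pure-Python sketch of that equivalence (as_callable is a hypothetical helper, not a pandas API):

```python
# Sketch: a %-style format string vs. the equivalent one-parameter callable.
# as_callable is a hypothetical helper, not part of pandas.

def as_callable(fmt: str):
    """Wrap a %-style format string in a one-parameter function."""
    return lambda x: fmt % x

values = [1.1, 2.2, 3.3]
fmt = "%.3e"

via_string = [fmt % v for v in values]
via_callable = [as_callable(fmt)(v) for v in values]

print(via_string)                  # ['1.100e+00', '2.200e+00', '3.300e+00']
print(via_string == via_callable)  # True
```

As a workaround until the string path is fixed, passing the callable form, e.g. s.convert_dtypes().to_string(float_format=lambda x: "%.3e" % x), should avoid the failing code path, since it is the form the docstring documents.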

xref: #9448 https://github.com/pandas-dev/pandas-stubs/issues/730

Suggested fix for documentation

It would be nice for the documentation to reflect that a format string can be used, but probably the issue with extension types should be fixed first.

Note also that FloatFormatType, used in the internal typing of DataFrame.to_string,

https://github.com/pandas-dev/pandas/blob/abcd440e11d061171b0a4eeb058748629a288568/pandas/core/frame.py#L1149

does include str as an option.

https://github.com/pandas-dev/pandas/blob/abcd440e11d061171b0a4eeb058748629a288568/pandas/_typing.py#L303

Comment From: topper-123

This does accept a format string, so the docs need some updating here, and the type hint for float_format in Series.to_string should be the same as in DataFrame.to_string.

The failure you're showing looks like a bug: e.g. s.convert_dtypes().to_string(float_format="xxx") gives the same error, while it should raise ValueError: Invalid format specifier or similar.
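One way to get a ValueError for a bad specifier is to probe the format string with a sample float up front. A minimal sketch, assuming %-style format strings; validate_float_format is hypothetical, not an existing pandas function:

```python
# Sketch: reject an invalid float_format string with ValueError up front,
# instead of failing later with "'str' object is not callable".
# validate_float_format is a hypothetical helper, not a pandas function.

def validate_float_format(fmt: str) -> str:
    """Return fmt unchanged if it can format a float, else raise ValueError."""
    try:
        fmt % 1.0  # probe with a sample float
    except (TypeError, ValueError) as exc:
        # "xxx" % 1.0 raises TypeError; "%q" % 1.0 raises ValueError
        raise ValueError(f"Invalid format specifier: {fmt!r}") from exc
    return fmt

validate_float_format("%.3e")  # accepted
try:
    validate_float_format("xxx")
except ValueError as err:
    print(err)  # Invalid format specifier: 'xxx'
```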

PRs for both the doc update and the bug fix are welcome.

Comment From: rsm-23

@topper-123 for the bug, are we trying to add this functionality for Series.to_string as well?

Comment From: topper-123

Yes, Series.to_string should accept the same input for float_format as DataFrame.to_string. It may already do that, in which case it's just a question of updating the type hint in Series.to_string.
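A sketch of what "the same input" could look like at the type-hint level: both signatures share one alias. The stub functions are hypothetical stand-ins for the real methods, and the real pandas FloatFormatType also includes "EngFormatter", which is left out here to keep the example self-contained:

```python
# Sketch: one shared alias so both to_string signatures accept the same float_format.
# dataframe_to_string / series_to_string are hypothetical stubs, not pandas code;
# the real alias also includes "EngFormatter".
from typing import Callable, Optional, Union, get_type_hints

FloatFormatType = Union[str, Callable[[float], str]]

def dataframe_to_string(float_format: Optional[FloatFormatType] = None) -> str:
    raise NotImplementedError

def series_to_string(float_format: Optional[FloatFormatType] = None) -> str:
    raise NotImplementedError

# Both stubs advertise exactly the same accepted types for float_format.
df_hint = get_type_hints(dataframe_to_string)["float_format"]
s_hint = get_type_hints(series_to_string)["float_format"]
print(df_hint == s_hint)  # True
```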

Comment From: otitoU

take

Comment From: rsm-23

hey @otitoU how is it going? Would you mind if I take a look as well?

Comment From: otitoU

Hey @rsm-23, it is going well! Sorry, I have already spent some time on it; I hope you don't mind, but I would like to finish this.

Comment From: otitoU

Hey @phofl just to give you a bit of context.

What is the cause of the bug

The traceback shows that the problem arises in pandas/core/series.py at line 1765, when "result = formatter.to_string()" is called. formatter is an instance of the SeriesFormatter class from the internal fmt/format.py module. Depending on the dtype of the values array (values.dtype), a different array formatter is used, such as FloatArrayFormatter or GenericArrayFormatter; all of them can be found in format.py. The formatter that formatter.to_string() in series.py ends up using is GenericArrayFormatter.

GenericArrayFormatter iterates over vals, the array of values to be formatted. If a value v in vals is a float, it does "fmt_values.append(float_format(v))", i.e. it appends the float-formatted value to fmt_values. This is where the real problem occurs. According to the type hint FloatFormatType = Union[str, Callable, "EngFormatter"], float_format can be a str, a Callable or an EngFormatter. Here, however, float_format is a str, and the code calls it as if it were a Callable, which produces the error:

/format.py", line 1404, in _format_strings
fmt_values.append(float_format(v))
TypeError: 'str' object is not callable
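The failing branch can be modelled in a few lines. This is a simplified, hypothetical stand-in (format_strings_buggy is not the actual pandas code) that reproduces the same TypeError when a str is passed where a callable is expected:

```python
# Simplified model of the failing branch described above: every float value is
# passed to float_format(v), so a str float_format ends up being "called".
# format_strings_buggy is a hypothetical stand-in, not the actual pandas code.

def format_strings_buggy(vals, float_format):
    fmt_values = []
    for v in vals:
        if isinstance(v, float):
            fmt_values.append(float_format(v))  # fails when float_format is a str
        else:
            fmt_values.append(str(v))
    return fmt_values

print(format_strings_buggy([1.1, None], lambda x: "%.3e" % x))  # ['1.100e+00', 'None']
try:
    format_strings_buggy([1.1, 2.2], "%.3e")
except TypeError as err:
    print(err)  # 'str' object is not callable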

There are two potential fixes:

  • Editing GenericArrayFormatter: FloatArrayFormatter doesn't have this issue, so I am thinking of making GenericArrayFormatter apply float_format the same way FloatArrayFormatter does. FloatArrayFormatter does this with the function format_values_with(float_format).
  • Editing ExtensionArrayFormatter: convert the NumPy array to float, so that FloatArrayFormatter is used instead of GenericArrayFormatter.

For the first option, in the is_float_type[i] branch at line 1402 I want to check whether float_format is callable. If it is, keep fmt_values.append(float_format(v)); otherwise, append the value returned by format_values_with(float_format) to fmt_values.
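A minimal sketch of that check, under simplifying assumptions: a str float_format is treated as a %-style format string and applied with the % operator, whereas the real format_values_with(float_format) in FloatArrayFormatter does more (e.g. NA handling). The function names are hypothetical, and the check is done once before the loop rather than per element, which gives the same result:

```python
# Sketch of the proposed fix: call float_format only if it is callable;
# otherwise treat it as a %-style format string.
# _as_formatter / format_strings_fixed are hypothetical names, not pandas code.

def _as_formatter(float_format):
    """Return a one-parameter function for either a callable or a format string."""
    if callable(float_format):
        return float_format
    return lambda x: float_format % x

def format_strings_fixed(vals, float_format):
    formatter = _as_formatter(float_format)  # decide once, outside the loop
    fmt_values = []
    for v in vals:
        if isinstance(v, float):
            fmt_values.append(formatter(v))
        else:
            fmt_values.append(str(v))
    return fmt_values

print(format_strings_fixed([1.1, None, 2.2], "%.3e"))
# ['1.100e+00', 'None', '2.200e+00']
```

A string and the equivalent lambda then produce identical output, which is what the first option aims for.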

My proposed solution is to edit GenericArrayFormatter. I just wanted to know whether you agree or have any suggestions or comments.

Comment From: Dr-Irv

2 issues:

  • The docs should be fixed to say that DataFrame.to_string() and Series.to_string() accept a format string
  • The bug in to_string() when it is passed a format string and the values are backed by an extension array
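For the docs part, one possible wording for the float_format parameter in numpydoc style is sketched below. This is a hypothetical suggestion on a stub function, not the text pandas currently ships:

```python
# Hypothetical numpydoc wording for the float_format parameter; the stub
# function and the exact text are suggestions only, not pandas' docstring.

def to_string(float_format=None):
    """
    Parameters
    ----------
    float_format : str, callable, optional, default None
        Formatter for floating point numbers. Either a format string such
        as ``"%.3e"`` or a one-parameter function that takes a float and
        returns a string.
    """
    raise NotImplementedError

print("format string" in to_string.__doc__)  # True
```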