Pandas version checks
- [X] I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_string.html
Documentation problem
The DataFrame.to_string docstring says float_format should be "one-parameter function, optional, default None", but passing a format string (e.g. '%.3e') works too, like in .to_csv, for NumPy float type columns/series.
However, unlike .to_csv, .to_string with a float_format string fails with TypeError: 'str' object is not callable for float extension type columns/series.
import pandas as pd

s = pd.Series([1.1, 2.2, 3.3])
# works
s.to_csv(float_format="%.3e")
s.convert_dtypes().to_csv(float_format="%.3e")
s.to_string(float_format="%.3e")
# fails
s.convert_dtypes().to_string(float_format="%.3e")
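Until the extension-type path accepts a format string, one possible workaround is to pass a one-parameter function, which is what the docstring describes. The following is a minimal sketch, assuming a pandas version in which a callable float_format is honoured for both NumPy-backed and extension-backed float Series; the helper name sci3 is made up for illustration.

```python
import pandas as pd

s = pd.Series([1.1, 2.2, 3.3])


def sci3(x):
    # Hypothetical helper: wraps the "%.3e" format string in a callable.
    return "%.3e" % x


# NumPy float64 backing
numpy_backed = s.to_string(float_format=sci3)
# Float64 extension dtype backing (the case that fails with a plain string)
extension_backed = s.convert_dtypes().to_string(float_format=sci3)

print(numpy_backed)
print(extension_backed)
```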
xref: #9448 https://github.com/pandas-dev/pandas-stubs/issues/730
Suggested fix for documentation
It would be nice for the documentation to reflect that a format string can be used, but probably the issue with extension types should be fixed first.
Note also that FloatFormatType, used in the internal typing of DataFrame.to_string,
https://github.com/pandas-dev/pandas/blob/abcd440e11d061171b0a4eeb058748629a288568/pandas/core/frame.py#L1149
does include str as an option.
https://github.com/pandas-dev/pandas/blob/abcd440e11d061171b0a4eeb058748629a288568/pandas/_typing.py#L303
Comment From: topper-123
This does accept a format string, so the docs need updating here, and the type hint for float_format in Series.to_string should be the same as in DataFrame.to_string.
The failure you're showing looks like a bug: e.g. if we do s.convert_dtypes().to_string(float_format="xxx") we get the same error, while it should give ValueError: Invalid format specifier or similar.
PRs for both the doc update and the bug are welcome.
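For reference, here is a small sketch of how Python's own %-style formatting treats a valid and an invalid format string, independent of pandas. It assumes a format string would be applied with the % operator. In that case a string without a placeholder raises a TypeError rather than a ValueError, so the exact exception a fix surfaces may differ from the one suggested above.

```python
# Plain Python behaviour, no pandas involved.
formatted = "%.3e" % 1.1
print(formatted)

try:
    # "xxx" has no placeholder to consume the float argument.
    "xxx" % 1.1
except TypeError as exc:
    message = str(exc)
    print(type(exc).__name__, message)
```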
Comment From: rsm-23
@topper-123 for the bug, are we trying to add this functionality for Series.to_string as well?
Comment From: topper-123
Yes, Series.to_string should accept the same input for float_format as DataFrame.to_string. It may already do that, in which case it's just a question of updating the type hint in Series.to_string.
Comment From: otitoU
take
Comment From: rsm-23
hey @otitoU how is it going? Would you mind if I take a look as well?
Comment From: otitoU
Hey @rsm-23, it is going well! Sorry, I have already spent some time on it, so I hope you don't mind that I would like to finish this.
Comment From: otitoU
Hey @phofl just to give you a bit of context.
what is the cause of the bug
The error stack shows that the problem arises in pandas/core/series.py, line 1765, when result = formatter.to_string() is called. formatter is an instance of the SeriesFormatter class from the internal fmt module (format.py).
Different formatters are used depending on the dtype of the values array, such as FloatArrayFormatter, GenericArrayFormatter, etc. All of them can be found in format.py. The formatter that formatter.to_string() in series.py ends up using here is GenericArrayFormatter.
GenericArrayFormatter iterates over vals, the array of values to be formatted. If a value v in vals is a float, it does fmt_values.append(float_format(v)), which attempts to append the formatted float to fmt_values.
This is where the real problem occurs. FloatFormatType = Union[str, Callable, "EngFormatter"]
According to the type hint, float_format can be a str, a Callable, or an EngFormatter.
But the float_format passed here is a str, and the code attempts to call it as a Callable, which is why we get the error:
/format.py", line 1404, in _format_strings
fmt_values.append(float_format(v))
TypeError: 'str' object is not callable
The two potential fixes are:
- Editing the GenericArrayFormatter - FloatArrayFormatter doesn't have this issue, so I am thinking of formatting values in the GenericArrayFormatter class the same way FloatArrayFormatter does with its float_format.
FloatArrayFormatter uses this function to do so:
def format_values_with(float_format):
- Editing the ExtensionArrayFormatter - I want to convert the np array to float so it uses FloatArrayFormatter instead of GenericArrayFormatter.
Then, at line 1402, in the is_float_type[i] condition, I want to check whether float_format is callable. If it is: fmt_values.append(float_format(v)). Otherwise: append the formatted value returned by format_values_with(float_format) to fmt_values.
My proposed solution is to edit the GenericArrayFormatter. I just wanted to know if you agree or have suggestions/comments.
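The callable check described above can be sketched in plain Python. This is not the pandas implementation: as_float_formatter and format_floats are hypothetical names, and applying a string with the % operator is an assumption about how a format string would be interpreted.

```python
from typing import Callable, Union


def as_float_formatter(
    float_format: Union[str, Callable[[float], str]],
) -> Callable[[float], str]:
    # Hypothetical helper, not part of pandas: always return a callable.
    if callable(float_format):
        return float_format
    # Assumption: a string is applied with %-style formatting.
    return lambda v: float_format % v


def format_floats(values, float_format):
    # Stand-in for the loop that appends to fmt_values: floats go through
    # the formatter, everything else falls back to str().
    formatter = as_float_formatter(float_format)
    return [formatter(v) if isinstance(v, float) else str(v) for v in values]


from_string = format_floats([1.1, 2.2, None], "%.3e")
from_callable = format_floats([1.1], lambda v: f"{v:.1f}")
print(from_string)
print(from_callable)
```

Normalising once, before the loop, keeps the per-value branch unchanged and would also cover any other callable object passed as float_format.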
Comment From: Dr-Irv
2 issues:
- The docs should be fixed to say that DataFrame.to_string() and Series.to_string() accept a format string
- The bug of to_string() being passed a format string when there's an underlying extension array backing it