Is your feature request related to a problem?
Currently, the constructor for Series has the name argument as a str or None. But Series.name can return a Hashable, e.g., as a result of using DataFrame.xs() or .iloc on a row of a DataFrame.
So this is inconsistent. I think people expect the names of Series to be strings.
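A minimal sketch of the inconsistency, assuming a reasonably recent pandas; the frames and labels are made up for illustration. It only shows that Series.name can come back as something other than a str:

```python
import pandas as pd

# Illustrative data only: a frame with integer row labels.
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=[10, 20])

# Selecting a row with .iloc yields a Series named after the row label,
# which here is an integer rather than a str.
row_name = df.iloc[0].name

# With a MultiIndex on the rows, DataFrame.xs() with a full key yields a
# Series whose name is the tuple key.
mi = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)])
df2 = pd.DataFrame({"x": [1, 2]}, index=mi)
xs_name = df2.xs(("a", 1)).name

print(repr(row_name), isinstance(row_name, str))
print(repr(xs_name), isinstance(xs_name, str))
```

In both cases the name is hashable but not a string, which is the mismatch with the documented str-or-None constructor argument.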
Describe the solution you'd like
Change the implementation of Series.name to return a string or None.
API breaking implications
This is cleaning up the API, IMHO.
Describe alternatives you've considered
N/A
Additional context
ref: discussion in #42534
Comment From: jbrockmendel
-1 on changing behavior to appease a type checker.
The Series docstring is incorrect.
Comment From: vladu
Current typing is correct. Restricting series name to string would break frames with multi-level columns. Example:
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=pd.MultiIndex.from_tuples([('a', 'one'), ('a', 'two')]))
>>> df
    a
  one two
0   1   2
1   3   4
>>> df.iloc[:, 0].name
('a', 'one')
Comment From: Dr-Irv
-1 on changing behavior to appease a type checker.
The Series docstring is incorrect.
That means if we change the docstring, we're changing the API as well.
So either we change the Series docstring (API change) or the Series.name return type.
Comment From: Dr-Irv
Current typing is correct. Restricting series name to string would break frames with multi-level columns. Example:
It's not clear that it would "break". I'm suggesting that you'd see this:
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=pd.MultiIndex.from_tuples([('a', 'one'), ('a', 'two')]))
>>> df
    a
  one two
0   1   2
1   3   4
>>> df.iloc[:, 0].name
"('a', 'one')"
Comment From: vladu
But imagine a use case where you perform an operation that results in multi-level column names, but then want to "flatten" them with some custom formatting, for reporting / display purposes.
>>> list(f'{t[0]}: {t[1]}' for t in df.columns)
['a: one', 'a: two']
Easy right now, essentially impossible, or at least massively more error-prone, if the series.name tuple is stringified.
Comment From: Dr-Irv
But imagine a use case where you perform an operation that results in multi-level column names, but then want to "flatten" them with some custom formatting, for reporting / display purposes.
Your example doesn't apply. The elements of df.columns would remain tuples in this proposal. What would change is the name of the series.
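To make the distinction concrete, here is a rough sketch using the same frame as above. The proposed_name line is only an assumption about what the proposal might return (a plain str() of the tuple); it is not existing pandas behavior:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [3, 4]],
    columns=pd.MultiIndex.from_tuples([("a", "one"), ("a", "two")]),
)

# Flattening works off df.columns, whose elements are tuples today and
# would stay tuples under the proposal.
flattened = [f"{t[0]}: {t[1]}" for t in df.columns]

# Current behavior: the selected column's Series name is the tuple itself.
current_name = df.iloc[:, 0].name

# Hypothetical: the proposal would stringify only the Series name.
proposed_name = str(current_name)

print(flattened)
print(repr(current_name))
print(repr(proposed_name))
```

Code that reads the tuple back from series.name, rather than from df.columns, is the part that would be affected.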
Comment From: TomAugspurger
Agreed with changing the docs to match the behavior (which doesn't qualify as an API change; it's just an issue with the docs).