Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
pd.Index([f'{i}_{i}' for i in range(10)]).str.split('__', expand=True)
Issue Description
Returns a plain Index rather than a MultiIndex: Index(['0_0', '1_1', '2_2', '3_3', '4_4', '5_5', '6_6', '7_7', '8_8', '9_9'], dtype='object')
Expected Behavior
Should return a single-level MultiIndex, per the docs, with expand=True.
Installed Versions
Comment From: MarcoGorelli
thanks for the report - to expedite resolution, could you put a descriptive title please?
Comment From: erezinman
LOL, sorry. Right away.
Comment From: MarcoGorelli
this is correct, the string doesn't contain __; if you'd used _ you'd have got a multiindex
In [7]: import pandas as pd
...: pd.Index([f'{i}_{i}' for i in range(10)]).str.split('_', expand=True)
Out[7]:
MultiIndex([('0', '0'),
('1', '1'),
('2', '2'),
('3', '3'),
('4', '4'),
('5', '5'),
('6', '6'),
('7', '7'),
('8', '8'),
('9', '9')],
)
closing for now then, but thanks for the report
Comment From: MarcoGorelli
> Should return a single-level multi-index
sorry, you're right, in the Series case it does indeed return a single-column DataFrame:
In [11]: import pandas as pd
...: pd.Series([f'{i}_{i}' for i in range(10)]).str.split('__', expand=True)
Out[11]:
     0
0  0_0
1  1_1
2  2_2
3  3_3
4  4_4
5  5_5
6  6_6
7  7_7
8  8_8
9  9_9
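To make the Series behaviour above easier to check, here is a minimal sketch that compares the result for a separator that never matches with one that does. The variable names (`values`, `ser`, `no_match`, `match`) are illustrative only; the point is that the container type stays a DataFrame in both cases and only the number of columns changes.

```python
import pandas as pd

values = [f"{i}_{i}" for i in range(10)]
ser = pd.Series(values)

# Separator absent from every string: still a DataFrame, with a single column.
no_match = ser.str.split("__", expand=True)

# Separator present in every string: a DataFrame with one column per piece.
match = ser.str.split("_", expand=True)

print(type(no_match).__name__, no_match.shape)
print(type(match).__name__, match.shape)
```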
Comment From: erezinman
Why, of course it doesn't contain that, but the returned type should not be affected by that fact.
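Until the Index case behaves consistently, one possible workaround is to route the split through the Series accessor and build the MultiIndex from the resulting DataFrame. This is only a sketch: `split_to_multiindex` is a hypothetical helper name, not a pandas function, and it assumes the level names taken from the DataFrame's integer column labels are acceptable.

```python
import pandas as pd


def split_to_multiindex(index, sep):
    """Hypothetical helper: split an Index on sep and always return a MultiIndex."""
    # Series.str.split(expand=True) yields a DataFrame even when nothing splits,
    # so the number of columns varies but the container type does not.
    frame = index.to_series().str.split(sep, expand=True)
    # Each DataFrame column becomes one level of the MultiIndex.
    return pd.MultiIndex.from_frame(frame)


idx = pd.Index([f"{i}_{i}" for i in range(10)])

one_level = split_to_multiindex(idx, "__")   # separator never matches
two_levels = split_to_multiindex(idx, "_")   # separator matches once per string

print(type(one_level).__name__, one_level.nlevels)
print(type(two_levels).__name__, two_levels.nlevels)
```

With this approach the caller gets a MultiIndex either way and can inspect `nlevels` instead of branching on the returned type.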
Comment From: ramvikrams
I would like to take up this issue, but I have one question first: is the change to be made in pandas/core/strings/accessor.py, if I am correct?