Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
pd.Index([f'{i}_{i}' for i in range(10)]).str.split('__', expand=True)

Issue Description

Returns a flat Index instead of a MultiIndex: Index(['0_0', '1_1', '2_2', '3_3', '4_4', '5_5', '6_6', '7_7', '8_8', '9_9'], dtype='object')

Expected Behavior

Should return a single-level multi-index per the docs with expand=True.

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 8dab54d6573f7186ff0c3b6364d5e4dd635ff3e7
python           : 3.9.12.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.4.0-132-generic
Version          : #148-Ubuntu SMP Mon Oct 17 16:02:06 UTC 2022
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.5.2
numpy            : 1.22.4
pytz             : 2022.1
dateutil         : 2.8.2
setuptools       : 61.2.0
pip              : 22.3.1
Cython           : 0.29.30
pytest           : 7.1.2
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : 8.5.0
pandas_datareader: None
bs4              : 4.11.1
bottleneck       : None
brotli           :
fastparquet      : None
fsspec           : 2022.5.0
gcsfs            : None
matplotlib       : 3.5.2
numba            : 0.56.0
numexpr          : 2.8.3
odfpy            : None
openpyxl         : 3.0.10
pandas_gbq       : None
pyarrow          : 6.0.1
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : 1.7.3
snappy           : None
sqlalchemy       : 1.4.27
tables           : 3.7.0
tabulate         : 0.8.10
xarray           : 2022.10.0
xlrd             : 2.0.0
xlwt             : None
zstandard        : None
tzdata           : None

Comment From: MarcoGorelli

thanks for the report - to expedite resolution, could you put a descriptive title please?

Comment From: erezinman

LOL, sorry. Right away.

Comment From: MarcoGorelli

this is correct, the string doesn't contain __

if you'd used _ you'd have got a multiindex

In [7]: import pandas as pd
   ...: pd.Index([f'{i}_{i}' for i in range(10)]).str.split('_', expand=True)
Out[7]:
MultiIndex([('0', '0'),
            ('1', '1'),
            ('2', '2'),
            ('3', '3'),
            ('4', '4'),
            ('5', '5'),
            ('6', '6'),
            ('7', '7'),
            ('8', '8'),
            ('9', '9')],
           )

closing for now then, but thanks for the report

Comment From: MarcoGorelli

"Should return a single-level multi-index"

sorry, you're right, in the Series case it does indeed return a single-column DataFrame:

In [11]: import pandas as pd
    ...: pd.Series([f'{i}_{i}' for i in range(10)]).str.split('__', expand=True)
Out[11]:
     0
0  0_0
1  1_1
2  2_2
3  3_3
4  4_4
5  5_5
6  6_6
7  7_7
8  8_8
9  9_9

Comment From: erezinman

Why, of course it doesn't contain that, but the returned type should not depend on whether the separator is found.
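For anyone who needs a consistent return type in the meantime, one possible workaround is to go through the Series accessor, which returns a DataFrame with expand=True even when nothing is split (see the Series example above), and then build the MultiIndex from its columns. This is only a sketch: split_to_multiindex is a hypothetical helper name, not a pandas API, and it does not carry over the index name.

```python
import pandas as pd


def split_to_multiindex(index: pd.Index, sep: str) -> pd.MultiIndex:
    """Hypothetical helper: split index labels on `sep`, always returning a MultiIndex."""
    # The Series accessor returns a DataFrame with expand=True,
    # even when `sep` never occurs (a single column in that case).
    frame = pd.Series(index).str.split(sep, expand=True)
    # One array per resulting column; to_numpy() drops the integer column names
    # so the MultiIndex levels are unnamed.
    arrays = [frame[col].to_numpy() for col in frame.columns]
    return pd.MultiIndex.from_arrays(arrays)


idx = pd.Index([f"{i}_{i}" for i in range(10)])

no_match = split_to_multiindex(idx, "__")  # separator absent: one level
match = split_to_multiindex(idx, "_")      # separator present: two levels

print(type(no_match).__name__, no_match.nlevels)
print(type(match).__name__, match.nlevels)
```

With this helper the result is a MultiIndex in both cases; only the number of levels changes with the data.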

Comment From: ramvikrams

I would like to take up this issue. One question first: is pandas/core/strings/accessor.py the right place to make the change?