Code Sample, a copy-pastable example if possible

As expected:

import pandas as pd
all_cols = list("abcdefg")

informative_cols = ["a","c","g"]

dfrandom = pd.DataFrame(pd.np.random.rand(5, len(all_cols)), columns=all_cols)

field = informative_cols[0]
print(field)
print(type(dfrandom[field]))
print(type(dfrandom[informative_cols][field]))

returns:

a
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>

However, unexpectedly:

import pandas as pd
all_cols = list("abcdefg")

informative_cols = ["a","c","g"]

dfrandom = pd.DataFrame(pd.np.random.rand(5, len(all_cols)), columns=all_cols)

field = informative_cols[0]
print(field)
print(type(dfrandom[field]))
print(type(dfrandom[informative_cols][field]))

returns:

0_AnatomicRegionSequence_CodeMeaning
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>

Problem description

Selecting a single column by its string label should return the same type regardless of copying or previous subsetting. The first example above is consistent (a Series both times), while the second is not.

Expected Output

0_AnatomicRegionSequence_CodeMeaning
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-42-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.2
pytest: None
pip: 9.0.1
setuptools: 36.0.1
Cython: None
numpy: 1.13.0
scipy: 0.19.0
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Comment From: TomAugspurger

Can you check your second example? I see

a
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>

for both.

Comment From: jorisvandenbossche

@DSLituiev I think you mistakenly pasted exactly the same code twice, for both the working and the failing example. Can you update this?

Comment From: DSLituiev

I beg pardon.

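import numpy as np
import pandas as pd

all_cols = list("abcdefg")

# Pick columns a, c, e, then repeat the list so every label appears twice
informative_cols = [all_cols[x] for x in [0, 2, 4]]
informative_cols = informative_cols + informative_cols

dfrandom = pd.DataFrame(np.random.rand(5, len(all_cols)), columns=all_cols)

field = informative_cols[0]
print(field)
print(type(dfrandom[field]))                    # Series
print(type(dfrandom[informative_cols][field]))  # DataFrame: "a" was selected twice

The issue seems to be due to duplicated columns in the selection.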

Comment From: DSLituiev

Maybe it would be good to issue a warning in this case?

Comment From: jorisvandenbossche

Ah, yes, that is expected. If you have duplicate column names, df[col_name] will select all columns with that name and thus return a DataFrame instead of a Series.
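To illustrate the behaviour described above, here is a minimal sketch. The frame and the names `dup` and `dedup` are made up for this example and are not from the original report. It shows that a repeated label selects every matching column, and one possible way to drop the repeats afterwards.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(5, 3), columns=list("abc"))

# Selecting with a list that repeats a label yields duplicate column names.
dup = df[["a", "c", "a"]]
print(list(dup.columns))      # ['a', 'c', 'a']
print(dup.columns.is_unique)  # False

# A label that occurs more than once selects every matching column,
# so the result is a DataFrame; a unique label still gives a Series.
print(type(dup["a"]))
print(type(dup["c"]))

# Keeping only the first occurrence of each label makes the labels unique again.
dedup = dup.loc[:, ~dup.columns.duplicated()]
print(type(dedup["a"]))
```

Whether dropping the repeated columns is appropriate depends on the data; here the duplicates are exact copies, so nothing is lost.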

Comment From: TomAugspurger

I don't think we'll issue a warning here. For better or worse, duplicates are allowed and used by some.

I'd recommend using something like engarde if you want runtime validation that you don't have duplicates.
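For anyone who wants such a check without an extra dependency, a rough sketch in plain pandas could look like the following. This is not engarde's API; `assert_unique_columns` is a hypothetical helper name chosen for this example.

```python
import pandas as pd


def assert_unique_columns(df):
    """Hypothetical helper: raise if any column label appears more than once."""
    if not df.columns.is_unique:
        dupes = df.columns[df.columns.duplicated()].unique().tolist()
        raise ValueError(f"duplicate column names: {dupes}")
    return df


df = pd.DataFrame([[1, 2, 3]], columns=["a", "c", "e"])
assert_unique_columns(df)  # unique labels: passes and returns df unchanged

try:
    assert_unique_columns(df[["a", "c", "a"]])  # "a" is selected twice
except ValueError as exc:
    message = str(exc)
    print(message)
```

Because the helper returns the frame it was given, it can also be used in a method chain, for example `df[cols].pipe(assert_unique_columns)`.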