Problem description

This is a feature request for usecols to accept a single integer or string in addition to a list. If an integer or string is provided, the squeeze flag would be turned on, thus producing a Series.

Rationale

  • consistency with index_col which accepts both integer/string and lists
  • shortcut: no need to wrap a single column in a list and pass squeeze=True

Expected Output

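# test
import pandas as pd

testfile = "pandas/tests/io/data/tips.csv"

# scalar column name: proposed to return a Series
df = pd.read_table(testfile, sep=",", usecols="tip")
assert isinstance(df, pd.Series)

# scalar column position: proposed to return a Series
df = pd.read_table(testfile, sep=",", usecols=2)
assert isinstance(df, pd.Series)

# one-element list: still returns a DataFrame
df = pd.read_table(testfile, sep=",", usecols=[2])
assert isinstance(df, pd.DataFrame)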

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.3.2.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-642.13.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.0
nose: 1.3.0
pip: 8.1.2
setuptools: 20.9.0
Cython: 0.25.1
numpy: 1.11.2
scipy: 0.18.0
statsmodels: 0.6.1
xarray: None
IPython: 5.0.0.dev
sphinx: None
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: 3.3.1-dev0
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: None
httplib2: 0.9.2
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

Comment From: jreback

cc @gfyoung thoughts?

Comment From: gfyoung

I am open to accepting scalars for usecols because that is indeed consistent with index_col.

However, I am not a fan of squeezing to Series automatically if a scalar is passed in. We shouldn't be coupling together squeeze and usecols behavior. The flags should be as independent of one another as possible. While I understand the rationale, from a design standpoint, I would be against this.
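To illustrate the decoupled design, here is a rough sketch in which a scalar usecols is simply treated as a one-element list and the result is always a DataFrame. The wrapper name read_with_scalar_usecols and the inline sample data are assumptions made up for this example; nothing like this wrapper exists in pandas itself.

```python
import io

import pandas as pd


def read_with_scalar_usecols(source, usecols=None, **kwargs):
    """Hypothetical wrapper (not a pandas API): accept a scalar usecols.

    A single int or str is wrapped in a list, so the scalar form is only
    shorthand for the list form and the return type never changes.
    """
    if isinstance(usecols, (int, str)):
        usecols = [usecols]
    return pd.read_csv(source, usecols=usecols, **kwargs)


# Illustrative data only, loosely shaped like a few columns of tips.csv.
csv_text = "total_bill,tip,sex\n16.99,1.01,Female\n10.34,1.66,Male\n"

by_name = read_with_scalar_usecols(io.StringIO(csv_text), usecols="tip")
by_position = read_with_scalar_usecols(io.StringIO(csv_text), usecols=1)
```

Both calls return a one-column DataFrame; whether to squeeze it stays a separate, explicit decision for the caller.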

Comment From: jorisvandenbossche

I agree with @gfyoung on the squeeze. I also understand this would be nice, but it does not make any new behaviour possible; it is just for saving a few keystrokes (squeeze=True). In that light, I don't find adding more complexity to the usecols keyword worth it (and squeeze=True is also more explicit).
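For reference, a minimal sketch of the explicit spelling. In the version reported above (0.19.0) this is usecols=[...] together with squeeze=True on the reader; the sketch below calls the DataFrame.squeeze method after reading instead, so that it does not depend on the reader keyword being available. The sample data is made up for the example.

```python
import io

import pandas as pd

# Illustrative data only, loosely shaped like a few columns of tips.csv.
csv_text = "total_bill,tip,sex\n16.99,1.01,Female\n10.34,1.66,Male\n"

# Step 1: select the single column with a list; this yields a DataFrame.
frame = pd.read_csv(io.StringIO(csv_text), usecols=["tip"])

# Step 2: explicitly collapse the one-column DataFrame into a Series.
series = frame.squeeze("columns")
```

The two steps stay independent: usecols only selects columns, and the squeeze is a visible, separate request.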

Regarding accepting a scalar, I have less of an opinion. Probably not much harm, but I also don't really see the need (the current docs are very clear: either array-like or callable, which is clearer than scalar, array-like or callable).

Comment From: jreback

I agree about not automagically returning a Series. To be very honest, I don't find squeeze really necessary either; it is just additional code bloat. Reading a csv should always return a DataFrame, but that's a different discussion.

As for usecols accepting a scalar, I'm a bit indifferent about that as well. Sure, if someone wants to do it, I think we would accept it.