Code Sample, a copy-pastable example if possible
>>> df
   0   1   2   3
0  1  21  51  61
1  2  22  52  62
2  3  23  53  63
>>> df[0:0]
Empty DataFrame
Columns: [0, 1, 2, 3]
Index: []
>>> df.loc[0:0]
   0   1   2   3
0  1  21  51  61
For loc, slicing is incorrect
The slicing behavior of loc is inconsistent with Python list slicing. For [0:0] it should return an empty result, but it returns a row.
Expected Output
As with the Python list example shown below, pd.DataFrame.loc slicing should produce an empty result.
E.g. the following code yields an empty slice for [0:0]:
>>> l = [0,1,2]
>>> l[0:0]
[]
Output of pd.show_versions()
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.1.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 25.2.0
Cython: None
numpy: 1.10.4
scipy: 0.17.0
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.2
pytz: 2016.3
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: 2.2.0-b1
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
Comment From: jreback
http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-label
label slicing always includes the endpoint
positional (iloc) slicing matches python semantics
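A minimal sketch of the difference, assuming a frame built to mirror the one in the report (the variable names are illustrative only):

```python
import pandas as pd

# Frame mirroring the one in the report; default integer labels 0, 1, 2
df = pd.DataFrame([[1, 21, 51, 61], [2, 22, 52, 62], [3, 23, 53, 63]])

# Label-based slice: both endpoints are included, so label 0 is selected
label_slice = df.loc[0:0]

# Position-based slice: the stop is excluded, as with a Python list
position_slice = df.iloc[0:0]

print(len(label_slice))     # 1
print(len(position_slice))  # 0
```

Here the labels happen to equal the positions, which is why the two slices look like they should agree even though they follow different rules.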
Comment From: Ajeet-Ganga
Isn't the example below producing an empty slice? Shouldn't loc be consistent with this behavior?
>>> l = [0,1,2]
>>> l[0:0]
[]
Comment From: jreback
read the docs
Comment From: Ajeet-Ganga
What would we lose by being consistent? Just because we put something in the documentation doesn't make it right.
Performance over consistent behavior is not even a contest.
Comment From: jreback
you are missing the point
.iloc is positional indexing; .loc is label indexing
these are two separate and distinct ways of selecting data. numpy and python only have positional concepts and thus have only 1 way of indexing
pandas can also index by labels; since these can be, for example, strings (or datetimes or integers), you now have different semantics
again the docs are very complete on these concepts
and if you choose not to use label indexing, then just use .iloc and it will feel like python/numpy
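A rough illustration of why the semantics differ, assuming a hypothetical frame indexed by strings (the data and names here are made up for the example). With string labels there is no obvious "one past the end" value to write as an exclusive stop, so the label slice includes its endpoint:

```python
import pandas as pd

# Hypothetical frame with string labels instead of integer positions
prices = pd.DataFrame({"price": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

# Label slice: includes both "b" and "c"
by_label = prices.loc["b":"c"]

# Positional slice: position 1 up to, but not including, position 2
by_position = prices.iloc[1:2]

print(list(by_label.index))     # ['b', 'c']
print(list(by_position.index))  # ['b']
```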