Hi, I am using pandas 0.16.2 and I get "IndexError: index out of bounds". The script is:
import pandas as pd

def test():
    names = ['openid', 'clientid', 'roleid', 'rolename', 'school', 'IP', 'snid', 'rolelogin_time', 'amount', 'val', 'consume_sum', 'own_after', 'first_payment_time', 'last_payment_time', 'last_login_time', 'first_pay_level', 'level', 'vip_level']
    df = pd.read_csv("2016-04-29.csv", names=names)
    df_role = df.set_index(["openid", "clientid"], drop=False)
    clientid_role = df_role.groupby("clientid").openid.nunique()

test()
Here is the output:
Traceback (most recent call last):
  File "test.py", line 11, in <module>
    test()
  File "test.py", line 7, in test
    clientid_role = df_role.groupby("clientid").openid.nunique()
  File "<string>", line 18, in nunique
  File "/root/anaconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 514, in __getattr__
    if attr in self.obj:
  File "/root/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 704, in __contains__
    return key in self._info_axis
  File "/root/anaconda/lib/python2.7/site-packages/pandas/core/index.py", line 4521, in __contains__
    self.get_loc(key)
  File "/root/anaconda/lib/python2.7/site-packages/pandas/core/index.py", line 5055, in get_loc
    loc = self._get_level_indexer(key, level=0)
  File "/root/anaconda/lib/python2.7/site-packages/pandas/core/index.py", line 5291, in _get_level_indexer
    loc = level_index.get_loc(key)
  File "/root/anaconda/lib/python2.7/site-packages/pandas/core/index.py", line 1572, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3824)
  File "pandas/index.pyx", line 143, in pandas.index.IndexEngine.get_loc (pandas/index.c:3578)
  File "pandas/src/util.pxd", line 41, in util.get_value_at (pandas/index.c:15287)
IndexError: index out of bounds
2016-04-29.csv is almost 760 MB. The code and the CSV can be cloned from https://github.com/Skycrab/pandas_test
See also https://github.com/pydata/pandas/issues/5727
Output of pd.show_versions():
`In [4]: pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-431.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.16.2
nose: 1.3.7
Cython: 0.22.1
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 3.2.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: None`
Comment From: TomAugspurger
Can you try on the latest version and see if it's fixed?
Comment From: mdotwills
Confirmed this is not an issue: tested on pandas 0.18.0 and Python 2.7.11.
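
For anyone who wants to re-check the per-client count without downloading the 760 MB CSV, a small synthetic frame may be enough. This is only a sketch: the rows below are made up and are not the reporter's data, and it groups on the `clientid` column directly instead of calling `set_index(..., drop=False)` first, on the assumption that the index is not needed for this count (newer pandas releases may also reject a name that is both an index level and a column label as ambiguous).

```python
import pandas as pd

# Made-up sample rows (not the reporter's data); only the two columns
# used by the failing call are included.
df = pd.DataFrame({
    "openid": ["a", "a", "b", "c", "c", "d"],
    "clientid": [1, 1, 1, 2, 2, 3],
})

# Count distinct openid values per clientid, grouping on the column itself
# rather than on an index level that shares the column's name.
clientid_role = df.groupby("clientid")["openid"].nunique()
print(clientid_role)
```

If this runs cleanly on your version but the full CSV still fails, the problem is more likely tied to the data or the `set_index` step than to `nunique` itself.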