Code Sample
import pandas as pd
df = pd.DataFrame([1,2,3], columns=['column'], index=[1,1,2])
for idx in df.index.unique():
    print(type(df.loc[idx]['column']))
    for v in df.loc[idx]['column']:
        print(v)
Output:
<class 'pandas.core.series.Series'>
1
2
<class 'numpy.int64'>
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-a207307b6a42> in <module>()
      5 for idx in df.index.unique():
      6     print(type(df.loc[idx]['column']))
----> 7     for v in df.loc[idx]['column']:
      8         print(v)
TypeError: 'numpy.int64' object is not iterable
Problem description
The problem with this code is that df.loc[idx]['column'] can be either a Series or a numpy.int64 instance. Handling both cases requires an additional type check, and it can easily lead to bugs.
Expected Output
I would expect this to always return a Series object.
Comment From: jreback
Copy-pastable example, please.
Comment From: ghost
@jreback I updated my original issue text.
Comment From: TomAugspurger
@silentquasar you have a duplicate in your index, so df.loc[1] is a DataFrame (so slices of that are Series), whereas df.loc[2] is a Series (so slices of that are scalars).
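A minimal sketch of that difference, reusing the DataFrame from the original report. The variable names dup and single are only illustrative:

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3], columns=['column'], index=[1, 1, 2])

# Label 1 occurs twice in the index, so .loc returns a two-row DataFrame.
dup = df.loc[1]
# Label 2 occurs once, so .loc returns that single row as a Series.
single = df.loc[2]

print(type(dup).__name__)
print(type(single).__name__)

# Selecting 'column' then drops one more dimension in each case:
# a Series for the duplicated label, a scalar for the unique one.
print(type(dup['column']).__name__)
print(type(single['column']).__name__)
```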
The usual advice here is either to ensure you don't have duplicates (check with .index.is_unique) or to always slice with a collection, like df.loc[[1]], so that you always get objects of the same type.
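Both suggestions could look roughly like this when applied to the loop from the original report. This is a sketch, not the only way to write it, and the results dictionary is a hypothetical helper added just to collect the values:

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3], columns=['column'], index=[1, 1, 2])

# Option 1: check up front whether the index contains duplicate labels.
print(df.index.is_unique)

# Option 2: pass the label inside a list. .loc then returns a DataFrame
# for every label, so selecting 'column' always yields an iterable Series.
results = {}  # hypothetical helper, only used to collect the values
for idx in df.index.unique():
    values = df.loc[[idx]]['column']
    results[idx] = list(values)
    for v in values:
        print(v)
```

With the list-based lookup, the inner loop no longer raises a TypeError for labels that appear only once.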
Comment From: jreback
Suppose it might be worth a doc note in the indexing section (and the loc docstring).