import pandas as pd
df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c'], [4, 'd']], columns=['_id', 'id'])
for row in df.itertuples():
    print(row.id)
    print(row._id)
Expected output: a 1 ......
Actual output: AttributeError: 'Pandas' object has no attribute '_id', because the _id column gets renamed to _1 in the row object.
Comment From: jreback
You didn't include the output of pd.show_versions(), but with 0.18.0 this looks fine.
In [1]: df = pd.DataFrame([[1, 'a'],[2, 'b'],[3, 'c'],[4, 'd']], columns=['_id', 'id'])
In [2]: df
Out[2]:
   _id id
0    1  a
1    2  b
2    3  c
3    4  d
In [3]: for row in df.itertuples():
   ...:     print(row)
   ...:
Pandas(Index=0, _1=1, id='a')
Pandas(Index=1, _1=2, id='b')
Pandas(Index=2, _1=3, id='c')
Pandas(Index=3, _1=4, id='d')
In [6]: [row for row in df.itertuples()][0].id
Out[6]: 'a'
In [7]: [row for row in df.itertuples()][0]._1
Out[7]: 1
Comment From: hadjmic
Is it intended behavior for _id to be renamed to _1?
Comment From: jreback
Yes, field names with a leading underscore are not allowed in namedtuples.
Signature: df.itertuples(index=True, name='Pandas')
Docstring:
Iterate over DataFrame rows as namedtuples, with index value as first
element of the tuple.

Parameters
----------
index : boolean, default True
    If True, return the index as the first element of the tuple.
name : string, default "Pandas"
    The name of the returned namedtuples or None to return regular
    tuples.

Notes
-----
The column names will be renamed to positional names if they are
invalid Python identifiers, repeated, or start with an underscore.
With a large number of columns (>255), regular tuples are returned.

See also
--------
iterrows : Iterate over DataFrame rows as (index, Series) pairs.
iteritems : Iterate over (column name, Series) pairs.

Examples
--------
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]},
                      index=['a', 'b'])
>>> df
   col1  col2
a     1   0.1
b     2   0.2
>>> for row in df.itertuples():
...     print(row)
...
Pandas(Index='a', col1=1, col2=0.10000000000000001)
Pandas(Index='b', col1=2, col2=0.20000000000000001)
File: ~/pandas/pandas/core/frame.py
Type: instancemethod
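To illustrate the renaming described in the Notes above, here is a minimal sketch that uses only collections.namedtuple with rename=True (the mechanism the thread attributes to itertuples) and then maps the values back to the original column labels. The variable names are illustrative, and the row is built by hand rather than taken from a real DataFrame.

```python
from collections import namedtuple

# Column labels from the original report; '_id' starts with an underscore.
columns = ["_id", "id"]

# rename=True swaps invalid field names for positional ones (_0, _1, ...).
Row = namedtuple("Pandas", ["Index"] + columns, rename=True)
row = Row(0, 1, "a")

print(row._fields)  # ('Index', '_1', 'id')

# Recover the original labels by zipping them with the values,
# skipping the index stored at position 0.
original = dict(zip(columns, row[1:]))
print(original)  # {'_id': 1, 'id': 'a'}
```

With a real DataFrame, the same zip over df.columns and row[1:] should work whenever itertuples is called with index=True.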
Comment From: hadjmic
I didn't know that. Sorry for not checking.
Comment From: shouldsee
It seems "from" is a keyword that triggers renaming:
import pandas as pd
next(pd.DataFrame([[1, 2], [3, 4]], columns=['from', 'to']).itertuples())
# Pandas(Index=0, _1=1, to=2)
Comment From: simonjayhawkins
@shouldsee: that's right. from is a Python keyword, and the field names of a namedtuple cannot be Python keywords. See https://docs.python.org/3/library/collections.html?highlight=namedtuple#collections.namedtuple.
>>> from collections import namedtuple
>>> namedtuple('Pandas', ['Index','from','to'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\Anaconda3\envs\pandas-dev\lib\collections\__init__.py", line 364, in namedtuple
    raise ValueError('Type names and field names cannot be a '
ValueError: Type names and field names cannot be a keyword: 'from'
>>> namedtuple('Pandas', ['Index','from','to'],rename=True)(0,1,2)
Pandas(Index=0, _1=1, to=2)
Perhaps the pandas documentation could be changed from
Notes
-----
The column names will be renamed to positional names if they are
invalid Python identifiers, repeated, or start with an underscore.
With a large number of columns (>255), regular tuples are returned.
to
Notes
-----
The column names will be renamed to positional names if they are
invalid Python identifiers, repeated, start with an underscore, or
are a Python keyword.
With a large number of columns (>255), regular tuples are returned.
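The four renaming conditions in the proposed wording can be checked ahead of time. The sketch below is a hypothetical helper (predict_fields is not a pandas function) that applies those conditions to a list of column labels and compares the result with what collections.namedtuple produces with rename=True. It assumes the index is included as the first field, named Index.

```python
from collections import namedtuple
from keyword import iskeyword


def predict_fields(columns):
    """Hypothetical helper: predict the namedtuple field names for `columns`."""
    fields = ["Index"] + [str(c) for c in columns]
    seen = set()
    result = []
    for pos, name in enumerate(fields):
        needs_rename = (
            not name.isidentifier()   # invalid Python identifier
            or iskeyword(name)        # Python keyword such as 'from'
            or name.startswith("_")   # leading underscore
            or name in seen           # repeated name
        )
        result.append(f"_{pos}" if needs_rename else name)
        seen.add(name)
    return result


columns = ["from", "to", "_id", "a b", "to"]
predicted = predict_fields(columns)
actual = list(namedtuple("Pandas", ["Index"] + columns, rename=True)._fields)
print(predicted)  # ['Index', '_1', 'to', '_3', '_4', '_5']
```

Note that the positional name counts the index field, which is why the first column becomes _1 rather than _0.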
Comment From: shouldsee
Dear @simonjayhawkins,
Many thanks for the note! I did not realise that pandas uses collections.namedtuple() for df.itertuples(). I have found itertuples() very useful for creating an iterator of dictionaries with
it = (x.__dict__ for x in df.itertuples())
Could we possibly add an itertuples(as_dict=True) option? (It would also need to rename x.__dict__['Index'] to x.__dict__['index'] to agree with df.index.)
Kind regards Feng
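A note on the x.__dict__ idiom: as far as I know, namedtuple instances on current Python 3 versions no longer expose __dict__, and the documented way to get a mapping is the _asdict() method. The sketch below uses a hand-built namedtuple in place of a real itertuples row, and renames the Index key as suggested above. The keys are still the renamed field names (_1 rather than from).

```python
from collections import namedtuple

# Stand-in for a row produced by itertuples on columns ['from', 'to'].
Row = namedtuple("Pandas", ["Index", "from", "to"], rename=True)
row = Row(0, 1, 2)

# _asdict() is the documented way to turn a namedtuple into a mapping.
as_dict = dict(row._asdict())

# Rename 'Index' to 'index' so the key matches df.index.
as_dict["index"] = as_dict.pop("Index")
print(as_dict)  # {'_1': 1, 'to': 2, 'index': 0}
```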
Comment From: simonjayhawkins
I'm assuming you have a large DataFrame and that .to_dict(orient='records') doesn't serve your purpose. If so, can you open a new issue for this?
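For reference, this is roughly what the to_dict(orient='records') route looks like on a small frame. It builds the whole list in memory, which is presumably why it may not suit a large DataFrame. The reset_index() step is an assumption about how one might include the index. It relies on the index being unnamed, so that the new column is called index.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["from", "to"])

# One dict per row, keyed by the original column labels (no renaming).
records = df.to_dict(orient="records")
print(records)  # [{'from': 1, 'to': 2}, {'from': 3, 'to': 4}]

# Move the index into a column first if it should appear in each dict.
records_with_index = df.reset_index().to_dict(orient="records")
print(records_with_index[0])  # {'index': 0, 'from': 1, 'to': 2}
```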
Comment From: simonjayhawkins
You'll probably get a better solution if you ask on Stack Overflow, but for single-level indexes, something like:
>>> import pandas as pd
>>> df = pd.DataFrame([[1,2],[3,4]],columns=['from','to'])
>>> keys = [df.index.name or 'index'] + list(df.columns)
>>> it = (dict(zip(keys,row)) for row in df.itertuples(name=None))
>>> list(it)
[{'index': 0, 'from': 1, 'to': 2}, {'index': 1, 'from': 3, 'to': 4}]
Comment From: shouldsee
It was mainly for syntax reasons that I wanted iterdicts(), so I'm not particularly fussed about its performance yet. See #25973.