Pandas df.itertuples changes the name of the columns

import pandas as pd

df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c'], [4, 'd']], columns=['_id', 'id'])

for row in df.itertuples():
    print(row.id)
    print(row._id)

Expected output: a 1 ... (one pair of values per row)

Actual output: *** AttributeError: 'Pandas' object has no attribute '_id'

This happens because the _id column gets renamed to _1 in the row object.

Comment From: jreback

You didn't include the output of pd.show_versions(), but with 0.18.0 this looks fine.

In [1]: df = pd.DataFrame([[1, 'a'],[2, 'b'],[3, 'c'],[4, 'd']], columns=['_id', 'id'])

In [2]: df
Out[2]: 
   _id id
0    1  a
1    2  b
2    3  c
3    4  d

In [3]: for row in df.itertuples():
   ...:     print(row)
   ...:     
Pandas(Index=0, _1=1, id='a')
Pandas(Index=1, _1=2, id='b')
Pandas(Index=2, _1=3, id='c')
Pandas(Index=3, _1=4, id='d')

In [6]: [ row for row in df.itertuples() ][0].id
Out[6]: 'a'

In [7]: [ row for row in df.itertuples() ][0]._1
Out[7]: 1
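
To make the renaming explicit, here is a small sketch (same frame as above, shortened to two rows) that maps each original column name to the field name the row object actually exposes. The name field_for is illustrative and not part of pandas, and the [1:] slice assumes the default index=True, which puts Index first.

```python
import pandas as pd

df = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['_id', 'id'])

# Look at one row to see which field names the namedtuple ended up with.
first = next(df.itertuples())

# _fields[0] is 'Index' (index=True by default), so skip it when pairing
# the fields with the original column names.
field_for = dict(zip(df.columns, first._fields[1:]))
print(field_for)

# Read the '_id' column through its renamed field.
values = [getattr(row, field_for['_id']) for row in df.itertuples()]
print(values)
```

Positional access such as row[1] also works, and it does not depend on the field names at all.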

Comment From: hadjmic

Is it intended behavior for _id to be renamed to _1?


Comment From: jreback

Yes, leading underscores are not allowed in namedtuple field names.

Signature: df.itertuples(index=True, name='Pandas')
Docstring:
Iterate over DataFrame rows as namedtuples, with index value as first
element of the tuple.

Parameters
----------
index : boolean, default True
    If True, return the index as the first element of the tuple.
name : string, default "Pandas"
    The name of the returned namedtuples or None to return regular
    tuples.

Notes
-----
The column names will be renamed to positional names if they are
invalid Python identifiers, repeated, or start with an underscore.
With a large number of columns (>255), regular tuples are returned.

See also
--------
iterrows : Iterate over DataFrame rows as (index, Series) pairs.
iteritems : Iterate over (column name, Series) pairs.

Examples
--------

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]},
                      index=['a', 'b'])
>>> df
   col1  col2
a     1   0.1
b     2   0.2
>>> for row in df.itertuples():
...     print(row)
...
Pandas(Index='a', col1=1, col2=0.10000000000000001)
Pandas(Index='b', col1=2, col2=0.20000000000000001)
File:      ~/pandas/pandas/core/frame.py
Type:      instancemethod

Comment From: hadjmic

I didn't know that. Sorry for not checking.


Comment From: shouldsee

It seems "from" is a keyword that triggers renaming:

import pandas as pd
next(pd.DataFrame([[1,2],[3,4]],columns=['from','to']).itertuples())
### Pandas(Index=0, _1=1, to=2)

Comment From: simonjayhawkins

@shouldsee: that's right. "from" is a Python keyword, and the field names of a namedtuple cannot be Python keywords. See https://docs.python.org/3/library/collections.html?highlight=namedtuple#collections.namedtuple.

>>> from collections import namedtuple
>>> namedtuple('Pandas', ['Index','from','to'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\Anaconda3\envs\pandas-dev\lib\collections\__init__.py", line 364, in namedtuple
    raise ValueError('Type names and field names cannot be a '
ValueError: Type names and field names cannot be a keyword: 'from'
>>> namedtuple('Pandas', ['Index','from','to'],rename=True)(0,1,2)
Pandas(Index=0, _1=1, to=2)
>>>

Perhaps the pandas documentation could be changed from

Notes
-----
The column names will be renamed to positional names if they are
invalid Python identifiers, repeated, or start with an underscore.
With a large number of columns (>255), regular tuples are returned.

to

Notes
-----
The column names will be renamed to positional names if they are
invalid Python identifiers, repeated, start with an underscore, or
are a Python keyword.
With a large number of columns (>255), regular tuples are returned.
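
The conditions listed in the note can be sketched as a small predictor. This assumes pandas passes 'Index' plus the column names to collections.namedtuple with rename=True, as the examples in this thread suggest. predicted_fields is a hypothetical helper, not a pandas function.

```python
from collections import namedtuple
from keyword import iskeyword


def predicted_fields(columns, index=True):
    """Hypothetical helper: predict the field names of an itertuples() row."""
    # Assumption: 'Index' comes first when index=True, so positions shift by one.
    names = (["Index"] if index else []) + [str(c) for c in columns]
    seen = set()
    fields = []
    for pos, name in enumerate(names):
        renamed = (
            not name.isidentifier()   # invalid Python identifier
            or iskeyword(name)        # Python keyword such as 'from'
            or name.startswith("_")   # leading underscore
            or name in seen           # repeated name
        )
        fields.append(f"_{pos}" if renamed else name)
        seen.add(name)
    return tuple(fields)


columns = ["_id", "from", "my col", "ok", "ok"]
print(predicted_fields(columns))

# Cross-check against what namedtuple itself does with rename=True.
reference = namedtuple("Pandas", ["Index"] + columns, rename=True)._fields
print(reference)
```

The replacement name is always an underscore followed by the field's position, which is why _id in the second position (after Index) becomes _1.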

Comment From: shouldsee

Dear @simonjayhawkins

Many thanks for the note! I did not realise that Pandas is using collections.namedtuple() for df.itertuples() -- I have found itertuples() very useful for creating an iterator of dictionaries with

it = (x.__dict__ for x in df.itertuples())

Can we possibly add an itertuples(as_dict=True) option? (It would also need to rename x.__dict__['Index'] to x.__dict__['index'] to agree with df.index.)

Kind regards Feng
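
A note on the __dict__ approach: the documented way to get a mapping from a namedtuple is _asdict(), and on newer Python versions namedtuple instances may not expose __dict__ at all. Below is a minimal sketch that uses a plain namedtuple built with rename=True to stand in for an itertuples() row. That stand-in is an assumption, and the field list mirrors the 'from'/'to' example above.

```python
from collections import namedtuple

# Stand-in for a row from df.itertuples() on columns ['from', 'to'].
# Assumption: pandas builds the row type in roughly this way.
Row = namedtuple("Pandas", ["Index", "from", "to"], rename=True)
row = Row(0, 1, 2)

# _asdict() is part of the namedtuple API and returns field -> value.
record = row._asdict()
print(record)
```

The keys are the renamed fields (here _1 instead of 'from'), so this does not recover the original column names.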

Comment From: simonjayhawkins

I'm assuming you have a large DataFrame and that .to_dict(orient='records') doesn't serve your purpose. If so, can you open a new issue for this?

Comment From: simonjayhawkins

You'll probably get a better solution if you ask on SO, but for single-level indexes something like...

>>> import pandas as pd
>>> df = pd.DataFrame([[1,2],[3,4]],columns=['from','to'])
>>> keys = [df.index.name or 'index'] + list(df.columns)
>>> it = (dict(zip(keys,row)) for row in df.itertuples(name=None))
>>> list(it)
[{'index': 0, 'from': 1, 'to': 2}, {'index': 1, 'from': 3, 'to': 4}]
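
The same idea can be wrapped as a generator function. This is only a sketch: iterdicts and its index_key parameter are hypothetical names, not an existing pandas API, and it assumes a single-level index and no column that is itself called 'index'.

```python
import pandas as pd


def iterdicts(df, index_key="index"):
    """Hypothetical helper (not part of pandas): yield one dict per row."""
    keys = [df.index.name or index_key] + list(df.columns)
    # name=None yields plain tuples, so no column name is ever renamed.
    for row in df.itertuples(name=None):
        yield dict(zip(keys, row))


df = pd.DataFrame([[1, 2], [3, 4]], columns=["from", "to"])
records = list(iterdicts(df))
print(records)
```

Because the keys come from df.columns rather than from namedtuple fields, keywords such as 'from' and names with a leading underscore are kept as they are.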

Comment From: shouldsee

It was mainly for syntax reasons that I wanted iterdicts(), so I am not particularly fussed about its performance yet. See #25973