Code Sample, a copy-pastable example if possible
from StringIO import StringIO
import pandas as pd
from random import random
s = "a,b,c"
df = pd.read_csv(StringIO(s))
f1 = lambda row: random()
new_col1 = df.apply(f1, axis=1)
df["d"] = new_col1
f2 = lambda row: 0 if row["a"] > 10 else 1
new_col2 = df.apply(f2, axis=1)
df["e"] = new_col2
df
Expected Output
Empty DataFrame
Columns: [a, b, c, d, e]
Index: []
The problem seems to be that a Series row is passed to the lambda and that 'a' can't be used to index into it. This only breaks for an empty DataFrame. If it has values, the output is as expected.
According to the docs:
Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1)
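A minimal sketch of the suspected failure mode. It assumes that, for a frame with no rows, the function ends up being called on an empty Series rather than on a real row; that is a guess about pandas internals, not something confirmed here. The helper name lookup_fails is hypothetical and exists only for this illustration.

```python
import pandas as pd

# An empty Series carries no labels, so a lookup by column name has nothing to find.
empty_row = pd.Series([], dtype="float64")

# A real row, for comparison: its index holds the column names.
real_row = pd.Series({"a": 1, "b": 2, "c": 3})


def lookup_fails(row, label):
    """Hypothetical helper: report whether row[label] raises KeyError."""
    try:
        row[label]
    except KeyError:
        return True
    return False


print(lookup_fails(empty_row, "a"))
print(lookup_fails(real_row, "a"))
```

If that assumption holds, any lambda that reads row["a"] would fail in the empty case, while a lambda that ignores the row (like f1 above) would not.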
output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-21-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 20.7.0
Cython: None
numpy: 1.11.0
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
Comment From: jreback
You need to pass reduce=True to apply.
Comment From: jreback
In [14]: new_col2 = df.apply(f2, axis=1, reduce=True)
In [15]: new_col2
Out[15]: Series([], dtype: float64)
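For code that should not depend on the reduce keyword, an explicit guard for the empty case is another option. This is only a sketch: apply_rows and its dtype parameter are hypothetical names, not part of pandas, and the float64 default is an assumption chosen to match the output above.

```python
import io

import pandas as pd


def apply_rows(df, func, dtype="float64"):
    """Hypothetical helper: row-wise apply that never calls func on a frame with no rows."""
    if len(df) == 0:
        # Nothing to iterate over, so return an empty Series aligned with df's (empty) index.
        return pd.Series(index=df.index, dtype=dtype)
    return df.apply(func, axis=1)


df = pd.read_csv(io.StringIO("a,b,c"))  # header only, no data rows
f2 = lambda row: 0 if row["a"] > 10 else 1

# func is skipped here, so the row["a"] lookup is never attempted.
df["e"] = apply_rows(df, f2)
print(list(df.columns))
```

With at least one row present, the helper simply falls through to df.apply(func, axis=1), so the non-empty behavior is unchanged.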