Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd

d = [['hello', 1, 'GOOD', 'long.kw'],
     [1.2, 'chipotle', np.nan, 'bingo'],
     ['various', np.nan, 3000, 123.456]]
t = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])
# This is the assignment that does not work as expected
t['combined'] = t.apply(lambda x: list([x['A'], x['B'], x['C'], x['D']]), axis=1)
Problem description
I am confused about why this does not work properly. If I first initialize the 'combined' column to 0, it works. I understand that this is a sub-optimal approach, but I am wondering why it breaks.
Expected Output
t['combined'] = t.values.tolist()
t
Out[80]:
         A         B     C        D                       combined
0    hello         1  GOOD  long.kw      [hello, 1, GOOD, long.kw]
1     1.20  chipotle   NaN    bingo    [1.2, chipotle, nan, bingo]
2  various       NaN  3000   123.46  [various, nan, 3000, 123.456]
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: None
Comment From: TomAugspurger
See https://github.com/pandas-dev/pandas/issues/15628 and issues linking to / from that.
The short version is that DataFrame.apply tries to infer the output type from the result. In your case, the result is inferred to be a DataFrame with the same columns.
You're probably better off with something like:
In [51]: pd.Series([list(x) for x in t.itertuples(index=False)])
Out[51]:
0        [hello, 1, GOOD, long.kw]
1      [1.2, chipotle, nan, bingo]
2    [various, nan, 3000, 123.456]
dtype: object
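For completeness, here is a minimal sketch of how the suggestion above could be used to fill the 'combined' column directly. It reuses the data from the original report; passing index=t.index to pd.Series is my own addition (an assumption, not part of the comment above) so that the new column lines up with the frame even when the index is not the default one.

```python
import numpy as np
import pandas as pd

d = [['hello', 1, 'GOOD', 'long.kw'],
     [1.2, 'chipotle', np.nan, 'bingo'],
     ['various', np.nan, 3000, 123.456]]
t = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])

# Build one plain Python list per row, bypassing DataFrame.apply and its
# output-type inference entirely.
rows = [list(row) for row in t.itertuples(index=False)]

# Wrapping the lists in a Series keeps each list as a single cell value.
# index=t.index is an assumption added here for alignment safety.
t['combined'] = pd.Series(rows, index=t.index)

print(t['combined'])
```

Because the lists are built before the assignment, 'combined' does not need to exist (or be initialized to 0) beforehand.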