Pandas DataFrame.apply() returns DataFrame unexpectedly when length of list stored in cell matches DF-dimensions

Code Sample, a copy-pastable example if possible

import pandas as pd

# list length does NOT match DataFrame dimensions, works as expected:
df = pd.DataFrame()
df['A'] = [[1, 2], [1, 2], [1, 2]]
df['B'] = [[3, 4], [3, 4], [3, 4]]
df['C'] = [[4, 6], [4, 6], [4, 6]]
print(df)

#           A       B       C
#   0  [1, 2]  [3, 4]  [4, 6]
#   1  [1, 2]  [3, 4]  [4, 6]
#   2  [1, 2]  [3, 4]  [4, 6]

D = df.apply(lambda row: [a+b for a, b in zip(row.A, row.B)], axis=1)
print(D)

#   0    [4, 6]
#   1    [4, 6]
#   2    [4, 6]
#   dtype: object

df['D'] = D
print(df)

#           A       B       C       D
#   0  [1, 2]  [3, 4]  [4, 6]  [4, 6]
#   1  [1, 2]  [3, 4]  [4, 6]  [4, 6]
#   2  [1, 2]  [3, 4]  [4, 6]  [4, 6]

# list length DOES match DataFrame dimensions, does not work as expected:

df = pd.DataFrame()
df['A'] = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
df['B'] = [[3, 4, 5], [3, 4, 5], [3, 4, 5]]
df['C'] = [[4, 6, 8], [4, 6, 8], [4, 6, 8]]
print(df)

#              A          B          C
#   0  [1, 2, 3]  [3, 4, 5]  [4, 6, 8]
#   1  [1, 2, 3]  [3, 4, 5]  [4, 6, 8]
#   2  [1, 2, 3]  [3, 4, 5]  [4, 6, 8]

# unwanted result:
D = df.apply(lambda row: [a+b for a, b in zip(row.A, row.B)], axis=1)
print(D)

#      A  B  C
#   0  4  6  8
#   1  4  6  8
#   2  4  6  8

# correct:
df['D'] = [[a+b for a, b in zip(row.A, row.B)] for row in df.itertuples()]

print(df)

#              A          B          C          D
#   0  [1, 2, 3]  [3, 4, 5]  [4, 6, 8]  [4, 6, 8]
#   1  [1, 2, 3]  [3, 4, 5]  [4, 6, 8]  [4, 6, 8]
#   2  [1, 2, 3]  [3, 4, 5]  [4, 6, 8]  [4, 6, 8]

Problem description

This behavior came up in a larger project, when the length of a stored list happened to match the DataFrame's dimensions and an exception was raised as a result. I did not expect apply() to return a DataFrame at all. I looked for a parameter to change this behavior but did not find one.
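
For readers on newer releases: pandas 0.23 and later added a `result_type` argument to `DataFrame.apply()`. It is not available in the 0.19.2 version reported here. A minimal sketch, assuming such a newer pandas, that asks for one list per row instead of an expanded DataFrame:

```python
import pandas as pd

# Same data as the failing case: each list has 3 items and the frame has 3 columns.
df = pd.DataFrame({
    "A": [[1, 2, 3] for _ in range(3)],
    "B": [[3, 4, 5] for _ in range(3)],
    "C": [[4, 6, 8] for _ in range(3)],
})

# result_type="reduce" requests a Series rather than an expanded DataFrame.
# Assumption: pandas >= 0.23; the argument does not exist in 0.19.2.
D = df.apply(
    lambda row: [a + b for a, b in zip(row["A"], row["B"])],
    axis=1,
    result_type="reduce",
)
print(type(D).__name__)
print(D.tolist())
```

If the result is a Series, it can be assigned with `df['D'] = D` just as in the working two-element case.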

May be related to https://github.com/pandas-dev/pandas/issues/5299

Expected Output

see above

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.11-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 34.3.2
Cython: None
numpy: 1.12.1
scipy: 0.19.0
statsmodels: 0.8.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.3
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.5
boto: None
pandas_datareader: None

Comment From: jreback

xref https://github.com/pandas-dev/pandas/issues/14370 (and some linked issues).

You are on your own if you have lists inside a cell. This is not idiomatic (and certainly not performant in any way). If you really, really want to do this, then return tuples.
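
The tuple suggestion can be sketched as follows. This is a minimal illustration using the column names from the report. It has not been checked against 0.19.2, so whether it avoids the expansion on that exact version is an assumption.

```python
import pandas as pd

# Failing case from the report: list length (3) equals the number of columns (3).
df = pd.DataFrame({
    "A": [[1, 2, 3] for _ in range(3)],
    "B": [[3, 4, 5] for _ in range(3)],
    "C": [[4, 6, 8] for _ in range(3)],
})

# Return a tuple instead of a list from the applied function.
D = df.apply(
    lambda row: tuple(a + b for a, b in zip(row["A"], row["B"])),
    axis=1,
)

# Each cell of the new column then holds one tuple per row.
df["D"] = D
print(df)
```

The cells of `D` are tuples rather than lists, so any later code that mutates them in place would need to convert them back with `list(...)`.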
