Pandas DataFrame.apply returns the same result for every column

Code Sample:

from pandas import DataFrame

df = DataFrame({'a': range(10), 'b': range(20, 30), 'c': range(30, 40)})
df.apply(lambda x: [x])

out[]:
a    [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
b    [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
c    [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]

Problem description

apply returns the same result for every column. I think each column should produce its own result.

Expected Output

a              [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
b    [[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]]
c    [[30, 31, 32, 33, 34, 35, 36, 37, 38, 39]]

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-79-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: zh_CN.UTF-8
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 35.0.2
Cython: None
numpy: 1.12.1
scipy: 0.18.0
statsmodels: 0.6.1
xarray: None
IPython: 5.3.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
matplotlib: 1.5.2
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: 3.6.4
bs4: 4.5.1
html5lib: 0.999999999
httplib2: None
apiclient: None
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
boto: None
pandas_datareader: None

Comment From: jorisvandenbossche

There is certainly something strange going on, but what exactly are you trying to obtain? Right now you are putting Series objects inside a list inside a DataFrame, which does not seem that useful.

Comment From: xmduhan

I have a DataFrame that holds some parameters:

    p1  p2  p3
a    0   0   0
b    1   1   1
c    2   2   2
... ...

Each row is one group of parameters. I run a calculation on each group and get two results: a value and a series. I need to save them in the DataFrame:

    p1  p2  p3  value  series
a    0   0   0      0   [...]
b    1   1   1      1   [...]
c    2   2   2      2   [...]

Doing this row by row works, but the calculation is time-consuming, and it can be optimized by running it over the whole DataFrame at once. The batch result is then a Series holding each row's value, plus a result DataFrame holding each row's series. The result DataFrame's columns are the parameter DataFrame's index:

   a   b   c
0  0  20  30
1  1  21  31
2  2  22  32
3  3  23  33
4  4  24  34
5  5  25  35

I want to put them back. I use resultSeries = resultDataframe.apply(lambda x: [x]) to turn the result DataFrame into a Series, then parameterDataframe['series'] = resultSeries to attach it to the parameter DataFrame.

But resultDataframe.apply(lambda x: [x]) does not give me what I need.

Comment From: BranYang

df.apply(lambda x: [list(x.values)]) will do what you want @xmduhan

In [4]: df.apply(lambda x: [list(x.values)])
Out[4]: 
a              [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
b    [[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]]
c    [[30, 31, 32, 33, 34, 35, 36, 37, 38, 39]]
dtype: object
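
For the "put them back" step described earlier in the thread, a minimal sketch that avoids apply altogether might look like the following. The frames `params` and `result` are hypothetical stand-ins for parameterDataframe and resultDataframe, filled with the sample values shown above, and the dict comprehension is one possible approach rather than a fix confirmed by the maintainers.

```python
import pandas as pd

# Hypothetical stand-ins for the parameter and result frames from the thread.
params = pd.DataFrame({'p1': [0, 1, 2], 'p2': [0, 1, 2], 'p3': [0, 1, 2]},
                      index=['a', 'b', 'c'])
result = pd.DataFrame({'a': range(0, 6), 'b': range(20, 26), 'c': range(30, 36)})

# Build one independent Python list per result column, keyed by column label.
series_col = pd.Series({col: result[col].tolist() for col in result.columns})

# Column assignment aligns on the index, so row 'b' receives the list
# built from result column 'b', and so on.
params['series'] = series_col
```

Because `tolist()` creates a fresh list for every column, no two cells share the same underlying object.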

Comment From: jreback

As @BranYang's example shows, you are just returning a reference to the variable rather than a copy of its values. Using lists as cell values should simply be avoided, as it is completely non-performant (and you are storing something mutable anyhow).
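
One way to follow the advice about avoiding lists in cells is to keep each series value in its own column. The sketch below assumes the same hypothetical `params` and `result` frames as the sample data in this thread; the `s` prefix for the new columns is an arbitrary choice made for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the parameter and result frames from the thread.
params = pd.DataFrame({'p1': [0, 1, 2], 'p2': [0, 1, 2], 'p3': [0, 1, 2]},
                      index=['a', 'b', 'c'])
result = pd.DataFrame({'a': range(0, 6), 'b': range(20, 26), 'c': range(30, 36)})

# Transpose so each parameter label ('a', 'b', 'c') becomes a row,
# then prefix the positional columns 0..5 so they read s0..s5.
wide = result.T.add_prefix('s')

# Join on the shared index: every cell stays a scalar, no lists involved.
combined = params.join(wide)
```

Each row of `combined` then carries its parameters followed by its series values as ordinary scalar columns, which keeps vectorized operations available.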
