A small, complete example of the issue
import pandas as pd
df1 = pd.DataFrame.from_items([('A', [1,2,3]), ('B', [4, 5, 6])])
df2 = pd.DataFrame.from_items([('A', [pd.datetime(1970,1,1), pd.datetime(1970,1,1), pd.datetime(1970,1,1)]), ('B', [4, 5, 6])])
f = lambda row: [row.B + 3]
r1 = df1.apply(f, axis=1)
r2 = df2.apply(f, axis=1)
# bug: r1 and r2 differ in both values and shape.
# expect: r1 and r2 to be the same values and shape.
r1
r2
# a clue to the difference:
(df1._is_mixed_type,df1._is_datelike_mixed_type)
(df2._is_mixed_type,df2._is_datelike_mixed_type)
Expected Output
0 [7]
1 [8]
2 [9]
Output of pd.show_versions()
Comment From: jreback
This is irrespective of other dtypes. You are returning a 2 element list, so naturally pandas will try to coerce it to the original shape, as it's compatible. A returned list is not labeled, so that is the only thing pandas can do. You can do this if you want. I also suppose more/better documentation is possible.
In [8]: df1.apply(lambda row: pd.Series([row.B + 3], ['result']), axis=1)
Out[8]:
   result
0       7
1       8
2       9
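A note for readers on newer pandas: to my knowledge, DataFrame.apply gained a result_type parameter in pandas 0.23, which makes the choice between "keep the list in one cell" and "spread the list across columns" explicit instead of inferred. The sketch below assumes such a version; the frame is built with the plain dict constructor and pd.Timestamp rather than the from_items and pd.datetime calls used in the report, and the names reduced and expanded are only illustrative.

```python
import pandas as pd

# Frame with a datetime column, the case that misbehaved in the report.
df2 = pd.DataFrame({
    "A": [pd.Timestamp("1970-01-01")] * 3,
    "B": [4, 5, 6],
})

f = lambda row: [row.B + 3]

# 'reduce' asks for a Series whose cells hold the returned lists.
reduced = df2.apply(f, axis=1, result_type="reduce")

# 'expand' asks for the list elements to become columns instead.
expanded = df2.apply(f, axis=1, result_type="expand")
```

With an explicit result_type, the shape of the result should no longer depend on what the other columns contain.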
Comment From: KevinGrealish
Jeff, are you saying it's by design that r1 and r2 come out different (By Design), or are you saying they should be the same (Won't Fix)?
Comment From: KevinGrealish
In my case, f is a function that takes a string and returns a list of strings, and it needs to be applied to each row, i.e. I want the result cells to have lists in them. I simplified this in the example, but here is a better repro. If what I was doing is underspecified to pandas, I want it to error all the time, not just when another column happens to be a date. It cost me a lot of time trying to figure out why this only worked some of the time:
import pandas as pd
df1 = pd.DataFrame.from_items([('A', [1,2,3]), ('B', ["ABCD", "EFGH", "IJKL"])])
df2 = pd.DataFrame.from_items([('A', [pd.datetime(1970,1,1), pd.datetime(1970,1,1), pd.datetime(1970,1,1)]), ('B', ["ABCD", "EFGH", "IJKL"])])
f = lambda row: [char for char in row.B]
r1 = df1.apply(f, axis=1) # works
r2 = df2.apply(f, axis=1) # fails
# bug: apply when date present crashes.
# expect: r1 and r2 to be the same values and shape. r1 has the expected value.
The problem is that I now have code that crashes only when a date column is present. Your workaround to use a series:
f = lambda row: pd.Series([[char for char in row.B]], ["result"])
r1 = df1.apply(f, axis=1)
r2 = df2.apply(f, axis=1)
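When the function only needs one column, as in this repro, another option is to skip the row-wise DataFrame.apply entirely and build the list-valued Series from that column directly, so the dtypes of the other columns never come into play. This is a minimal sketch and not something proposed in the thread; chars_per_row is a hypothetical helper name, and the frames are built with the dict constructor and pd.Timestamp so the snippet does not depend on from_items or pd.datetime.

```python
import pandas as pd

def chars_per_row(df):
    # Hypothetical helper: one list of characters per row, taken from column B only.
    # Building the Series by hand avoids any shape inference by DataFrame.apply.
    return pd.Series([list(s) for s in df["B"]], index=df.index, name="result")

df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["ABCD", "EFGH", "IJKL"]})
df2 = pd.DataFrame({
    "A": [pd.Timestamp("1970-01-01")] * 3,
    "B": ["ABCD", "EFGH", "IJKL"],
})

r1 = chars_per_row(df1)
r2 = chars_per_row(df2)
```

Both r1 and r2 are object-dtype Series with one list per row, regardless of whether column A holds integers or dates.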