Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas
df1 = pandas.DataFrame({"a": [1]})
df2 = pandas.DataFrame({"a": [2]})
df1.attrs["metadata-xy"] = 42

print(df1.append(df2).attrs)  # keeps the attrs of df1
print(pandas.concat([df1, df2]).attrs)  # no attrs in result

Issue Description

append preserves the attrs of the calling DataFrame, but concat drops them.

Originally reported here https://github.com/pandas-dev/pandas/issues/35407#issuecomment-1022107661

Expected Behavior

df1.append(df2).attrs and pandas.concat([df1, df2]).attrs should probably match.

Comment From: rhshadrach

There is some discussion of what the behavior should be in #28283, but perhaps it deserves an issue of its own.

Comment From: asishm

Related: https://github.com/pandas-dev/pandas/issues/41828

Comment From: lithomas1

I think concat keeps the attrs only when they match, and that this is by design. Re-labeling as an enhancement. https://github.com/pandas-dev/pandas/issues/41828#issuecomment-855383754
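
The matching rule described here can be modeled on plain dicts. This is a simplified sketch, not pandas' implementation: the helper name concat_attrs is hypothetical, and the deep copy is an assumption about how kept attrs should be detached from the inputs.

```python
from copy import deepcopy


def concat_attrs(attrs_list):
    """Simplified model of the rule (hypothetical helper, not pandas code):
    the result keeps the attrs only if every input carries the same attrs."""
    first = attrs_list[0]
    if all(attrs == first for attrs in attrs_list[1:]):
        # Deep-copy so the result does not share mutable values with the inputs
        return deepcopy(first)
    # Any mismatch (including one input having no attrs) drops them entirely
    return {}


# The original example: df1 has attrs, df2 has none, so nothing survives
print(concat_attrs([{"metadata-xy": 42}, {}]))  # {}
# Identical attrs on every input are kept
print(concat_attrs([{"metadata-xy": 42}, {"metadata-xy": 42}]))  # {'metadata-xy': 42}
```

Under this model the original report is expected behavior rather than a crash or a lost copy: df2 has no attrs, so the inputs do not match.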

Comment From: simonjayhawkins

moving to 1.4.3

Comment From: jonastieppo

Even append doesn't work well for that: append only appends the rows of the DataFrame passed as the argument to the calling DataFrame, and the result carries the caller's attrs.

Hence, if you swap the order of the append call in your example, the attrs are not preserved, as the following code shows:

import pandas as pd
df1 = pd.DataFrame({"a": [1]})
df2 = pd.DataFrame({"a": [2]})
df1.attrs["metadata-xy"] = 42

print("No attributes preserved:", df2.append(df1).attrs)
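
A toy model of this order dependence, assuming (as the two examples suggest) that append keeps only the calling DataFrame's attrs. append_attrs is a hypothetical name, and plain dicts stand in for each frame's attrs:

```python
def append_attrs(caller_attrs, other_attrs):
    """Hypothetical model: append keeps the caller's attrs.

    other_attrs is deliberately ignored, which is what makes the
    result depend on which frame the method is called on.
    """
    return dict(caller_attrs)


df1_attrs = {"metadata-xy": 42}
df2_attrs = {}

print(append_attrs(df1_attrs, df2_attrs))  # like df1.append(df2): {'metadata-xy': 42}
print(append_attrs(df2_attrs, df1_attrs))  # like df2.append(df1): {}
```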

About concat: as far as I could check, concat combines the objects, but for some reason the attrs attribute is not carried over. I think implementing that would be a good addition.

Comment From: simonjayhawkins

Removing milestone. As it is now late in the 1.4.x series of releases, any fixes are probably not suitable for backport.

Comment From: simonjayhawkins

contributions and PRs welcome.

Comment From: behrenhoff

There is a related bug in how concat handles attrs:

import pandas

a = pandas.DataFrame({"a": [1]})
b = pandas.DataFrame({"b": [2]})
a.attrs["x"] = pandas.DataFrame()
b.attrs["x"] = pandas.DataFrame()

print(pandas.concat([a, b]).attrs)

The concat raises an Exception:

...
lib/python3.10/site-packages/pandas/core/generic.py in __nonzero__(self)
   1524     @final
   1525     def __nonzero__(self) -> NoReturn:
-> 1526         raise ValueError(
   1527             f"The truth value of a {type(self).__name__} is ambiguous. "
   1528             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

If either the a.attrs or the b.attrs assignment is removed, the concat works, but the resulting attrs are empty. If the resulting attrs are empty anyway, I don't understand why concat touches the attrs at all. At the very least, the error message is misleading, and it took me a while to figure out what was wrong.
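
The error most likely comes from comparing the two attrs dicts. Python's dict equality first checks that the lengths match, then looks up each key and compares the paired values with == (unless they are the same object), calling bool() on each result. That last step is exactly what a DataFrame refuses to do. The sketch below reproduces this without pandas; AmbiguousFrame is a hypothetical stand-in whose == returns an object with an ambiguous truth value, as DataFrame's does.

```python
class AmbiguousFrame:
    """Hypothetical stand-in for a DataFrame (not a pandas class)."""

    def __eq__(self, other):
        # Like DataFrame.__eq__, return an element-wise result, not a bool
        return AmbiguousFrame()

    def __bool__(self):
        # Like DataFrame, refuse to collapse to a single truth value
        raise ValueError("The truth value of a DataFrame is ambiguous.")


def compare(left, right):
    """Return left == right, or the error text if the comparison raises."""
    try:
        return left == right
    except ValueError as exc:
        return f"ValueError: {exc}"


a_attrs = {"x": AmbiguousFrame()}
b_attrs = {"x": AmbiguousFrame()}

# Same length and keys: the values are compared and bool() is called on the result
print(compare(a_attrs, b_attrs))  # ValueError: The truth value of a DataFrame is ambiguous.
# Different lengths: dict equality returns False without comparing any values
print(compare({}, b_attrs))  # False
```

This would also explain why removing one of the two assignments avoids the error: the dicts then differ in length, so no DataFrame is ever compared.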

The question, of course, is: what is the expected behavior? I am not sure. Maybe an attrs: Literal["copy_first", "copy_last", "update", "reverse_update"] = "update" keyword argument to concat? I guess the most useful would be "update", which would set the resulting attrs to {**df1.attrs, **df2.attrs, ..., **dfN.attrs}, but I can also imagine a situation where you would want {**dfN.attrs, ..., **df2.attrs, **df1.attrs} or just the attrs from the first or last DataFrame.
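
A minimal sketch of what the four proposed modes could do, written as a standalone function over plain attrs dicts. combine_attrs is a hypothetical helper, not a pandas API, and it makes shallow copies only:

```python
from typing import Literal

AttrsMode = Literal["copy_first", "copy_last", "update", "reverse_update"]


def combine_attrs(attrs_list: list[dict], how: AttrsMode = "update") -> dict:
    """Hypothetical helper for the proposed concat keyword (not a pandas API)."""
    if not attrs_list:
        return {}
    if how == "copy_first":
        return dict(attrs_list[0])
    if how == "copy_last":
        return dict(attrs_list[-1])
    if how == "update":
        # Later frames win: {**df1.attrs, **df2.attrs, ..., **dfN.attrs}
        ordered = attrs_list
    elif how == "reverse_update":
        # Earlier frames win: {**dfN.attrs, ..., **df2.attrs, **df1.attrs}
        ordered = attrs_list[::-1]
    else:
        raise ValueError(f"unknown mode: {how!r}")
    result: dict = {}
    for attrs in ordered:
        result.update(attrs)  # shallow merge; values are not copied
    return result


frames_attrs = [{"source": "a", "unit": "m"}, {"source": "b"}]
print(combine_attrs(frames_attrs, "update"))          # {'source': 'b', 'unit': 'm'}
print(combine_attrs(frames_attrs, "reverse_update"))  # {'source': 'a', 'unit': 'm'}
```

Note that the two "update" variants only differ when keys collide; keys present in a single frame survive in both.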