Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas
df1 = pandas.DataFrame({"a": [1]})
df2 = pandas.DataFrame({"a": [2]})
df1.attrs["metadata-xy"] = 42
print(df1.append(df2).attrs) # keeps the attrs of df1
print(pandas.concat([df1, df2]).attrs) # no attrs in result
Issue Description
append preserves attrs, but concat doesn't.
Originally reported here https://github.com/pandas-dev/pandas/issues/35407#issuecomment-1022107661
Expected Behavior
df1.append(df2).attrs and pandas.concat([df1, df2]).attrs should probably match.
Installed Versions
Comment From: rhshadrach
There is some discussion on what the behavior should be in #28283, but perhaps it is worthy of an issue on its own.
Comment From: asishm
related https://github.com/pandas-dev/pandas/issues/41828
Comment From: lithomas1
I think concat keeps the attrs only when they match, by design. Re-labeling as enhancement. https://github.com/pandas-dev/pandas/issues/41828#issuecomment-855383754
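A minimal sketch of the two cases, for anyone who wants to check this on their own install. The variable names mismatched and matching are only for illustration, and what the second case prints depends on the pandas version, so no output is claimed here:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1]})
df2 = pd.DataFrame({"a": [2]})

# Case 1: only df1 carries attrs, as in the original report.
df1.attrs["metadata-xy"] = 42
mismatched = pd.concat([df1, df2]).attrs
print("attrs differ:", mismatched)

# Case 2: both inputs carry equal attrs, the situation in which
# concat is said to keep them.
df2.attrs["metadata-xy"] = 42
matching = pd.concat([df1, df2]).attrs
print("attrs match: ", matching)
```

If the second line prints the metadata and the first prints an empty dict, that is consistent with the "only when they match" behavior described above.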
Comment From: simonjayhawkins
moving to 1.4.3
Comment From: jonastieppo
Even append does not work well for that. append only appends the rows of the DataFrame passed as an argument to the calling DataFrame.
Hence, if you swap the order of the append call in your example, you can see that the attrs are not preserved, as the following code shows:
import pandas as pd
df1 = pd.DataFrame({"a": [1]})
df2 = pd.DataFrame({"a": [2]})
df1.attrs["metadata-xy"] = 42
print("No attributes preserved: ", df2.append(df1).attrs)
About concat:
As far as I could check, concat combines the two objects, but for some reason the attrs attribute is not carried over. I think it would be a good feature to implement.
Comment From: simonjayhawkins
Removing milestone. As it is now late in the 1.4.x series of releases, any fixes are probably no longer suitable for backport.
Comment From: simonjayhawkins
contributions and PRs welcome.
Comment From: behrenhoff
There is a related bug with concat & attrs:
import pandas
a = pandas.DataFrame({"a": [1]})
b = pandas.DataFrame({"b": [2]})
a.attrs["x"] = pandas.DataFrame()
b.attrs["x"] = pandas.DataFrame()
print(pandas.concat([a, b]).attrs)
The concat raises an Exception:
...
lib/python3.10/site-packages/pandas/core/generic.py in __nonzero__(self)
1524 @final
1525 def __nonzero__(self) -> NoReturn:
-> 1526 raise ValueError(
1527 f"The truth value of a {type(self).__name__} is ambiguous. "
1528 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
If either a.attrs or b.attrs is removed, the concat works, but the resulting attrs are empty. If the resulting attrs are empty anyway, I don't understand why concat seems to be messing with the attrs at all. At the very least, the error message is misleading, and it took me a while to figure out what was wrong.
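One plausible explanation, assuming concat decides whether the attrs "match" by comparing the attrs dicts with ==: dict equality compares the values pairwise and then needs a plain True/False, and a DataFrame comparison does not provide one. The sketch below reproduces the same ValueError with two plain dicts and no concat involved; left, right and message are illustrative names only:

```python
import pandas as pd

# Two plain dicts shaped like the attrs in the example above.
left = {"x": pd.DataFrame()}
right = {"x": pd.DataFrame()}

message = None
try:
    # Dict equality compares the two DataFrame values with ==, which yields
    # another DataFrame, and then asks for its truth value.
    left == right
except ValueError as exc:
    message = str(exc)

print(message)
```

If that is indeed what happens inside concat, the error comes from the comparison of the attrs values rather than from the data being concatenated, which would explain why the message is so confusing.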
The question is of course: what is the expected behavior? I am not sure. Maybe an attrs: Literal["copy_first", "copy_last", "update", "reverse_update"] = "update" keyword argument to concat? I guess the most useful would be "update", which would set the resulting attrs to {**df1.attrs, **df2.attrs, ..., **dfN.attrs}, but I can also imagine a situation where you would want {**dfN.attrs, ..., **df2.attrs, **df1.attrs} or just the attrs from the first or last DataFrame.
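To make the four options concrete, here is a rough sketch of them as a user-side wrapper. concat_with_attrs and AttrsPolicy are hypothetical names, not part of pandas, and pandas.concat itself has no attrs keyword; the wrapper only sets attrs on the result after the fact:

```python
from typing import Literal

import pandas as pd

# Hypothetical policy names, taken from the proposal above.
AttrsPolicy = Literal["copy_first", "copy_last", "update", "reverse_update"]


def concat_with_attrs(frames, attrs: AttrsPolicy = "update", **concat_kwargs):
    """Hypothetical wrapper: concatenate, then set attrs according to a policy."""
    frames = list(frames)

    # Concatenate shallow copies with empty attrs so that concat has nothing
    # to compare or propagate on its own; the inputs are left untouched.
    stripped = []
    for frame in frames:
        clone = frame.copy(deep=False)
        clone.attrs = {}
        stripped.append(clone)
    result = pd.concat(stripped, **concat_kwargs)

    if attrs == "copy_first":
        merged = dict(frames[0].attrs)
    elif attrs == "copy_last":
        merged = dict(frames[-1].attrs)
    elif attrs == "update":
        # Same as {**df1.attrs, ..., **dfN.attrs}: later frames win.
        merged = {}
        for frame in frames:
            merged.update(frame.attrs)
    elif attrs == "reverse_update":
        # Same as {**dfN.attrs, ..., **df1.attrs}: earlier frames win.
        merged = {}
        for frame in reversed(frames):
            merged.update(frame.attrs)
    else:
        raise ValueError(f"unknown attrs policy: {attrs!r}")

    result.attrs = merged
    return result


df1 = pd.DataFrame({"a": [1]})
df2 = pd.DataFrame({"a": [2]})
df1.attrs = {"source": "first", "unit": "m"}
df2.attrs = {"source": "second"}

for policy in ("copy_first", "copy_last", "update", "reverse_update"):
    print(policy, concat_with_attrs([df1, df2], attrs=policy).attrs)
```

With "update" the later frame wins on conflicting keys and with "reverse_update" the earlier one does; whether "update" is the right default is exactly the open question.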