Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
pytest pandas/tests/generic/test_finalize.py # works fine
pytest pandas/tests/generic/test_finalize.py -n 4 # fails
Issue Description
In the second case, I get:
======================================================= FAILURES =======================================================
_____________________________________________ test_binops[pow-args7-right] _____________________________________________
[gw1] linux -- Python 3.8.15 /home/marcogorelli/mambaforge/envs/pandas-dev/bin/python3.8
request = <FixtureRequest for <Function test_binops[pow-args7-right]>>, args = ( A
0 1, 0 1
dtype: int64)
annotate = 'right', all_binary_operators = <built-in function pow>
@pytest.mark.filterwarnings(
"ignore:Automatic reindexing on DataFrame vs Series:FutureWarning"
)
@pytest.mark.parametrize("annotate", ["left", "right", "both"])
@pytest.mark.parametrize(
"args",
[
(1, pd.Series([1])),
(1, pd.DataFrame({"A": [1]})),
(pd.Series([1]), 1),
(pd.DataFrame({"A": [1]}), 1),
(pd.Series([1]), pd.Series([1])),
(pd.DataFrame({"A": [1]}), pd.DataFrame({"A": [1]})),
(pd.Series([1]), pd.DataFrame({"A": [1]})),
(pd.DataFrame({"A": [1]}), pd.Series([1])),
],
)
def test_binops(request, args, annotate, all_binary_operators):
# This generates 624 tests... Is that needed?
left, right = args
if annotate == "both" and isinstance(left, int) or isinstance(right, int):
return
if annotate in {"left", "both"} and not isinstance(left, int):
left.attrs = {"a": 1}
if annotate in {"left", "both"} and not isinstance(right, int):
right.attrs = {"a": 1}
result = all_binary_operators(left, right)
> assert result.attrs == {"a": 1}
E AssertionError: assert {} == {'a': 1}
E Right contains 1 more item:
E {'a': 1}
E Use -v to get more diff
pandas/tests/generic/test_finalize.py:508: AssertionError
Has anyone else come across this?
Expected Behavior
Tests should pass whether run in parallel or not
Has anyone else come across this / can reproduce?
Installed Versions
Comment From: MarcoGorelli
Actually, just
pytest pandas/tests/generic/test_finalize.py::test_binops[ne-args1-right]
fails every time
Comment From: MarcoGorelli
This line looks a bit odd to me:
https://github.com/pandas-dev/pandas/blob/dbb2adc1f353d9b0835901c274cbe0d2f5a5664f/pandas/tests/generic/test_finalize.py#L504-L505
If annotate is 'left', then why would right.attrs be set?
I'll take a closer look later
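To make the question above concrete, here is a minimal sketch of the annotation step as it presumably was intended, with the second branch keyed on "right" rather than "left". The helper name `annotate_operands` and the `SimpleNamespace` stand-ins are hypothetical and only illustrate the logic; this is not the code in `test_finalize.py`, and it is an assumption that this is the intended behaviour.

```python
from types import SimpleNamespace


def annotate_operands(left, right, annotate):
    """Hypothetical helper: set attrs on the operand(s) named by `annotate`.

    Returns the list of sides that were actually annotated, so a caller can
    skip the case where nothing could be annotated (e.g. the operand is an int).
    """
    annotated = []
    if annotate in {"left", "both"} and not isinstance(left, int):
        left.attrs = {"a": 1}
        annotated.append("left")
    # Assumption: the second branch should check "right", not "left".
    if annotate in {"right", "both"} and not isinstance(right, int):
        right.attrs = {"a": 1}
        annotated.append("right")
    return annotated


# Stand-ins for Series/DataFrame objects: anything with an `attrs` attribute.
left = SimpleNamespace(attrs={})
right = SimpleNamespace(attrs={})

sides = annotate_operands(left, right, "left")
print(sides, left.attrs, right.attrs)
```

With this version, annotate == 'left' leaves right.attrs untouched. Separately, note that `and` binds tighter than `or` in Python, so the skip condition in the quoted test parses as `(annotate == "both" and isinstance(left, int)) or isinstance(right, int)`.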
Looks like
https://github.com/pandas-dev/pandas/blob/20172d4ad2dc841e96c251ef14ff12f18dff6c7a/pandas/core/ops/init.py#L311-L314
might be the issue: prior to that line, right.attrs is set, but after it, it's unset.
Looks like a real issue:
$ cat f.py
import pandas as pd
from pandas import *
left = Series([1])
right = DataFrame({'a': [1]})
left.attrs = {'a': 1}
result = left + right
print(result.attrs)
$ python f.py
{}
Comment From: phofl
same on my machine.
Comment From: mroeschke
There might be a vague connection to https://github.com/pandas-dev/pandas/issues/34373
Comment From: TomAugspurger
Just to confirm, the original test failures are fixed, and the issue is now described in https://github.com/pandas-dev/pandas/issues/49916#issuecomment-1328057177, about propagating attrs in binary operations?
I'd recommend we follow xarray here (docs). By default, attrs aren't propagated through binary operations:
In [1]: import xarray as xr
In [2]: a = xr.DataArray([1, 2], name='a', attrs={"foo": "bar"})
In [3]: b = xr.DataArray([1, 2], name='a')
In [4]: (a + a).attrs
Out[4]: {}
In [5]: (a + b).attrs
Out[5]: {}
But a flag enables propagation:
In [8]: xr.set_options(keep_attrs=True)
Out[8]: <xarray.core.options.set_options at 0x10fc70ad0>
In [9]: (a + b).attrs
Out[9]: {'foo': 'bar'}
In [10]: (a + b).attrs
Out[10]: {'foo': 'bar'}
And I believe there are other values to control how merging / conflicts are handled
In [11]: b = xr.DataArray([1, 2], name='a', attrs={"foo": "baz", "other": "other"})
In [12]: (a + b).attrs
Out[12]: {'foo': 'bar'}
An implication of this choice is that we do not propagate attrs through most operations unless explicitly flagged (some methods have a keep_attrs option, and there is a global flag, accessible with xarray.set_options(), for setting this to be always True or False). Similarly, xarray does not check for conflicts between attrs when combining arrays and datasets, unless explicitly requested with the option compat='identical'. The guiding principle is that metadata should not be allowed to get in the way.
Comment From: MarcoGorelli
the original test failures are fixed
That's right! Several runs of pytest pandas/tests/generic/test_finalize.py -n 4 all pass for me locally.
I'd recommend we follow xarray here
Do you mean to follow them in the sense of propagating through binary operations, or adding keep_attrs?