Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
from pandas.testing import assert_frame_equal


df1 = pd.DataFrame({'set_column': [{1, 2, 3}, {4, 5, 6}]})  # df2 differs only in the last element of the second set
df2 = pd.DataFrame({'set_column': [{1, 2, 3}, {4, 5, 7}]})

assert_frame_equal(df1, df2)

Issue Description

Exception:

pandas/_libs/testing.pyx:53: in pandas._libs.testing.assert_almost_equal
    ???
pandas/_libs/testing.pyx:159: in pandas._libs.testing.assert_almost_equal
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   TypeError: 'set' object is not subscriptable

pandas/_libs/testing.pyx:159: TypeError

The error is raised from assert_series_equal, in its final else branch:

_testing.assert_almost_equal(
    left._values,
    right._values,
    rtol=rtol,
    atol=atol,
    check_dtype=bool(check_dtype),
    obj=str(obj),
    index_values=np.asarray(left.index),
)
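
The traceback is consistent with an element-wise comparison that indexes into any iterable value by position. The following is a simplified pure-Python sketch of that pattern, not the actual code in testing.pyx; the names naive_almost_equal and outcome are hypothetical and exist only for illustration. It shows why lists produce an AssertionError while sets produce a TypeError.

```python
def naive_almost_equal(a, b):
    """Simplified stand-in for an element-wise comparison (not the pandas code)."""
    # Anything iterable that is not a string is treated as a sequence.
    if hasattr(a, "__iter__") and not isinstance(a, str):
        assert len(a) == len(b), f"lengths differ: {len(a)} != {len(b)}"
        for i in range(len(a)):
            # Positional indexing works for lists and tuples, but a set
            # only supports iteration, so a[i] raises TypeError here.
            naive_almost_equal(a[i], b[i])
        return True
    assert a == b, f"{a!r} != {b!r}"
    return True


def outcome(a, b):
    """Return the name of the exception raised, or 'equal'."""
    try:
        naive_almost_equal(a, b)
    except (AssertionError, TypeError) as exc:
        return type(exc).__name__
    return "equal"


print(outcome([1, 2, 3], [1, 2, 4]))  # lists differ: AssertionError
print(outcome({4, 5, 6}, {4, 5, 7}))  # sets differ: TypeError
```

In this model the set case never reaches the equality check, because the indexing step fails first.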

Expected Behavior

An AssertionError reporting that the two frames are not equal should be raised.

Installed Versions

INSTALLED VERSIONS
------------------
commit : 66e3805b8cabe977f40c05259cc3fcf7ead5687d
python : 3.9.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-1028-aws
Version : #32~20.04.1-Ubuntu SMP Mon Jan 9 18:02:08 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.5
numpy : 1.22.2
pytz : 2022.7.1
dateutil : 2.8.0
pip : 22.3
setuptools : 65.5.0
Cython : 0.29.33
pytest : 6.2.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.0.8
lxml.etree : 4.9.1
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.5.4
sqlalchemy : 1.4.46
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

I was also able to reproduce this on the latest release, pandas==1.5.3.

Comment From: phofl

Hi, thanks for your report. Investigations are welcome, but I think you will have a bad time with sets inside of a DataFrame. Nested data support is not great.

Comment From: jayendra-patil33

https://github.com/pandas-dev/pandas/blob/a344dc904bb3a3918dde1677e210cd970ef66c3a/pandas/_libs/testing.pyx#L100-L107

if isinstance(a, set) or isinstance(b, set):
    assert a == b
    return True

If we add the above snippet to testing.pyx, similar to how the dict and str cases are handled, we get the expected output:

AssertionError: DataFrame.iloc[:, 0] (column name="set_column") are different

DataFrame.iloc[:, 0] (column name="set_column") values are different (50.0 %)
[index]: [0, 1]
[left]:  [{1, 2, 3}, {4, 5, 6}]
[right]: [{1, 2, 3}, {4, 5, 7}]
At positional index 1, first diff: {4, 5, 6} != {4, 5, 7}

If this looks good, I would like to fix this. Also, I checked this on the main branch.
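
The effect of the proposed short-circuit can be sketched in pure Python. This is a simplified model under stated assumptions, not the Cython code in testing.pyx: almost_equal_with_sets and compare are hypothetical names, and the iterable branch only imitates positional, element-wise comparison.

```python
def almost_equal_with_sets(a, b):
    """Simplified model of element-wise comparison with a set short-circuit."""
    # Proposed check: compare sets as whole values before any indexing.
    if isinstance(a, set) or isinstance(b, set):
        assert a == b, f"{a!r} != {b!r}"
        return True
    # Other non-string iterables are compared position by position.
    if hasattr(a, "__iter__") and not isinstance(a, str):
        assert len(a) == len(b), f"lengths differ: {len(a)} != {len(b)}"
        for i in range(len(a)):
            almost_equal_with_sets(a[i], b[i])
        return True
    assert a == b, f"{a!r} != {b!r}"
    return True


def compare(left, right):
    """Return 'equal' or 'different'; a TypeError would propagate."""
    try:
        almost_equal_with_sets(left, right)
    except AssertionError:
        return "different"
    return "equal"


left = [{1, 2, 3}, {4, 5, 6}]
right = [{1, 2, 3}, {4, 5, 7}]
print(compare(left, right))  # mismatch is reported as an assertion failure
print(compare(left, left))
```

One thing a test for the real change might cover: frozenset is not a subclass of set, so an isinstance(a, set) check alone would not catch frozenset values.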

Comment From: devProdigy

@phofl it seems that we have a solution. Could you please check what was suggested above? :)

Comment From: phofl

This needs a PR and tests before we can take a closer look.