Code to replicate problem

Please see this link to download the referenced files.

import pandas as pd

df = pd.read_pickle('bug_df.pickle')
df2 = pd.read_pickle('bug_s2c.pickle')

try:
    assert(df.equals(df2.loc[df.index]))
except AssertionError:
    print("This fails - but why?")
    pass

assert(set(df2.index.values) == set(df.index.values)) # Indexes have the same values, but maybe in different order

df_list = df.values.tolist()
df2_list = df2.loc[df.index].values.tolist() # Ensure index is in the same order...
vc = [x == y for (x, y) in zip(df_list, df2_list)] # vc is list of lists...

try:
    assert(all(vc))
except AssertionError:
    print("This fails, so must be different values. But which one...?")
    pass

idx = vc.index(False)
print(df_list[idx])
print(df2_list[idx])

print("Hmmm... lists look the same...")

print(len(df2_list[idx]) == len(df_list[idx])) # Check length equality

df2_el = df2_list[idx]
df_el = df_list[idx]

el_comp = [x == y for (x,y) in zip(df2_el, df_el)]
print(el_comp) # First element is different
el_diff = el_comp.index(False)
print("Clearly its the first element that is different, but the first element in" + \
      " both lists are NaNs - pandas is meant to treat them as equal.")

print("Are the first elements even NaNs...?")
print(pd.isna(df_el[el_diff]))
print(pd.isna(df2_el[el_diff]))
print("... so yes, they are apprently are. How is this possible...?")

This produces the output:

This fails - but why?
This fails, so must be different values. But which one...?
[nan, nan, nan, nan, nan, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
[nan, nan, nan, nan, nan, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Hmmm... lists look the same...
True
[False, False, False, False, False, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True]
Clearly its the first element that is different, but the first element in both lists are NaNs - pandas is meant to treat them as equal.
Are the first elements even NaNs...?
True
True
... so yes, they are apprently are. How is this possible...?

Problem description

With the DataFrame.equals() method, pandas seems to fail to treat NaNs in the same location as equal, contrary to what the documentation states.

The problem occurs with Pandas v0.22.

Expected Output

The first assert statement should pass. Equivalently, "This fails - but why?" should not be printed.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-112-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.0.3
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.25.1
numpy: 1.11.2
scipy: 0.18.1
pyarrow: None
xarray: None
IPython: 5.1.0
sphinx: 1.4.8
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2016.7
blosc: None
bottleneck: 1.1.0
tables: 3.3.0
numexpr: 2.6.1
feather: None
matplotlib: 1.5.3
openpyxl: 2.4.9
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.8.0
bs4: 4.5.1
html5lib: 1.0b10
sqlalchemy: 1.1.3
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Comment From: jreback

These aren't even close to being equal: they have different dtypes as well as some different values. You need a simple example.

In [19]: tm.assert_frame_equal(pd.read_pickle('bug_df.pickle'), pd.read_pickle('bug_s2c.pickle'))
DataFrame.index values are different (100.0 %)
[left]:  Index(['S', '1 SheepCattle', '2 DairyCattle', '3 OtherAnimals', '4 Crops',
       '5 OtherAg', '6 FishHuntTrap', '7 ForestryLogs', '8 AgSrv', '9 Coal',
       '10 Oil', '11 GAS', '12 LNG', '13 IronOre', '14 NonFeOres',
       '15 NonMetMins', '16 MiningSrv', '17 MeatProds', '18 DairyProds',
       '19 OtherFood', '20 Beverages', '21 TCF', '22 WoodProds',
       '23 PulpPaper', '24 Printing', '25 RefineProd', '26 Chemicals',
       '27 PlasticRub', '28 NonMetalMin', '29 CementLime', '30 IronSteel',
       '31 Aluminium', '32 OtherNonFeMt', '33 MetalProds', '34 MVPOtherTran',
       '35 OtherEquip', '36 OtherMan', '37 ElecCoal', '38 ElecGas',
       '39 ElecHydro', '40 ElecOther', '41 ElecNuclear', '42 ElecSupply',
       '43 GasSupply', '44 WaterDrains', '45 ResidCons', '46 NonResidCons',
       '47 ConsSrv', '48 WholeTrade', '49 RetailTrade', '50 AccomFood',
       '51 RoadFreight', '52 RoadPass', '53 RailFreight', '54 RailPass',
       '55 Pipeline', '56 WaterTrans', '57 AirTrans', '58 Commun',
       '59 Banking', '60 Finance', '61 Insurance', '62 DwellingLow',
       '63 DwellingHigh', '64 Rental', '65 RealEstate', '66 OthBusServ',
       '67 PubAdminReg', '68 Defence', '69 Education', '70 HealthSrv',
       '71 ResidCare', '72 Culture', '73 Gambling', '74 Repairs',
       '75 OtherSrv', '76 PrivTranServ'],
      dtype='object')
[right]: Index(['1 SheepCattle', '10 Oil', '11 GAS', '12 LNG', '13 IronOre',
       '14 NonFeOres', '15 NonMetMins', '16 MiningSrv', '17 MeatProds',
       '18 DairyProds', '19 OtherFood', '2 DairyCattle', '20 Beverages',
       '21 TCF', '22 WoodProds', '23 PulpPaper', '24 Printing',
       '25 RefineProd', '26 Chemicals', '27 PlasticRub', '28 NonMetalMin',
       '29 CementLime', '3 OtherAnimals', '30 IronSteel', '31 Aluminium',
       '32 OtherNonFeMt', '33 MetalProds', '34 MVPOtherTran', '35 OtherEquip',
       '36 OtherMan', '37 ElecCoal', '38 ElecGas', '39 ElecHydro', '4 Crops',
       '40 ElecOther', '41 ElecNuclear', '42 ElecSupply', '43 GasSupply',
       '44 WaterDrains', '45 ResidCons', '46 NonResidCons', '47 ConsSrv',
       '48 WholeTrade', '49 RetailTrade', '5 OtherAg', '50 AccomFood',
       '51 RoadFreight', '52 RoadPass', '53 RailFreight', '54 RailPass',
       '55 Pipeline', '56 WaterTrans', '57 AirTrans', '58 Commun',
       '59 Banking', '6 FishHuntTrap', '60 Finance', '61 Insurance',
       '62 DwellingLow', '63 DwellingHigh', '64 Rental', '65 RealEstate',
       '66 OthBusServ', '67 PubAdminReg', '68 Defence', '69 Education',
       '7 ForestryLogs', '70 HealthSrv', '71 ResidCare', '72 Culture',
       '73 Gambling', '74 Repairs', '75 OtherSrv', '76 PrivTranServ',
       '8 AgSrv', '9 Coal', 'S'],
      dtype='object')
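
To narrow down a mismatch like the one reported above, one option is to align the two frames on the index first and then compare dtypes column by column. The sketch below uses two small hypothetical frames (`left`, `right`, and their columns `a` and `b` are made up for illustration and are not the pickled data from this issue), with one column deliberately stored as object dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two pickled frames: same labels and values,
# but a different row order and one column stored with object dtype.
left = pd.DataFrame(
    {"a": [np.nan, 0.0], "b": [1.0, 2.0]},
    index=["S", "1 SheepCattle"],
)
right = pd.DataFrame(
    {"a": [0.0, np.nan], "b": [2.0, 1.0]},
    index=["1 SheepCattle", "S"],
)
right["a"] = right["a"].astype(object)

# Step 1: put the rows of `right` into the same order as `left`.
aligned = right.loc[left.index]
same_index = aligned.index.equals(left.index)

# Step 2: list the columns whose dtypes differ between the two frames.
dtype_mismatch = [
    col for col in left.columns if left[col].dtype != aligned[col].dtype
]

print(same_index)       # True
print(dtype_mismatch)   # ['a']
```

If `dtype_mismatch` is non-empty, a value-level comparison is not needed yet to explain why the frames are reported as different.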

Comment From: ghcn

This is not a bug. nan == nan is False.
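
This also explains why the element-wise `x == y` check in the script reports every NaN position as different, regardless of what `equals()` does. A minimal standard-library sketch follows; `same_value`, `row_a` and `row_b` are hypothetical names used only for illustration.

```python
import math

nan_a = float("nan")
nan_b = float("nan")

print(nan_a == nan_b)  # False
print(nan_a == nan_a)  # False: NaN is not even equal to itself


def same_value(x, y):
    """Compare two values, treating a pair of float NaNs as equal."""
    both_floats = isinstance(x, float) and isinstance(y, float)
    if both_floats and math.isnan(x) and math.isnan(y):
        return True
    return x == y


row_a = [float("nan"), 0.0, 1.5]
row_b = [float("nan"), 0.0, 1.5]

# Plain == flags the NaN position as different...
plain = [x == y for x, y in zip(row_a, row_b)]
# ...while the NaN-aware helper does not.
nan_aware = [same_value(x, y) for x, y in zip(row_a, row_b)]

print(plain)      # [False, True, True]
print(nan_aware)  # [True, True, True]
```

So a `False` in `el_comp` at a NaN position only shows that plain `==` was used on NaNs, not that the two values differ.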

Comment From: charlie0389

Read the documentation (v0.22):

Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.
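
As a rough illustration of how this statement fits with the dtype remark earlier in the thread: on two small hypothetical frames (not the data from this issue), `equals()` does treat NaNs in the same location as equal, but it returns False once the column dtypes differ, even if the values look identical when printed.

```python
import numpy as np
import pandas as pd

# Two hypothetical frames with a NaN in the same location and the same dtype.
a = pd.DataFrame({"x": [np.nan, 0.0]})
b = pd.DataFrame({"x": [np.nan, 0.0]})
same_dtype_result = a.equals(b)

# Same values, but stored with object dtype instead of float64.
c = b.astype(object)
different_dtype_result = a.equals(c)

print(same_dtype_result)       # True
print(different_dtype_result)  # False
```

Casting both frames to a common dtype before calling `equals()` is one way to separate a dtype mismatch from a real difference in values.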

Pandas Failure to identify NaNs as equal in pandas.DataFrame.equals() method (as it should)
