I wish I could say more about this, but the data is proprietary. Basically, I have a DataFrame of 10 columns and around 7000 rows. When I call "duplicated" on 4 of the columns (3 hold strings, 1 holds a float), around 10 items (5 pairs) are flagged as duplicates when in fact they are different.
This is a sample of what is being flagged:
      Type      Line   LineString                                        Parameter        Filename
1885  else      832.0  (&temp32)->byte1 = (&temp32)->byte4               temp32           arinc.c
1895  do while  832.0  do while(0)                                                        arinc.c
4515  enum      122.0  enum {QQ_ON_UNASSERTED = 0, QQ_ON_ASSERTED = 1}   QQ_ON_ASSERTED   eeprom.c
4521  enum      167.0  enum {FIELD_RESET = 1, FIELD_TRIPPED = 0}         FIELD_TRIPPED    eeprom.c
I know that duplicates are checked through hashing, but is there some way to compare a checksum or some more robust measure to ensure that only true duplicates are flagged? What is the chance that two different items hash to the same value?
By adding more fields to the subset being checked, I am able to prevent the false duplicate flags.
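One way to eyeball whether the flagged rows really do match on the subset columns is to pull every member of each duplicate group with keep=False and sort them next to each other. A minimal sketch with made-up stand-in rows (the real data is proprietary), using the same column names as above:

```python
import pandas as pd

# Invented stand-in data: rows 0 and 1 agree on the subset columns but
# differ in "Filename", which is outside the subset; row 2 is unique.
df = pd.DataFrame({
    "Type": ["enum", "enum", "else"],
    "Line": [122.0, 122.0, 832.0],
    "LineString": ["enum {A = 0}", "enum {A = 0}", "x = y"],
    "Parameter": ["A", "A", "x"],
    "Filename": ["eeprom.c", "other.c", "arinc.c"],
})

dup_cols = ["Type", "Line", "LineString", "Parameter"]

# keep=False marks *every* member of a duplicate group (not just the
# later occurrences), so sorting the flagged rows groups each pair
# together for a column-by-column comparison.
flagged = df[df.duplicated(subset=dup_cols, keep=False)].sort_values(dup_cols)
print(flagged)
```

If the pairs printed this way look different, the difference must be in a column outside the subset, which drop_duplicates deliberately ignores.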
I am using Pandas 0.18 from WinPython-64bit-3.4.4.2Qt5.
Comment From: jreback
show what you are actually calling as well as df.info()
Comment From: johnml1135
Here are the lines of code that I run:
dup_cols = ["Type","Line","LineString","Parameter"]
pdata3 = pdata2.drop_duplicates(subset=dup_cols)
and here is the result of pdata3.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7120 entries, 0 to 20578
Data columns (total 17 columns):
Block 7120 non-null object
Blocks 7120 non-null object
Filename 7120 non-null object
Impacts 7120 non-null object
ImpactsUnique 7120 non-null object
Level 7120 non-null int64
Line 7120 non-null float64
LineString 7120 non-null object
OwningFunction 7120 non-null object
Parameter 7120 non-null object
ParameterType 7120 non-null object
ParameterUnique 7120 non-null object
Type 7120 non-null object
ParameterFull 7120 non-null object
ParameterField 7120 non-null object
ImpactsFull 7120 non-null object
ImpactsField 7120 non-null object
dtypes: float64(1), int64(1), object(15)
memory usage: 1001.2+ KB
Comment From: johnml1135
I have been looking further into this and stepping through the pandas code, and I found that (predictably) it was my own mistake. There were real duplicates; I was just not seeing them. You can close this ticket.
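For anyone landing here with the same suspicion, a quick check can separate genuine hash-collision worries from rows that merely agree on the subset columns: compare duplicates over the subset against duplicates over all columns. A sketch with invented rows, not the original data:

```python
import pandas as pd

# Two rows that are identical in every column -- a genuine duplicate pair.
df = pd.DataFrame({
    "Type": ["do while", "do while"],
    "Line": [832.0, 832.0],
    "LineString": ["do while(0)", "do while(0)"],
    "Parameter": ["", ""],
    "Filename": ["arinc.c", "arinc.c"],
})

dup_cols = ["Type", "Line", "LineString", "Parameter"]

# A row flagged on the subset but NOT on all columns differs somewhere
# outside the subset -- expected behaviour, not a false positive.
subset_dups = df.duplicated(subset=dup_cols)
full_dups = df.duplicated()
suspicious = df[subset_dups & ~full_dups]
print(len(suspicious))
```

An empty `suspicious` frame means every subset-level duplicate is also a full-row duplicate, which is exactly what turned out to be the case here.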