The dataset named 'changes' was obtained from a merge by RID:

changes.head()

DX_bl RID DX_m12
0 1 3 1
1 0 4 0
2 0 6 0
3 0 8 0
4 1 10 1

I used the following two commands to identify rows in which DX_bl and DX_m12 differ:

print(changes[~changes.duplicated(subset = ['DX_bl','DX_m12'], keep=False)])

DX_bl RID DX_m12
64 64 1 167

print(changes.drop_duplicates(subset = ['DX_bl','DX_m12']))

DX_bl RID DX_m12
0 0 1 3
1 1 0 4
10 10 0 30
64 64 1 167

With keep=False, I expected it to return rows 10 (RID 30) and 64 (RID 64). But, as the output shows, row 10 (RID 30) is missing.

On the other hand, if 'keep' is left at its default (keep='first'), it wrongly returns rows 0 (RID 3) and 1 (RID 4).

Is there a bug in pandas' duplicated/drop_duplicates??

Comment From: jreback

pls provide a copy-pastable (runnable) example, pics are not useful.

Comment From: jreback

if you have an example pls post.
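
For reference, a minimal runnable sketch of the situation described in the report. The column names come from the post, but the values are made up for illustration and are not the original data. It suggests the output is expected behaviour rather than a bug: `duplicated` and `drop_duplicates` compare rows against other rows on the given subset, so they find (DX_bl, DX_m12) pairs that occur more than once. They do not compare DX_bl with DX_m12 within a row; a boolean mask does that.

```python
import pandas as pd

# Hypothetical data: column names follow the post, values are invented.
changes = pd.DataFrame({
    "DX_bl":  [1, 0, 0, 1, 1, 1, 0],
    "RID":    [3, 4, 6, 8, 10, 12, 14],
    "DX_m12": [1, 0, 0, 1, 0, 0, 1],
})

# keep=False flags every row whose (DX_bl, DX_m12) pair also appears in
# another row; negating it keeps only pairs that occur exactly once.
unique_pairs = changes[~changes.duplicated(subset=["DX_bl", "DX_m12"], keep=False)]

# The default keep='first' keeps the first row of each distinct pair,
# including pairs where both columns are equal, such as (1, 1) and (0, 0).
first_of_each_pair = changes.drop_duplicates(subset=["DX_bl", "DX_m12"])

# Rows where the two columns differ: compare them element-wise.
differing = changes[changes["DX_bl"] != changes["DX_m12"]]

print(unique_pairs)
print(first_of_each_pair)
print(differing)
```

In this sketch rows 4 and 5 share the pair (1, 0), so both drop out of the `keep=False` result even though their two columns differ, which mirrors the row reported as missing above.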