I have the series of integers shown below:
from pandas import Series
s = Series([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
I want to see how many times the number changes over the series, so I compare two slices of the same series to one another:
s[1:].ne(s[:-1])
Out[4]:
0 True
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 False
25 False
26 False
27 False
28 False
29 False
30 False
31 False
32 False
33 False
34 False
35 False
36 False
37 False
38 False
39 True
dtype: bool
Not only does the output of the Series.ne method not make any logical sense to me, but the output is also longer than either of the inputs, which is especially confusing.
I think this might be related to https://github.com/pandas-dev/pandas/issues/1134
Apologies if this isn't an issue, but I haven't been able to find a satisfactory explanation for this behavior.
tl;dr:
Where s is a pandas.Series of ints:
[i != s[:-1][idx] for idx, i in enumerate(s[1:])] != s[:-1].ne(s[1:]).tolist()
Comment From: jreback
This was changed here: https://github.com/pandas-dev/pandas/pull/13894
Use a shorter sequence just to make this fit in a reasonable amount of space:
In [38]: s = s[0:5]
In [39]: s
Out[39]:
0 1
1 2
2 3
3 1
4 2
dtype: int64
In [40]: a = s[1:]
In [41]: b = s[:-1]
In [42]: a
Out[42]:
1 2
2 3
3 1
4 2
dtype: int64
In [43]: b
Out[43]:
0 1
1 2
2 3
3 1
dtype: int64
These align, meaning that both are reindexed to the union of their indexes. Labels missing from one side are filled with NaN, which upcasts the values to float64:
In [44]: a, b = a.align(b)
In [45]: a
Out[45]:
0 NaN
1 2.0
2 3.0
3 1.0
4 2.0
dtype: float64
In [46]: b
Out[46]:
0 1.0
1 2.0
2 3.0
3 1.0
4 NaN
dtype: float64
The comparison is then clear (using == here for clarity). Labels 0 and 4 hold NaN on one side, so they compare as not equal:
In [47]: a == b
Out[47]:
0 False
1 True
2 True
3 True
4 False
dtype: bool
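For anyone landing here who wants the positional comparison the original post was after, the following is a minimal sketch rather than part of the explanation above. It assumes a pandas version that provides Series.to_numpy (0.24 or later), and the variable names (aligned, changed, changed_shift) are illustrative only. It first reproduces the aligned result on the shortened series, then shows two ways to avoid alignment.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 1, 2])
a, b = s[1:], s[:-1]

# The flexible method aligns on index labels first, so the result covers
# the union of both indexes (5 labels) rather than the 4 elements per slice.
aligned = a.ne(b)
print(len(aligned))        # 5
print(aligned.tolist())    # [True, False, False, False, True]

# Comparing the underlying NumPy arrays skips alignment entirely.
changed = a.to_numpy() != b.to_numpy()
print(changed.tolist())    # [True, True, True, True]
print(int(changed.sum()))  # 4

# A pandas-only alternative: compare each element with its predecessor.
# The first slot compares against NaN, so it is dropped.
changed_shift = s.ne(s.shift()).iloc[1:]
print(int(changed_shift.sum()))  # 4
```

Both alternatives count four changes for this five-element series. The array comparison is the closest match to the list comprehension in the tl;dr, while the shift version keeps the original index labels on the result.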