Describe the bug

Start with a 6-server cluster running 6.2.14. Upgrade one of the replicas to 7.2.2.

The 7.2.2 replica comes up and first logs 'Cluster state changed: ok', then about 15 seconds later logs 'Cluster state changed: fail':

18589:C 20 Oct 2023 11:31:37.765 # WARNING: Changing databases number from 16 to 1 since we are in cluster mode
18589:C 20 Oct 2023 11:31:37.765 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
18589:C 20 Oct 2023 11:31:37.765 * Redis version=7.2.2, bits=64, commit=00000000, modified=0, pid=18589, just started
18589:C 20 Oct 2023 11:31:37.765 * Configuration loaded
18589:M 20 Oct 2023 11:31:37.766 * Increased maximum number of open files to 10032 (it was originally set to 256).
18589:M 20 Oct 2023 11:31:37.766 * monotonic clock: POSIX clock_gettime
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 7.2.2 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in cluster mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6706
 |    `-._   `._    /     _.-'    |     PID: 18589
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           https://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

18589:M 20 Oct 2023 11:31:37.766 # WARNING: The TCP backlog setting of 511 cannot be enforced because kern.ipc.somaxconn is set to the lower value of 128.
18589:M 20 Oct 2023 11:31:37.766 * Node configuration loaded, I'm b0083dead5440a15ec90ec699447047c940d2c3e
18589:M 20 Oct 2023 11:31:37.767 * Server initialized
18589:M 20 Oct 2023 11:31:37.767 . The AOF directory appendonlydir doesn't exist
18589:M 20 Oct 2023 11:31:37.768 * Loading RDB produced by version 7.2.2
18589:M 20 Oct 2023 11:31:37.768 * RDB age 62 seconds
18589:M 20 Oct 2023 11:31:37.768 * RDB memory usage when created 1.86 Mb
18589:M 20 Oct 2023 11:31:37.768 * Done loading RDB, keys loaded: 0, keys expired: 0.
18589:M 20 Oct 2023 11:31:37.768 * DB loaded from disk: 0.001 seconds
18589:M 20 Oct 2023 11:31:37.768 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
18589:M 20 Oct 2023 11:31:37.768 * Ready to accept connections tcp
18589:M 20 Oct 2023 11:31:37.768 . 0 clients connected (0 replicas), 1813552 bytes in use
18589:S 20 Oct 2023 11:31:37.768 * Discarding previously cached master state.
18589:S 20 Oct 2023 11:31:37.768 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
18589:S 20 Oct 2023 11:31:37.768 * Connecting to MASTER 127.0.0.1:6703
18589:S 20 Oct 2023 11:31:37.768 * MASTER <-> REPLICA sync started
18589:S 20 Oct 2023 11:31:37.768 * Cluster state changed: ok
18589:S 20 Oct 2023 11:31:37.768 . Connecting with Node cdd6af2b8f9bd059e94f7b1803e17264d69b727a at 127.0.0.1:16701
18589:S 20 Oct 2023 11:31:37.768 . Connecting with Node 18c9fa4f122f8952303453b909ac747e0a5405b8 at 127.0.0.1:16703
18589:S 20 Oct 2023 11:31:37.769 . Connecting with Node 485f875d20fd8601cf6d9aec86fcd6afadbf1d03 at 127.0.0.1:16704
18589:S 20 Oct 2023 11:31:37.769 . Connecting with Node 03ceb5001225b9c7c0fcb90811a94e6c0b325485 at 127.0.0.1:16705
18589:S 20 Oct 2023 11:31:37.769 . Connecting with Node 8a79aecca073c6adc762e7b43b93be90a480894c at 127.0.0.1:16702
18589:S 20 Oct 2023 11:31:37.769 * Non blocking connect for SYNC fired the event.
18589:S 20 Oct 2023 11:31:37.769 * Master replied to PING, replication can continue...
18589:S 20 Oct 2023 11:31:37.769 * Trying a partial resynchronization (request 78c13bc61e6cca6531d0c35e117d7f6adc2fae8b:1457).
18589:S 20 Oct 2023 11:31:37.769 * Successful partial resynchronization with master.
18589:S 20 Oct 2023 11:31:37.769 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
18589:S 20 Oct 2023 11:31:37.808 - Accepting cluster node connection from 127.0.0.1:49914
18589:S 20 Oct 2023 11:31:37.808 - Accepting cluster node connection from 127.0.0.1:49912
18589:S 20 Oct 2023 11:31:37.808 - Accepting cluster node connection from 127.0.0.1:49913
18589:S 20 Oct 2023 11:31:37.808 - Accepting cluster node connection from 127.0.0.1:49915
18589:S 20 Oct 2023 11:31:37.808 . --- Processing packet of type ping, 2568 bytes
18589:S 20 Oct 2023 11:31:37.808 . ping packet received: 8a79aecca073c6adc762e7b43b93be90a480894c
18589:S 20 Oct 2023 11:31:37.808 . GOSSIP cdd6af2b8f9bd059e94f7b1803e17264d69b727a 127.0.0.1:6701@16701 master
18589:S 20 Oct 2023 11:31:37.808 . GOSSIP 485f875d20fd8601cf6d9aec86fcd6afadbf1d03 127.0.0.1:6704@16704 slave
18589:S 20 Oct 2023 11:31:37.808 . GOSSIP 18c9fa4f122f8952303453b909ac747e0a5405b8 127.0.0.1:6703@16703 master
18589:S 20 Oct 2023 11:31:37.808 . --- Processing packet of type ping, 2568 bytes
18589:S 20 Oct 2023 11:31:37.808 . ping packet received: cdd6af2b8f9bd059e94f7b1803e17264d69b727a
18589:S 20 Oct 2023 11:31:37.808 . GOSSIP 03ceb5001225b9c7c0fcb90811a94e6c0b325485 127.0.0.1:6705@16705 slave
18589:S 20 Oct 2023 11:31:37.808 . GOSSIP b0083dead5440a15ec90ec699447047c940d2c3e 127.0.0.1:6706@16706 slave,fail
18589:S 20 Oct 2023 11:31:37.808 . GOSSIP 18c9fa4f122f8952303453b909ac747e0a5405b8 127.0.0.1:6703@16703 master
18589:S 20 Oct 2023 11:31:37.808 . --- Processing packet of type ping, 2568 bytes
18589:S 20 Oct 2023 11:31:37.808 . ping packet received: 18c9fa4f122f8952303453b909ac747e0a5405b8
18589:S 20 Oct 2023 11:31:37.808 . GOSSIP 8a79aecca073c6adc762e7b43b93be90a480894c 127.0.0.1:6702@16702 master
18589:S 20 Oct 2023 11:31:37.808 . GOSSIP b0083dead5440a15ec90ec699447047c940d2c3e 127.0.0.1:6706@16706 slave,fail
18589:S 20 Oct 2023 11:31:37.808 . GOSSIP 485f875d20fd8601cf6d9aec86fcd6afadbf1d03 127.0.0.1:6704@16704 slave
18589:S 20 Oct 2023 11:31:37.808 . --- Processing packet of type ping, 2568 bytes
18589:S 20 Oct 2023 11:31:37.808 . ping packet received: 485f875d20fd8601cf6d9aec86fcd6afadbf1d03
18589:S 20 Oct 2023 11:31:37.808 . GOSSIP 8a79aecca073c6adc762e7b43b93be90a480894c 127.0.0.1:6702@16702 master
18589:S 20 Oct 2023 11:31:37.808 . GOSSIP 18c9fa4f122f8952303453b909ac747e0a5405b8 127.0.0.1:6703@16703 master
18589:S 20 Oct 2023 11:31:37.808 . GOSSIP 03ceb5001225b9c7c0fcb90811a94e6c0b325485 127.0.0.1:6705@16705 slave
18589:S 20 Oct 2023 11:31:37.818 - Accepting cluster node connection from 127.0.0.1:49916
18589:S 20 Oct 2023 11:31:37.818 . --- Processing packet of type ping, 2568 bytes
18589:S 20 Oct 2023 11:31:37.818 . ping packet received: 03ceb5001225b9c7c0fcb90811a94e6c0b325485
18589:S 20 Oct 2023 11:31:37.818 . GOSSIP cdd6af2b8f9bd059e94f7b1803e17264d69b727a 127.0.0.1:6701@16701 master
18589:S 20 Oct 2023 11:31:37.818 . GOSSIP 485f875d20fd8601cf6d9aec86fcd6afadbf1d03 127.0.0.1:6704@16704 slave
18589:S 20 Oct 2023 11:31:37.818 . GOSSIP 8a79aecca073c6adc762e7b43b93be90a480894c 127.0.0.1:6702@16702 master
18589:S 20 Oct 2023 11:31:42.817 . 1 clients connected (0 replicas), 1810592 bytes in use
18589:S 20 Oct 2023 11:31:47.860 . 1 clients connected (0 replicas), 1852224 bytes in use
18589:S 20 Oct 2023 11:31:52.810 . *** NODE cdd6af2b8f9bd059e94f7b1803e17264d69b727a possibly failing
18589:S 20 Oct 2023 11:31:52.811 . *** NODE 18c9fa4f122f8952303453b909ac747e0a5405b8 possibly failing
18589:S 20 Oct 2023 11:31:52.811 . *** NODE 485f875d20fd8601cf6d9aec86fcd6afadbf1d03 possibly failing
18589:S 20 Oct 2023 11:31:52.811 . *** NODE 8a79aecca073c6adc762e7b43b93be90a480894c possibly failing
18589:S 20 Oct 2023 11:31:52.811 # Cluster state changed: fail
18589:S 20 Oct 2023 11:31:52.836 . I/O error reading from node link: connection closed
18589:S 20 Oct 2023 11:31:52.838 . I/O error reading from node link: connection closed
18589:S 20 Oct 2023 11:31:52.838 . I/O error reading from node link: connection closed
18589:S 20 Oct 2023 11:31:52.840 . I/O error reading from node link: connection closed
18589:S 20 Oct 2023 11:31:52.842 . I/O error reading from node link: connection closed
18589:S 20 Oct 2023 11:31:52.912 . 1 clients connected (0 replicas), 1840064 bytes in use
18589:S 20 Oct 2023 11:31:52.912 . *** NODE 03ceb5001225b9c7c0fcb90811a94e6c0b325485 possibly failing
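
The interval between the two state changes in the log above can be measured directly. The following is a minimal Python sketch, not part of Redis, that pulls out the 'Cluster state changed' lines and computes the gap. The regular expression and the name `state_changes` are illustrative assumptions based only on the log layout shown here.

```python
import re
from datetime import datetime

# Layout assumed from the log above:
# "<pid>:<role> <dd Mon yyyy HH:MM:SS.mmm> <level> <message>"
LOG_LINE = re.compile(
    r"^(?P<pid>\d+):(?P<role>[A-Z]) "
    r"(?P<ts>\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>\S) (?P<msg>.*)$"
)

def state_changes(log_text):
    """Return (timestamp, state) pairs for every 'Cluster state changed' line."""
    changes = []
    for line in log_text.splitlines():
        m = LOG_LINE.match(line.strip())
        if not m:
            continue  # banner art and blank lines do not match
        msg = m.group("msg")
        if msg.startswith("Cluster state changed: "):
            ts = datetime.strptime(m.group("ts"), "%d %b %Y %H:%M:%S.%f")
            changes.append((ts, msg.split(": ", 1)[1]))
    return changes

# Three lines copied from the log above.
SAMPLE_LOG = """\
18589:S 20 Oct 2023 11:31:37.768 * Cluster state changed: ok
18589:S 20 Oct 2023 11:31:52.810 . *** NODE cdd6af2b8f9bd059e94f7b1803e17264d69b727a possibly failing
18589:S 20 Oct 2023 11:31:52.811 # Cluster state changed: fail
"""

changes = state_changes(SAMPLE_LOG)
(ok_at, ok_state), (fail_at, fail_state) = changes
seconds_until_fail = (fail_at - ok_at).total_seconds()
print(ok_state, "->", fail_state, "after", seconds_until_fail, "seconds")
```

For these lines the gap is roughly 15 seconds. That is consistent with the default `cluster-node-timeout` of 15000 ms, which the configuration under "To reproduce" does not override.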

The view from the 7.2.2 node:

$ ./redis-7.2.2/src/redis-cli -c -p 6706
127.0.0.1:6706> cluster info
cluster_state:fail
cluster_slots_assigned:16384
cluster_slots_ok:0
cluster_slots_pfail:16384
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:3
cluster_stats_messages_ping_sent:97
cluster_stats_messages_pong_sent:100
cluster_stats_messages_sent:197
cluster_stats_messages_ping_received:100
cluster_stats_messages_fail_received:1
cluster_stats_messages_received:101
total_cluster_links_buffer_limit_exceeded:0
127.0.0.1:6706> cluster nodes
cdd6af2b8f9bd059e94f7b1803e17264d69b727a 127.0.0.1:6701@16701 master,fail? - 1697814814676 1697814814674 1 connected 0-5460
b0083dead5440a15ec90ec699447047c940d2c3e 127.0.0.1:6706@16706 myself,slave 18c9fa4f122f8952303453b909ac747e0a5405b8 0 1697814814674 3 connected
485f875d20fd8601cf6d9aec86fcd6afadbf1d03 127.0.0.1:6704@16704 slave,fail? cdd6af2b8f9bd059e94f7b1803e17264d69b727a 1697814814676 1697814814674 1 connected
8a79aecca073c6adc762e7b43b93be90a480894c 127.0.0.1:6702@16702 master,fail? - 1697814814676 1697814814674 2 connected 5461-10922
03ceb5001225b9c7c0fcb90811a94e6c0b325485 127.0.0.1:6705@16705 slave,fail? 8a79aecca073c6adc762e7b43b93be90a480894c 1697814814676 1697814814674 2 connected
18c9fa4f122f8952303453b909ac747e0a5405b8 127.0.0.1:6703@16703 master,fail? - 1697814814677 1697814814674 3 connected 10923-16383
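
The `cluster nodes` output above can be summarized with a small script when comparing several nodes. This is a minimal Python sketch under the assumption that each line follows the documented field order (id, address, flags, master id, ping-sent, pong-recv, config epoch, link state, slots). The function name `parse_cluster_nodes` is made up for illustration. The `fail?` flag marks a node this instance considers PFAIL (possibly failing).

```python
def parse_cluster_nodes(text):
    """Parse CLUSTER NODES output into a list of dicts, one per node."""
    nodes = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip anything that is not a full node line
        nodes.append({
            "id": fields[0],
            "addr": fields[1],
            "flags": fields[2].split(","),
            "master_id": None if fields[3] == "-" else fields[3],
            "link_state": fields[7],
            "slots": fields[8:],  # empty for replicas
        })
    return nodes

# Three lines copied from the 7.2.2 node's output above.
SAMPLE = """\
cdd6af2b8f9bd059e94f7b1803e17264d69b727a 127.0.0.1:6701@16701 master,fail? - 1697814814676 1697814814674 1 connected 0-5460
b0083dead5440a15ec90ec699447047c940d2c3e 127.0.0.1:6706@16706 myself,slave 18c9fa4f122f8952303453b909ac747e0a5405b8 0 1697814814674 3 connected
18c9fa4f122f8952303453b909ac747e0a5405b8 127.0.0.1:6703@16703 master,fail? - 1697814814677 1697814814674 3 connected 10923-16383
"""

nodes = parse_cluster_nodes(SAMPLE)
pfail = [n["addr"] for n in nodes if "fail?" in n["flags"]]
myself = next(n for n in nodes if "myself" in n["flags"])
print("suspected (PFAIL):", pfail)
print("myself:", myself["addr"], "replica of", myself["master_id"])
```

Run against the full output, this would list every other node as PFAIL from the 7.2.2 node's point of view, even though each link is still reported as `connected`.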

Output from the 6.2.14 master:

$ ./redis-6.2.14/src/redis-cli -c -p 6703
127.0.0.1:6703> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:3
cluster_stats_messages_ping_sent:672
cluster_stats_messages_pong_sent:666
cluster_stats_messages_meet_sent:1
cluster_stats_messages_sent:1339
cluster_stats_messages_ping_received:705
cluster_stats_messages_pong_received:672
cluster_stats_messages_fail_received:1
cluster_stats_messages_received:1378
127.0.0.1:6703> cluster nodes
18c9fa4f122f8952303453b909ac747e0a5405b8 127.0.0.1:6703@16703 myself,master - 0 1697815410000 3 connected 10923-16383
485f875d20fd8601cf6d9aec86fcd6afadbf1d03 127.0.0.1:6704@16704 slave cdd6af2b8f9bd059e94f7b1803e17264d69b727a 0 1697815410529 1 connected
03ceb5001225b9c7c0fcb90811a94e6c0b325485 127.0.0.1:6705@16705 slave 8a79aecca073c6adc762e7b43b93be90a480894c 0 1697815412549 2 connected
cdd6af2b8f9bd059e94f7b1803e17264d69b727a 127.0.0.1:6701@16701 master - 0 1697815412000 1 connected 0-5460
8a79aecca073c6adc762e7b43b93be90a480894c 127.0.0.1:6702@16702 master - 0 1697815411000 2 connected 5461-10922
b0083dead5440a15ec90ec699447047c940d2c3e 127.0.0.1:6706@16706 slave,fail 18c9fa4f122f8952303453b909ac747e0a5405b8 1697814807018 1697814800956 3 connected
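
Comparing the two `cluster info` outputs side by side highlights one difference that is easy to miss: the 7.2.2 node shows no `cluster_stats_messages_pong_received` counter at all. The following Python sketch diffs an excerpt of both outputs. The helper name `parse_cluster_info` is hypothetical, and only a subset of the lines above is included.

```python
def parse_cluster_info(text):
    """Parse 'key:value' lines from CLUSTER INFO into a dict of strings."""
    info = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info

# Excerpt of the 7.2.2 replica's output (port 6706).
INFO_722 = """\
cluster_state:fail
cluster_slots_ok:0
cluster_slots_pfail:16384
cluster_stats_messages_ping_sent:97
cluster_stats_messages_pong_sent:100
cluster_stats_messages_ping_received:100
cluster_stats_messages_fail_received:1
"""

# Excerpt of the 6.2.14 master's output (port 6703).
INFO_6214 = """\
cluster_state:ok
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_stats_messages_ping_sent:672
cluster_stats_messages_pong_sent:666
cluster_stats_messages_ping_received:705
cluster_stats_messages_pong_received:672
cluster_stats_messages_fail_received:1
"""

new_node = parse_cluster_info(INFO_722)
old_node = parse_cluster_info(INFO_6214)

# Counters the 6.2.14 node reports that the 7.2.2 node does not.
missing_on_new = sorted(set(old_node) - set(new_node))
# Shared keys whose values differ, as (7.2.2 value, 6.2.14 value).
differing = {
    k: (new_node[k], old_node[k])
    for k in new_node
    if k in old_node and new_node[k] != old_node[k]
}
print("missing on 7.2.2:", missing_on_new)
print("cluster_state:", differing["cluster_state"])
```

The 7.2.2 node has sent 97 pings and received 100, yet lists no received pongs. This appears consistent with it never counting a PONG from the 6.2.14 nodes, though the counters alone do not prove where the replies are lost.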

To reproduce

Create a basic 6-server cluster running 6.2, then restart one node on 7.2.2.

Each node has a config similar to this with different directories and ports:

$ cat node1/redis.conf
dir /Users/work/working/redis/node1
cluster-enabled yes
cluster-config-file /Users/work/working/redis/node1/cluster.nodes.conf
protected-mode no
port 6701
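
The six per-node configs can be generated rather than written by hand. This is a minimal Python sketch that renders the same five settings for each node. Only `node1` and port 6701 are shown in the config above, so the `node2` through `node6` directory names are an assumption, and the remaining ports are taken from the create command below.

```python
BASE_DIR = "/Users/work/working/redis"  # taken from the node1 config above
PORTS = [6701, 6702, 6703, 6704, 6705, 6706]  # ports used in the create command

def render_conf(index, port, base_dir=BASE_DIR):
    """Render a redis.conf body for node<index> (the node<N> naming is assumed)."""
    node_dir = f"{base_dir}/node{index}"
    return "\n".join([
        f"dir {node_dir}",
        "cluster-enabled yes",
        f"cluster-config-file {node_dir}/cluster.nodes.conf",
        "protected-mode no",
        f"port {port}",
    ]) + "\n"

# Map "node1".."node6" to the text of each redis.conf.
configs = {f"node{i}": render_conf(i, port) for i, port in enumerate(PORTS, start=1)}
print(configs["node1"], end="")
```

Writing each entry of `configs` to `<node dir>/redis.conf` is left out so the sketch has no side effects.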

Cluster was created at 6.2.14 with this command:

./redis-6.2.14/src/redis-cli --cluster create 127.0.0.1:6701 127.0.0.1:6702 127.0.0.1:6703 127.0.0.1:6704 127.0.0.1:6705 127.0.0.1:6706 --cluster-replicas 1

Expected behavior

The server running 7.2 should not report the cluster state as fail.

Comment From: madolson

As a brief update: this is a somewhat complex issue. We introduced a way to extend the cluster bus protocol to add some new functions, but we didn't fully implement backwards compatibility with older versions. I'm working on a patch and will post it when I have some time.

As a workaround, you should be able to upgrade from 6.2 to 7.0 and then from 7.0 to 7.2 without any issue.
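
The two-step workaround can be expressed as a small planning helper for upgrade tooling. This is a minimal Python sketch and only encodes the advice given in this thread, not an official compatibility matrix. The function names and the rule "anything below 7.0 going to 7.2 or later gets a 7.0 hop" are assumptions made for illustration.

```python
def parse_version(v):
    """Turn '6.2.14' or '7.0' into a tuple of ints for comparison."""
    return tuple(int(part) for part in v.split("."))

def plan_upgrade(current, target, intermediate="7.0"):
    """Return the versions to run, in order, following the workaround above.

    Assumption: only a jump from below 7.0 to 7.2 or later needs the
    extra hop through the intermediate release line.
    """
    cur, tgt = parse_version(current), parse_version(target)
    if tgt < cur:
        raise ValueError("downgrades are not covered by this sketch")
    path = [current]
    if cur[:2] < (7, 0) and tgt[:2] >= (7, 2):
        path.append(intermediate)  # stop on a 7.0.x release first
    if target != current:
        path.append(target)
    return path

print(plan_upgrade("6.2.14", "7.2.2"))
print(plan_upgrade("7.0.15", "7.2.4"))
```

The first call inserts the 7.0 hop and the second does not, since a node already on 7.0 can move straight to 7.2 according to the comment above.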

Comment From: jdork0

Is there any update on this issue and the possibility of getting a fix in 7.2 sometime?

Comment From: ackerL

Hi team, we are facing a similar issue when upgrading Redis from version 5.x to 7.2.x in a Redis cluster. Is there any plan to fix this compatibility issue?

Comment From: sushistack

I tried to upgrade Redis 5.x -> 7.2.4 directly, but faced the same problem.

I was able to upgrade via Redis 5.x -> Redis 6.2.x -> Redis 7.0.15 -> Redis 7.2.4, but I think the Redis 6.2.x step is not necessary (Redis 5.x -> Redis 7.0.x -> Redis 7.2.x or a later version).

Thank you, @madolson

Comment From: madolson

I don't work on this project anymore, sorry.