Describe the bug

[BUG] Sentinel commands 'get-master-addr-by-name' and 'master' return inconsistent results

As the picture shows, when I use the SENTINEL get-master-addr-by-name command to query the master address, the result is inconsistent with what SENTINEL master returns.

To reproduce

Kill the Redis master instance to trigger a failover. The issue appears after repeating this several times.

Expected behavior

The two commands should return consistent results.
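A minimal sketch of how the two replies could be compared, assuming `SENTINEL get-master-addr-by-name` returns a two-element `[ip, port]` reply and `SENTINEL master` returns a flat list of field/value pairs that includes `ip` and `port`. The helper names `parse_master_reply` and `addresses_agree` are hypothetical and only illustrate the check; the sample replies in the usage note are made up.

```python
def parse_master_reply(flat):
    """Turn a flat [field, value, field, value, ...] reply into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


def addresses_agree(addr_reply, master_reply):
    """Return True if both replies name the same ip and port.

    addr_reply:   [ip, port], as from SENTINEL get-master-addr-by-name
    master_reply: flat field/value list, as from SENTINEL master
    """
    info = parse_master_reply(master_reply)
    # Ports may arrive as strings or integers depending on the client,
    # so both sides are normalized to strings before comparing.
    from_master = (info.get("ip"), str(info.get("port")))
    from_addr = (addr_reply[0], str(addr_reply[1]))
    return from_master == from_addr
```

For example, `addresses_agree(["10.0.0.1", "6379"], ["name", "mymaster", "ip", "10.0.0.1", "port", "6379"])` is true, while a reply carrying a different ip or port makes it false. Running this check against every Sentinel after each failover would show which node disagrees with itself.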

Additional information

The Sentinel node with this problem entered and exited TILT mode. I don't know whether that is related.
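For context, the Sentinel documentation describes TILT as being entered when the time between two timer calls is negative or unexpectedly large (2 seconds or more), and exited once timing has looked normal for 30 seconds. The sketch below is a simplified model of that rule, not the actual Sentinel source; the class `TiltModel` and its constants are hypothetical names. The rule concerns jumps in a node's own clock, and whether it explains the inconsistency reported here is not established.

```python
TILT_TRIGGER_MS = 2000   # assumed from the docs: "2 seconds or more"
TILT_PERIOD_MS = 30000   # assumed from the docs: exit after 30 s of normal timing


class TiltModel:
    """Simplified model of TILT entry and exit based on timer deltas."""

    def __init__(self, now_ms):
        self.previous_ms = now_ms
        self.tilt = False
        self.tilt_start_ms = 0

    def on_timer(self, now_ms):
        """Feed one timer tick; return True while the model is in TILT."""
        delta = now_ms - self.previous_ms
        if delta < 0 or delta >= TILT_TRIGGER_MS:
            # The clock jumped backwards or the process stalled:
            # enter TILT, or postpone the exit if already in it.
            self.tilt = True
            self.tilt_start_ms = now_ms
        elif self.tilt and now_ms - self.tilt_start_ms >= TILT_PERIOD_MS:
            # Timing has looked normal for the whole TILT period.
            self.tilt = False
        self.previous_ms = now_ms
        return self.tilt
```

In this model a single backwards step of the clock, such as a manual time adjustment, is enough to enter TILT for at least 30 seconds.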

Comment From: hwware

Hi, I tried to reproduce your issue on 4 different machines: one for the master, 2 for replicas, and 1 for Sentinel. I could not reproduce it.

Could you please provide your full info logs, so we can see which OS and which version of Redis you are running when this issue occurs? Also, could you tell us how frequently this problem happens?

Thanks a lot

Comment From: imguang

Sorry for the late reply.

Here is the full log of the node in question: 172.28.204.53-redis.log

And os information: Linux rgibns3 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

How frequently does this problem happen? The last time, it hit the problem after about 5 failovers.

I just found that the clock on this node is about 40s behind the other nodes. I will adjust the time, repeat the test, and post the results here.

Comment From: imguang

After synchronizing the time and testing again, there is no problem with triggering 100 failovers.

Therefore, it can be basically determined that the problem is caused by inconsistency of cluster time.

I will close this issue, thanks a lot

Comment From: imguang

But in this case, the two commands should return the same too, right?

Comment From: hwware

But in this case, the two commands should return the same too, right?

Yes, at the code level, they should return the same value.

Comment From: hwware

After synchronizing the time and testing again, there is no problem with triggering 100 failovers.

Therefore, it can be basically determined that the problem is caused by inconsistency of cluster time.

I will close this issue, thanks a lot

Great