Hello, we've deployed Redis in a Kubernetes cluster where all the RDB and nodes.conf files are stored on a PV of storage class NFS. In the mounted directory, each Redis server pod creates its own folder, which it uses for dumping data and for storing the nodes.conf file for Redis Cluster. Going through the Redis source code, we see that flock is used to take a lock on this file before the server starts its routine. Everything works fine until one of the underlying worker nodes fails and its pods are terminated forcefully. In that case the lock on the file isn't released, and when the pod is rescheduled on another worker we get the following error and the pod stays stuck in CrashLoopBackOff forever.
Sorry, the cluster configuration file /home/database/redis-chasan-poc2/redis-cluster-server-1/nodes-1.conf is already used by a different Redis Cluster node. Please make sure that different nodes use different cluster configuration files.
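For anyone following along, here is a rough sketch of the kind of check that leads to this message, as far as we understand it: open the config file and try to take an exclusive, non-blocking flock on it. This is only an illustration in Python, not the actual Redis code. The function name `try_lock_config` is made up for this example, and how flock behaves on NFS depends on the client and server setup.

```python
import fcntl
import os


def try_lock_config(path):
    """Try to take an exclusive, non-blocking flock on `path`.

    Returns the open file descriptor that holds the lock, or None if
    another open file description already holds a conflicting lock.
    (Hypothetical helper, for illustration only.)
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # The lock is held elsewhere; this is the "already used" case.
        os.close(fd)
        return None
    # The lock lives as long as this descriptor stays open.
    return fd
```

On a local filesystem the lock goes away when the holding process exits. In our case the holder disappeared together with its worker node, and the lock seems to stay registered on the NFS side.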
To work around this issue, we take a copy of the nodes.conf file, delete the original, and put the copy back in its place under the original name. This way the lock is released, and the newly replaced file is available for an exclusive lock again.
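The copy, delete, and replace steps could be scripted roughly like this. It is a minimal sketch that assumes the Redis pod for that folder is not running while it executes. The function name and the `.copy` suffix are made up for the example.

```python
import os
import shutil


def replace_with_unlocked_copy(path):
    """Swap `path` for a fresh copy of itself.

    The copy is a new file, so it does not carry the stale lock that
    is attached to the original. (Hypothetical helper; run it only
    while no Redis process is using the file.)
    """
    tmp = path + ".copy"       # assumed temporary name
    shutil.copy2(path, tmp)    # 1. take a copy (content and metadata)
    os.remove(path)            # 2. delete the original, locked file
    os.rename(tmp, path)       # 3. put the copy back under the old name
```

It could be called with the path from the error message before the pod is started again, for example from an init container, but that part is only an idea and we haven't settled on it.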
Comment From: hasan4791
Any update?