Describe the bug

We are using this cache in conjunction with our Grafana Loki deployment, which handles roughly 100-200GB of uncompressed logs every day. This load causes issues for the cache when reading from the master. The Redis cluster caches the compressed logs and should cope with up to 100GB of throughput (well over the daily amount of 100GB of uncompressed logs).

We hit a "readiness probe failed" error without any helpful error information.

The error is as follows:

I/O error reading bulk count from MASTER: No error information
RDB: 50 MB of memory used by copy-on-write
Reconnecting to MASTER xxx.xxx.xxx.xxx:6379 after failure
MASTER <-> REPLICA sync started
Non blocking connect for SYNC fired the event.
Master replied to PING, replication can continue...
Partial resynchronization not possible (no cached master)

The main issue for us is the "No error information" part.

There is no way to debug this issue with this kind of response message.
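
In case it helps anyone narrow this down, here is a minimal sketch of what can be checked from the replica side while the log stays silent. The output of `INFO replication` includes fields such as `master_link_status` and `master_sync_in_progress`; the Python below parses that text (for example as printed by `redis-cli INFO replication`) and lists anything that looks wrong. The function names and the sample text are illustrative assumptions, not output captured from our cluster.

```python
def parse_info(text):
    """Parse the `key:value` lines of Redis INFO output into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Replication".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def replication_problems(fields):
    """Return human-readable warnings for a replica's replication state."""
    problems = []
    # Only replicas report a master link; INFO names this role "slave".
    if fields.get("role") != "slave":
        return problems
    if fields.get("master_link_status") != "up":
        problems.append("master link is down")
    if fields.get("master_sync_in_progress") == "1":
        problems.append("full sync in progress")
    return problems


# Hypothetical sample, shaped like `INFO replication` on a replica.
SAMPLE = """# Replication
role:slave
master_host:xxx.xxx.xxx.xxx
master_port:6379
master_link_status:down
master_sync_in_progress:1
"""

if __name__ == "__main__":
    for problem in replication_problems(parse_info(SAMPLE)):
        print(problem)
```

This only reports the state Redis already exposes; it does not explain why the bulk read from the master failed.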

To reproduce

We use Kubernetes pods with the spotahome/redis-operator.

The failover has some CustomConfig that overrides the default values set by the operator (see the CustomConfig under Additional information below). We run 4 instances with 9 pods across them. We request 3 cores and 35GB of memory per pod.

Expected behavior

We expect one of two scenarios to occur.

  1. The pod fails with an error message that helps us change the config to improve performance.
  2. The pod either does not fail the readiness probe or is restarted when this error message occurs (you might not be able to help with this one, as we use the redis-operator).

Additional information

We have the Persistent Volume Claim set to 256GB. This should be more than enough space to hold the searched data for any timeframe.

CustomConfig set in the Redis Failover:

"repl-timeout 610"
"save 60 5000"
"tcp-keepalive 610"
"maxclients 500000"
"oom-score-adj yes"
"oom-score-adj-values 0 200 800"
"dynamic-hz yes"
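
For readers less familiar with these directives, the sketch below splits each CustomConfig entry into a directive name and its arguments, and spells out what the `save` rule means. The helper names (`parse_directives`, `describe_save`) are hypothetical and exist only for this illustration; they are not part of the operator or of Redis.

```python
# The CustomConfig entries exactly as listed above.
CUSTOM_CONFIG = [
    "repl-timeout 610",
    "save 60 5000",
    "tcp-keepalive 610",
    "maxclients 500000",
    "oom-score-adj yes",
    "oom-score-adj-values 0 200 800",
    "dynamic-hz yes",
]


def parse_directives(lines):
    """Map each directive name to its list of string arguments."""
    parsed = {}
    for line in lines:
        name, *args = line.split()
        parsed[name] = args
    return parsed


def describe_save(args):
    """Explain a `save <seconds> <changes>` rule in plain words."""
    seconds, changes = int(args[0]), int(args[1])
    return (
        f"write an RDB snapshot after {seconds}s "
        f"if at least {changes} changes were made"
    )


if __name__ == "__main__":
    config = parse_directives(CUSTOM_CONFIG)
    print(describe_save(config["save"]))
    print("repl-timeout (seconds):", config["repl-timeout"][0])
```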

Comment From: vineelyalamarthy

is this Redis Cluster or sentinel?

Comment From: seanocca

> is this Redis Cluster or sentinel?

This error comes up on the Redis cluster.