Issue Description

I'm experiencing random data loss in my Redis cluster setup. Here's a detailed breakdown of the scenario:

Setup

  • Local Redis cluster environment
  • NestJS application
  • Main async function containing 13 sub-functions
  • Each sub-function writes ~100,000 records to Redis with SET in a loop

Observed Behavior

  • All data successfully passes through the application logic and reaches the Redis SET operation
  • Random loss of approximately 100-1,000 records
  • No pattern in which records are lost

Environment Details

  • Each cluster node allocated:
      • 1 CPU core
      • 600MB memory
  • Current memory usage: 100-200MB per node (well within limits)
  • No network connectivity issues observed

Current Configuration

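// Assumes the ioredis client: Redis.Cluster and the option names below are ioredis APIs.
import Redis from 'ioredis';

// `environment` is the application's own configuration object.
const redisClient = environment.clusterMode
    ? new Redis.Cluster(
          [
              {
                  host: environment.redisCluster.clusterHost,
                  port: parseInt(environment.redisCluster.clusterPort, 10),
              },
          ],
          {
              redisOptions: {
                  username: environment.redisCluster.clusterUsername,
                  password: environment.redisCluster.clusterPassword,
              },
              maxRedirections: 300,
              retryDelayOnFailover: 300,
          },
      )
    : new Redis({
          host: environment.redisHost,
          port: parseInt(environment.redisPort, 10),
      });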

Troubleshooting Done

  1. Verified all data reaches the SET operation (no logical errors in application code)
  2. Confirmed adequate memory allocation
  3. Monitored cluster performance during operations
  4. Checked network stability
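
Item 1 above could be tightened by counting acknowledged replies instead of only confirming that the SET call was reached. A minimal sketch, assuming an ioredis-style `set` that resolves to "OK" on success; `setAllAndReport`, `SetFn`, `WriteReport`, and `batchSize` are hypothetical names and not part of the original application:

```typescript
// Minimal shape of the write call; with ioredis, `set` resolves to "OK" on success.
type SetFn = (key: string, value: string) => Promise<unknown>;

interface WriteReport {
  acknowledged: number; // SETs that resolved to "OK"
  failedKeys: string[]; // keys whose SET rejected or returned another reply
}

// Hypothetical helper: writes records in batches, waits for every reply,
// and records each key that was not acknowledged.
async function setAllAndReport(
  set: SetFn,
  records: Array<[string, string]>,
  batchSize = 1000, // assumed batch size, tune for the workload
): Promise<WriteReport> {
  const report: WriteReport = { acknowledged: 0, failedKeys: [] };
  for (let i = 0; i < records.length; i += batchSize) {
    const batch = records.slice(i, i + batchSize);
    // allSettled keeps going when individual writes reject.
    const results = await Promise.allSettled(batch.map(([k, v]) => set(k, v)));
    results.forEach((result, j) => {
      if (result.status === "fulfilled" && result.value === "OK") {
        report.acknowledged += 1;
      } else {
        report.failedKeys.push(batch[j][0]);
      }
    });
  }
  return report;
}
```

With an ioredis client this might be called as `setAllAndReport((k, v) => client.set(k, v), records)`. A non-empty `failedKeys` would point at the client side (for example exhausted redirections or dropped connections) rather than at the server.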

Any insights or suggestions would be greatly appreciated!

Comment From: sundb

@harish18092002 How did you confirm the data loss? Can you try turning on AOF and turning off aof-use-rdb-preamble, and then check the AOF for the missing keys when data loss occurs?

Comment From: harish18092002

> @harish18092002 How did you confirm the data loss? Can you try turning on AOF and turning off aof-use-rdb-preamble, and then check the AOF for the missing keys when data loss occurs?

To verify the data loss issue, I implemented a comprehensive validation process. First, I collected and counted all keys from each Redis cluster node. Then, I retrieved all corresponding IDs from the database and performed a comparison between the Redis keys and database IDs. Through this comparison, I identified specific keys that were missing from Redis. To double-check these findings, I attempted to fetch data directly from Redis using these missing keys, which returned null values, confirming the data loss.
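
The comparison step amounts to a set difference between database IDs and Redis keys. A minimal sketch with made-up data; `findMissingIds` and the `record:` key prefix are hypothetical and only illustrate the idea, since the actual key naming scheme is not shown here:

```typescript
// Hypothetical helper: returns the database IDs that have no matching Redis key.
// `keyPrefix` is an assumed naming scheme (e.g. "record:<id>").
function findMissingIds(
  dbIds: string[],
  redisKeys: string[],
  keyPrefix = "",
): string[] {
  // A Set gives constant-time lookups, which matters at ~1.3 million keys.
  const present = new Set(redisKeys);
  return dbIds.filter((id) => !present.has(keyPrefix + id));
}

// Example with made-up data: ID "2" has no corresponding key.
const missing = findMissingIds(["1", "2", "3"], ["record:1", "record:3"], "record:");
console.log(missing); // logs the IDs without a Redis key
```

Each ID returned this way can then be fetched directly with GET to confirm that Redis returns null for it.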

Comment From: sundb

@harish18092002 I still recommend that you find out how the keys disappeared by inspecting the AOF.

Comment From: harish18092002

> @harish18092002 I still recommend that you find out how the keys disappeared by inspecting the AOF.

I use both AOF and RDB for backup purposes.