Redis [BUG] - Nineya|java/go/python

Describe the bug

Problems caused by watch a key that is about to expire To reproduce

   String loadData = "123456";//Simulate time-consuming loading action. Redis connection is not occupied at this time

   Jedis j85 = new Jedis("xx.xx.xx.xx",xxx);
    j85.setex("testbit", 10,  "6543");//Assume the last setting

    j85.watch("testbit");

    //TODO: Some conditions are used to determine whether to execute the transaction

    Transaction trans1 = j85.multi();

    Thread.sleep(11000);//Hold for 11 seconds to disable key‘testbit’

    //Another thread simulates concurrent behavior
    new Thread(()-> { 
        Jedis j85t = new Jedis("xx.xx.xx.xx",xxx);
        j85t.del("testbit");//The purpose of deletion is to keep the transaction from executing
    }).start();

    trans1.set("testbit", loadData);
    trans1.exec();

Expected behavior

I expect that the transaction will not succeed, although key‘testbit’ has expired Additional information

In fact, my goal is to lazily load new data after each deletion.

Comment From: sundb

1) I'm wondering if the code you posted was copied from two places. new Thread(() would be blocked by Thread.sleep(11000).

2) Is key testbit exist before j85.setex("testbit", 10, "6543");? If it doesn't exist, DEL command in another thread won't change anything, and it won't trigger the WATCH.

Comment From: whipfeng

I'm wondering if the code you posted was copied from two places. new Thread(() would be blocked by Thread.sleep(11000).

Is key testbit exist before j85.setex("testbit", 10, "6543");? If it doesn't exist, DEL command in another thread won't change anything, and it won't trigger the WATCH.

Are you Chinese? Can we communicate in Chinese

Comment From: whipfeng

I'm wondering if the code you posted was copied from two places. new Thread(() would be blocked by Thread.sleep(11000).

Is key testbit exist before j85.setex("testbit", 10, "6543");? If it doesn't exist, DEL command in another thread won't change anything, and it won't trigger the WATCH.

1.‘new Thread(()’ and ‘Thread.sleep(11000)’ just to simulate concurrency. 2.Does it have any influence whether the key testbit exists before 'j85.setex("testbit", 10, "6543");'. I just tried. Even if it exists, ‘DEL’ will not trigger the WATCH. Isn't Redis official saying that it only depends on whether the command is modified rather than the comparison value？

Comment From: sundb

It's better to use English to let more guys know what you ask. WATCH can't use in multi, so WATCH should be executed before MULTI command. ```java j85.watch("testbit"); j85.exec(); j85.setex("testbit", 10, "6543");//Assume the last setting Transaction trans1 = j85.multi();


**Comment From: whipfeng**

> so `WATCH` should be executed before `MULTI` command.

Look at my code. That's what I do.



**Comment From: sundb**

@whipfeng sorry for my missing.
Which Redis version are you using?

**Comment From: whipfeng**

> @whipfeng sorry for my missing. Which Redis version are you using?

I tested it on ‘redis_version:3.2.6’.
But I looked at the source code as if every version of the branch was handled the same way.
The following reference is the code of the delete command on git branch 3.2（Please see my comments）:

        if (dbDelete(c->db,c->argv[j])) {  //Only if it is successfully deleted here(The basis for successful deletion only exists in the dictionary) ,
            signalModifiedKey(c->db,c->argv[j]);//The status of the client will be notified of modification.
            notifyKeyspaceEvent(NOTIFY_GENERIC,
                "del",c->argv[j],c->db->id);
            server.dirty++;
            deleted++;
        }

I think it should be the same as the set command. The ‘ signalModifiedKey’ method should be called in any case（Even if the original value is the same as the value to be set）:

    void setKey(redisDb *db, robj *key, robj *val) {
        if (lookupKeyWrite(db,key) == NULL) {
            dbAdd(db,key,val);
        } else {
            dbOverwrite(db,key,val);
        }
        incrRefCount(val);
        removeExpire(db,key);
        signalModifiedKey(db,key);
     }

**Comment From: jerrykcode**

The `dbDelete` only failed when the key doesn't exist, I think we should not trigger `signalModifiedKey` when deleting an unexisting key

**Comment From: sundb**

@whipfeng The key expiration does not execute to the code you mentioned, it may have expired, but is still in memory and not deleted until it is used again or is cleaned up periodically.
This issue should be resolved at https://github.com/redis/redis/pull/9194, but it only works after 7.x

**Comment From: whipfeng**

> The `dbDelete` only failed when the key doesn't exist, I think we should not trigger `signalModifiedKey` when deleting an unexisting key

Let's jump out of the specific code implementation.
I think the del command is a special form of the set command，
Just like 'j85.set ("testbit", null)'.（Although this is not allowed to be written, it is not appropriate only for the 'String' type）
Just because using the set command, the `signalModifiedKey` method will be called if the original value is the same as the value to be set.



**Comment From: whipfeng**

> @whipfeng The key expiration does not execute to the code you mentioned, it may have expired, but is still in memory and not deleted until it is used again or is cleaned up periodically. This issue should be resolved at #9194, but it only works after 7.x

I think whether the deletion is due to expiration or not, it should be notified.

**Comment From: sundb**

@whipfeng It's a waste to compare these two values if we wanna notify when the value is changed, .e.g big value.

**Comment From: whipfeng**

> @whipfeng It's a waste to compare these two values if we wanna notify when the value is changed, .e.g big value.

So we should only focus on the command itself, not on the value of the operation, because each command operation implies the same version number as the ABA problem.
In addition, the nonexistent key is equivalent to 'null mapped by key'.

**Comment From: jerrykcode**

@whipfeng I think this bug is not related to `dbDelete`, it should be resolved by [#9194](https://github.com/redis/redis/pull/9194), as @sundb mentioned.

But maybe we need to discuss this:

> Just because using the set command, the signalModifiedKey method will be called if the original value is the same as the value to be set.

Since the `set` command triggers signalModifiedKey even when the value is not changed (nothing in the database altered),

so should we change the feature to let the `del` command also triggers signalModifiedKey even when `del` failed (nothing in the database altered) ?



**Comment From: sundb**

@jerrykcode  `DEL` non-existent key and `SET` same value are not the same concept at all.
1) When `DEL` a key that does not exist, we know explicitly that it does not exist, so it makes no sense at all to trigger `signalModifiedKey()`.
2) When`SET` the same value, the key is explicitly present, and in any way, we overwrite the old value with the new one, even if they are the same.

`signalModifiedKey()` should be based on the premise that the key has existed before or that it has been modified.

**Comment From: jerrykcode**

@sundb Ok, I see. Thanks for your explanation :)

**Comment From: whipfeng**

> @jerrykcode `DEL` non-existent key and `SET` same value are not the same concept at all.
> 
> 1. When `DEL` a key that does not exist, we know explicitly that it does not exist, so it makes no sense at all to trigger `signalModifiedKey()`.
> 2. When`SET` the same value, the key is explicitly present, and in any way, we overwrite the old value with the new one, even if they are the same.
> 
> `signalModifiedKey()` should be based on the premise that the key has existed before or that it has been modified.

How do we implement operations like 'set (test, null)'? Don't we use the del command？

**Comment From: whipfeng**

I wrote another complex example based on Redis cluster：

String getSetIfNull(final String key, final int seconds, final Supplier supplier) { List verAndVal = hmget(key, "version", "value"); String ver = verAndVal.get(0); String val = verAndVal.get(1); if (null != val) {// Direct use of cache with value return val; } final String nVer; if (null == ver) {//If there is no version number, it will be generated, and concurrency will not affect nVer = UUID.randomUUID().toString(); if (seconds > 0) {//Put transaction operation if timeout is required new JedisClusterCommand(connectionHandler, maxAttempts) { @Override public Void execute(Jedis connection) { Transaction trans = connection.multi(); trans.hset(key, "version", nVer); trans.expire(key, seconds); trans.exec(); return null; } }.run(key); } else { hset(key, "version", nVer); } } else {//Directly use the existing version, equivalent to assisting in loading nVer = ver; } final String nVal = supplier.get();//External services provide a time-consuming loading method for new values, which does not occupy the Redis connection at this time new JedisClusterCommand(connectionHandler, maxAttempts) { @Override public Void execute(Jedis connection) { connection.watch(key);//It is necessary to monitor whether the key has been deleted during the operation String oVer = connection.hget(key, "version"); if (nVer.equals(oVer)) {//Compare version number Transaction trans = connection.multi(); trans.hset(key, "value", nVal); if (seconds > 0) { trans.expire(key, seconds); } trans.exec(); } else { connection.unwatch(); } return null; } }.run(key); return nVal; }



Among them, the value obtained by `supplier.get()` may change. Each time it changes in another concurrency, the Redis key will be deleted.
After ` if (nVer.equals(oVer))`（true）, the key just expired and was deleted several times in another concurrency. At this point, an old value will be set successfully, although the probability is small.
So how can I guarantee that the value is the last change? Please help me to have a careful look！

**Comment From: whipfeng**

I still think that every modification operation should imply the concept of a version number.
The concept given is accurate, otherwise the premise that the key already exists should be excluded separately.
But it[#9194] seems that doing this only for expired keys can solve my problem, and I don't have a better proof example

**Comment From: sundb**

> How do we implement operations like 'set (test, null)'? Don't we use the del command？

I'm not sure what you are trying to do with this code, is it to delete the `test` key?

**Comment From: whipfeng**

> > How do we implement operations like 'set (test, null)'? Don't we use the del command？
> 
> I'm not sure what you are trying to do with this code, is it to delete the `test` key?

@sundb @jerrykcode I wrote another short example to support my idea:

String getSetIfNull(final String key, final Supplier supplier) { Jedis j85 = new Jedis("xx.xx.xx.xx",0000); String val = j85.get(key); if (null != val) {// Direct use of cache with value return val; }

    j85.watch(key);
    val = j85.get(key);
    if (null != val) {// During the watch process, you need to query and judge again
        j85.unwatch();
        return val;
    }
    val = supplier.get();//Load data from the business. The key will be deleted when the data is changed concurrently
    /**
     * At this time, other concurrent users may modify the value of 'supplier.get()' multiple times
     * and delete the key（However, deletion does not affect the setting of expired data. Is that OK?It seems that only 
     * handling expired keys can't solve this problem）
     * */
    Transaction trans = j85.multi();
    trans.set(key, val);
    trans.exec();
    j85.close();
    return val;
}


The goal is Redis cache and `supplier.get()` final consistency.
Please see the content of code comments, thank you!

**Comment From: sundb**

@whipfeng Not sure i can follow you.
First of all, the `getSetIfNull()` method itself is not thread-safe, and operations on redis are not atomic, so you cannot guarantee that the Redis cachet and `supplier.get()` are consistent.
1) Make sure that supplier.get() is locked for the entire time when it's saved to Redis.
2) Make sure that the key cannot be modified by other concurrently, you may need a distributed lock or use lua or something else.

**Comment From: whipfeng**

@sundb What is the basis of `not thread-safe`?
If you follow the current implementation of the del command, it is true.
Let me change my example. Do you think we can get this conclusion?

String getSetIfNull(final String key, final Supplier supplier) { Jedis j85 = new Jedis("xx.xx.xx.xx",0000); String val = j85.get(key); if ('null_str' != val) {// Direct use of cache with value return val; }

    j85.watch(key);
    val = j85.get(key);
    if ('null_str' != val) {// During the watch process, you need to query and judge again
        j85.unwatch();
        return val;
    }
    val = supplier.get();//Load data from the business(Not allowed as 'null_str'). The key will be set to 'null_str' when the data is changed concurrently
    /**
     * At this time, other concurrent users may modify the value of 'supplier.get()' multiple times
     * and set the key to 'null_str'
     * */
    Transaction trans = j85.multi();
    trans.set(key, val);
    trans.exec();
    j85.close();
    return val;
}


Among them, 'null_str' is considered as a reserved word in business and  `supplier.get()` is not allowed to return 'null_str'.

**Comment From: sundb**

@whipfeng 
1)
```java
        String val = j85.get(key);
        if ('null_str' != val) {    // Is it possible for the key to be modified by other concurrently at this time?
            return val;
        }

        val = j85.get(key);
        if ('null_str' != val) {  // Same as above, key was deleted or changed by other concurrently.
            j85.unwatch();
            return val;
        }

val = supplier.get();    // After get(), is it possible for the supplier to be modified by other concurrently?

Any one of the above 3 types of inconsistencies due to concurrency may cause Redis cache and supplier.get() to end up inconsistent.

Comment From: whipfeng

@sundb This transient situation is no problem. What we want is final consistency. For those who use this method, they can finally perceive that the data has changed. And my example only disproves the problem of del command in the previous example.

Comment From: sundb

@whipfeng Not sure i fully understand what you meant. You mean there won't be two concurrent or 2 threads calling this method at the same time? If not, then there must be a concurrency or threading problem.

like the following example(thread1 and thread2)

thread1: 
1) change supplier to s1, then call getSetIfNull()
2) run to `val = supplier.get();` and set val to s1

thread2:
1) change supplier  to s2, then call getSetIfNull()
2) run to `val = supplier.get();` and set val to s2
3) update the key in redis with s2.

thread1:
3) update the key in redis with s1.

In fact, the value of the key should be s2, not s1. In any case, as long as there are concurrent reads and writes to the same resource, there is bound to be a race to the bottom, which is why locks are needed.

Comment From: whipfeng

@sundb The concurrency relationship is not so complex,Equivalent to asynchronous cache refresh

thread0: 
1) change supplier to s1
2) set(key,'null_str')

thread1～N: //Just take it out for use
1)  call getSetIfNull()

You can think that supplier.get() is loaded from the database. thread0: change supplier to s1 is an operation to modify the database

Comment From: sundb

So you are now running into a final inconsistency problem that the database and cache are not consistent?

Comment From: whipfeng

The last example I provided does not have this problem. The previous one will have this problem by using the del command. This is also the reason why I give two examples to disprove.

Comment From: sundb

it semms that if you use DEL j85.get(key) would never equal null_str. I'm not expert with JAVA, perhaps you can change if ('null_str' != val) to if (NULL != val) when using DEL command.

Comment From: whipfeng

set(key,'null_str')and del(key) are just two examples for comparison,I just want to say that if there is a problem with the way of del(key)

Comment From: sundb

There will be no problem.

Comment From: whipfeng

Let's look at the original example

String getSetIfNull(final String key, final Supplier<String> supplier) {
        Jedis j85 = new Jedis("xx.xx.xx.xx",0000);
        String val = j85.get(key);
        if (null != val) {// Direct use of cache with value
            return val;
        }

        j85.watch(key);
        val = j85.get(key);
        if (null != val) {// During the watch process, you need to query and judge again
            j85.unwatch();
            return val;
        }
        val = supplier.get();//Load data from the business. The key will be deleted when the data is changed concurrently
        /**
         * At this time, other concurrent users may modify the value of 'supplier.get()' multiple times
         * and delete the key（However, deletion does not affect the setting of expired data. Is that OK?It seems that only 
         * handling expired keys can't solve this problem）
         * */
        Transaction trans = j85.multi();
        trans.set(key, val);
        trans.exec();
        j85.close();
        return val;
    }

This is the concurrency relationship

thread0: 
1) change supplier to s1
2) del(key)

thread1～N: //Just take it out for use
1)  call getSetIfNull()

In this case, final consistency cannot be guaranteed. Is that the problem?

Comment From: sundb

It seems to me that, as in your example above, there are concurrency issues. Because you are calling the set command in both threads.

thread0: 
1) change supplier to s1
2) set(key,'null_str')

thread1～N: //Just take it out for use
1)  call getSetIfNull()

If it were me, I would use a lock.

thread0: 
lock(lockname) {
    1) change supplier to s1
    2) set(key,'null_str')
}

thread1～N: //Just take it out for use
lock(lockname) {
    1)  call getSetIfNull()
}

Comment From: whipfeng

We only need final consistency,And it's enough for us to take the atomic operation of set(key,'null_str') as the boundary point of consistency.It is too heavy to use locks. And locks in distributed environments are inherently 'consistency issues'. We can only discuss the final consistency, because even if you use the lock, thread1～N will be inconsistent in a short period of time after the system scheduling delay.

Comment From: sundb

If you don't wanna use locks, then you have to protect your operations from atomicity. Why would you want to do a del instead of updating the cache directly? Like following:

updateCache() {
    getSupplier();
    multi();
    updateSupplier();
    exec();
}

getSetIfNull() {
        String val = j85.get(key);
        if (null != val) {// Direct use of cache with value
            return val;
        }
        updateCache();
}

This ensures that the operation to cache would be atomic.

Comment From: whipfeng

updateSupplier() In the current thread, I don't understand.

I guess that's what you want

updateCache() {
    val = getSupplier();
    multi();
   set(key,val);
    exec();
}

There are still consistency problems. If there is concurrent behavior before getSupplier()to multi(), and the result of getSupplier() is changing, the last setting to the cache is an uncertain value

Comment From: sundb

Both supplier and cache are read and written by multiple threads at the same time, so locks are unavoidable. Or you can make sure that only one thread can do the update, and that the other threads are read-only.

Comment From: whipfeng

1）The watch command of Redis has the concept of optimistic locking,If you put another lock outside, it's meaningless 2）only one thread can do the update,Similar to this,The update is serial, but multiple updates may be fast.

Comment From: sundb

1）The watch command of Redis has the concept of optimistic locking,If you put another lock outside, it's meaningless

But competing resources also include suppler, how do you ensure that supplier.get() has not been modified by other threads?

Comment From: whipfeng

After supplier.get() is modified by another thread, the cache will be deleted by that thread, and subsequent calls to getSetIfNull() will reload the cache, which is why we want to use the watch command.

In addition,short-time inconsistency in distributed scenarios is tolerable, so only final consistency is generally discussed. If you have to add locks on the outer layer, it is unrealistic for all callers from different businesses to use unified distributed locks,and this is based on the ideal situation of distributed locks(which is an issue in itself).

Take the concurrency relationship as an example:

thread0: 
lock(lockname) {
    1) change supplier to s1 //'s1' is a variable
    2) del(key)
}

thread1～N:  //thread1~threadN can come from different processes or distributed nodes
lock(lockname) {
    1)  val = call getSetIfNull()
    2) Business use 'val'
}
3) Business use 'val'

According to you, only 2) Business use 'val'can guarantee absolute consistency. If it is used outside like 3) Business use 'val', there will still be the same problem.

We just need to ensure that the reading after writing is up to date in chronological order.For the caller, the completion of writing is the end of 'thread0' and the start of reading is the start of 'thread1～N'.

Comment From: sundb

A second thought, The code you mentioned in https://github.com/redis/redis/issues/11704#issuecomment-1413363391 looks like it will guarantee final consistency, but I can't finger it out yet, maybe you can provide a more detailed example. On the other hand, you can use the monitor command to verify the process of updating the redis cache.

Comment From: whipfeng

The following is printed by me with the monitor command, and the following comments are printed by the thread I debug

1675399184.811923 [0 10.11.63.39:64804] "DEL" "testbit"   //thread0:supplier 0
1675399187.042392 [0 10.11.63.39:64806] "DEL" "testbit"   //thread0:supplier 01
1675399189.030541 [0 10.11.63.39:64807] "DEL" "testbit"   //thread0:supplier 012
1675399207.603029 [0 10.11.63.39:64802] "GET" "testbit" //thread1:value=null
1675399211.691565 [0 10.11.63.39:64802] "WATCH" "testbit"  //thread1:
1675399222.534349 [0 10.11.63.39:64802] "GET" "testbit"   //thread1:
1675399233.383902 [0 10.11.63.39:64810] "DEL" "testbit"  //thread0:supplier 0123
1675399237.476089 [0 10.11.63.39:64811] "DEL" "testbit"   //thread0:supplier 01234
                                                      //thread1:call 'supplier.get()'
1675399268.017141 [0 10.11.63.39:64814] "DEL" "testbit"    //thread0:supplier 012345
1675399269.577149 [0 10.11.63.39:64815] "DEL" "testbit"   //thread0:supplier 0123456
1675399271.148512 [0 10.11.63.39:64816] "DEL" "testbit"    //thread0:supplier 01234567
1675399281.877108 [0 10.11.63.39:64802] "MULTI"。          //thread1:
1675399281.877210 [0 10.11.63.39:64802] "SET" "testbit" "01234"  //thread1:
1675399281.877383 [0 10.11.63.39:64802] "EXEC"              //thread1:getSetIfNull 01234
1675399289.646186 [0 10.11.63.39:64817] "GET" "testbit" //thread1:value=01234
                                                       //thread1:getSetIfNull 01234
1675399295.299110 [0 10.11.63.39:64818] "GET" "testbit" //thread1:value=01234
                                                       //thread1:getSetIfNull 01234

Comment From: sundb

This is where the cache ends up being inconsistent due to supplier being read and written by multiple threads. WATCH command doesn't work in this case because DEL a non-existent key will not trigger WATCH. If you don't use locks, then you may need a more complex update caching strategy, such as a cache failure queue or others.

thread1: watch key
thread1: get supplier  01234
thread2: update supplie to 01234567
thread1: update cache with 01234

Comment From: whipfeng

WATCH command doesn't work in this case because DEL a non-existent key will not trigger WATCH This is the problem we have been discussing! As @jerrykcode https://github.com/redis/redis/issues/11704#issuecomment-1383690182 said: so should we change the feature to let the del command also triggers signalModifiedKey even when del failed (nothing in the database altered) ? So I gave two examples[del(key) and set(key, 'null_str')] to support my idea. And if we change this feature, it will also simplify the repair logic in https://github.com/redis/redis/pull/9194