Describe the bug
Problems caused by watch a key that is about to expire To reproduce
String loadData = "123456";//Simulate time-consuming loading action. Redis connection is not occupied at this time
Jedis j85 = new Jedis("xx.xx.xx.xx",xxx);
j85.setex("testbit", 10, "6543");//Assume the last setting
j85.watch("testbit");
//TODO: Some conditions are used to determine whether to execute the transaction
Transaction trans1 = j85.multi();
Thread.sleep(11000);//Hold for 11 seconds to disable key‘testbit’
//Another thread simulates concurrent behavior
new Thread(()-> {
Jedis j85t = new Jedis("xx.xx.xx.xx",xxx);
j85t.del("testbit");//The purpose of deletion is to keep the transaction from executing
}).start();
trans1.set("testbit", loadData);
trans1.exec();
Expected behavior
I expect that the transaction will not succeed, although key‘testbit’ has expired Additional information
In fact, my goal is to lazily load new data after each deletion.
Comment From: sundb
1) I'm wondering if the code you posted was copied from two places.
new Thread(() would be blocked by Thread.sleep(11000).
2) Is key testbit exist before j85.setex("testbit", 10, "6543");?
If it doesn't exist, DEL command in another thread won't change anything, and it won't trigger the WATCH.
Comment From: whipfeng
- I'm wondering if the code you posted was copied from two places.
new Thread(()would be blocked byThread.sleep(11000).- Is key
testbitexist beforej85.setex("testbit", 10, "6543");? If it doesn't exist,DELcommand in another thread won't change anything, and it won't trigger theWATCH.
Are you Chinese? Can we communicate in Chinese
Comment From: whipfeng
- I'm wondering if the code you posted was copied from two places.
new Thread(()would be blocked byThread.sleep(11000).- Is key
testbitexist beforej85.setex("testbit", 10, "6543");? If it doesn't exist,DELcommand in another thread won't change anything, and it won't trigger theWATCH.
1.‘new Thread(()’ and ‘Thread.sleep(11000)’ just to simulate concurrency.
2.Does it have any influence whether the key testbit exists before 'j85.setex("testbit", 10, "6543");'.
I just tried. Even if it exists, ‘DEL’ will not trigger the WATCH.
Isn't Redis official saying that it only depends on whether the command is modified rather than the comparison value?
Comment From: sundb
It's better to use English to let more guys know what you ask.
WATCH can't use in multi, so WATCH should be executed before MULTI command.
```java
j85.watch("testbit");
j85.exec();
j85.setex("testbit", 10, "6543");//Assume the last setting
Transaction trans1 = j85.multi();
**Comment From: whipfeng**
> so `WATCH` should be executed before `MULTI` command.
Look at my code. That's what I do.
**Comment From: sundb**
@whipfeng sorry for my missing.
Which Redis version are you using?
**Comment From: whipfeng**
> @whipfeng sorry for my missing. Which Redis version are you using?
I tested it on ‘redis_version:3.2.6’.
But I looked at the source code as if every version of the branch was handled the same way.
The following reference is the code of the delete command on git branch 3.2(Please see my comments):
if (dbDelete(c->db,c->argv[j])) { //Only if it is successfully deleted here(The basis for successful deletion only exists in the dictionary) ,
signalModifiedKey(c->db,c->argv[j]);//The status of the client will be notified of modification.
notifyKeyspaceEvent(NOTIFY_GENERIC,
"del",c->argv[j],c->db->id);
server.dirty++;
deleted++;
}
I think it should be the same as the set command. The ‘ signalModifiedKey’ method should be called in any case(Even if the original value is the same as the value to be set):
void setKey(redisDb *db, robj *key, robj *val) {
if (lookupKeyWrite(db,key) == NULL) {
dbAdd(db,key,val);
} else {
dbOverwrite(db,key,val);
}
incrRefCount(val);
removeExpire(db,key);
signalModifiedKey(db,key);
}
**Comment From: jerrykcode**
The `dbDelete` only failed when the key doesn't exist, I think we should not trigger `signalModifiedKey` when deleting an unexisting key
**Comment From: sundb**
@whipfeng The key expiration does not execute to the code you mentioned, it may have expired, but is still in memory and not deleted until it is used again or is cleaned up periodically.
This issue should be resolved at https://github.com/redis/redis/pull/9194, but it only works after 7.x
**Comment From: whipfeng**
> The `dbDelete` only failed when the key doesn't exist, I think we should not trigger `signalModifiedKey` when deleting an unexisting key
Let's jump out of the specific code implementation.
I think the del command is a special form of the set command,
Just like 'j85.set ("testbit", null)'.(Although this is not allowed to be written, it is not appropriate only for the 'String' type)
Just because using the set command, the `signalModifiedKey` method will be called if the original value is the same as the value to be set.
**Comment From: whipfeng**
> @whipfeng The key expiration does not execute to the code you mentioned, it may have expired, but is still in memory and not deleted until it is used again or is cleaned up periodically. This issue should be resolved at #9194, but it only works after 7.x
I think whether the deletion is due to expiration or not, it should be notified.
**Comment From: sundb**
@whipfeng It's a waste to compare these two values if we wanna notify when the value is changed, .e.g big value.
**Comment From: whipfeng**
> @whipfeng It's a waste to compare these two values if we wanna notify when the value is changed, .e.g big value.
So we should only focus on the command itself, not on the value of the operation, because each command operation implies the same version number as the ABA problem.
In addition, the nonexistent key is equivalent to 'null mapped by key'.
**Comment From: jerrykcode**
@whipfeng I think this bug is not related to `dbDelete`, it should be resolved by [#9194](https://github.com/redis/redis/pull/9194), as @sundb mentioned.
But maybe we need to discuss this:
> Just because using the set command, the signalModifiedKey method will be called if the original value is the same as the value to be set.
Since the `set` command triggers signalModifiedKey even when the value is not changed (nothing in the database altered),
so should we change the feature to let the `del` command also triggers signalModifiedKey even when `del` failed (nothing in the database altered) ?
**Comment From: sundb**
@jerrykcode `DEL` non-existent key and `SET` same value are not the same concept at all.
1) When `DEL` a key that does not exist, we know explicitly that it does not exist, so it makes no sense at all to trigger `signalModifiedKey()`.
2) When`SET` the same value, the key is explicitly present, and in any way, we overwrite the old value with the new one, even if they are the same.
`signalModifiedKey()` should be based on the premise that the key has existed before or that it has been modified.
**Comment From: jerrykcode**
@sundb Ok, I see. Thanks for your explanation :)
**Comment From: whipfeng**
> @jerrykcode `DEL` non-existent key and `SET` same value are not the same concept at all.
>
> 1. When `DEL` a key that does not exist, we know explicitly that it does not exist, so it makes no sense at all to trigger `signalModifiedKey()`.
> 2. When`SET` the same value, the key is explicitly present, and in any way, we overwrite the old value with the new one, even if they are the same.
>
> `signalModifiedKey()` should be based on the premise that the key has existed before or that it has been modified.
How do we implement operations like 'set (test, null)'? Don't we use the del command?
**Comment From: whipfeng**
I wrote another complex example based on Redis cluster:
String getSetIfNull(final String key, final int seconds, final Supplier
Among them, the value obtained by `supplier.get()` may change. Each time it changes in another concurrency, the Redis key will be deleted.
After ` if (nVer.equals(oVer))`(true), the key just expired and was deleted several times in another concurrency. At this point, an old value will be set successfully, although the probability is small.
So how can I guarantee that the value is the last change? Please help me to have a careful look!
**Comment From: whipfeng**
I still think that every modification operation should imply the concept of a version number.
The concept given is accurate, otherwise the premise that the key already exists should be excluded separately.
But it[#9194] seems that doing this only for expired keys can solve my problem, and I don't have a better proof example
**Comment From: sundb**
> How do we implement operations like 'set (test, null)'? Don't we use the del command?
I'm not sure what you are trying to do with this code, is it to delete the `test` key?
**Comment From: whipfeng**
> > How do we implement operations like 'set (test, null)'? Don't we use the del command?
>
> I'm not sure what you are trying to do with this code, is it to delete the `test` key?
@sundb @jerrykcode I wrote another short example to support my idea:
String getSetIfNull(final String key, final Supplier
j85.watch(key);
val = j85.get(key);
if (null != val) {// During the watch process, you need to query and judge again
j85.unwatch();
return val;
}
val = supplier.get();//Load data from the business. The key will be deleted when the data is changed concurrently
/**
* At this time, other concurrent users may modify the value of 'supplier.get()' multiple times
* and delete the key(However, deletion does not affect the setting of expired data. Is that OK?It seems that only
* handling expired keys can't solve this problem)
* */
Transaction trans = j85.multi();
trans.set(key, val);
trans.exec();
j85.close();
return val;
}
The goal is Redis cache and `supplier.get()` final consistency.
Please see the content of code comments, thank you!
**Comment From: sundb**
@whipfeng Not sure i can follow you.
First of all, the `getSetIfNull()` method itself is not thread-safe, and operations on redis are not atomic, so you cannot guarantee that the Redis cachet and `supplier.get()` are consistent.
1) Make sure that supplier.get() is locked for the entire time when it's saved to Redis.
2) Make sure that the key cannot be modified by other concurrently, you may need a distributed lock or use lua or something else.
**Comment From: whipfeng**
@sundb What is the basis of `not thread-safe`?
If you follow the current implementation of the del command, it is true.
Let me change my example. Do you think we can get this conclusion?
String getSetIfNull(final String key, final Supplier
j85.watch(key);
val = j85.get(key);
if ('null_str' != val) {// During the watch process, you need to query and judge again
j85.unwatch();
return val;
}
val = supplier.get();//Load data from the business(Not allowed as 'null_str'). The key will be set to 'null_str' when the data is changed concurrently
/**
* At this time, other concurrent users may modify the value of 'supplier.get()' multiple times
* and set the key to 'null_str'
* */
Transaction trans = j85.multi();
trans.set(key, val);
trans.exec();
j85.close();
return val;
}
Among them, 'null_str' is considered as a reserved word in business and `supplier.get()` is not allowed to return 'null_str'.
**Comment From: sundb**
@whipfeng
1)
```java
String val = j85.get(key);
if ('null_str' != val) { // Is it possible for the key to be modified by other concurrently at this time?
return val;
}
2)
val = j85.get(key);
if ('null_str' != val) { // Same as above, key was deleted or changed by other concurrently.
j85.unwatch();
return val;
}
3)
val = supplier.get(); // After get(), is it possible for the supplier to be modified by other concurrently?
Any one of the above 3 types of inconsistencies due to concurrency may cause Redis cache and supplier.get() to end up inconsistent.
Comment From: whipfeng
@sundb This transient situation is no problem. What we want is final consistency. For those who use this method, they can finally perceive that the data has changed. And my example only disproves the problem of del command in the previous example.
Comment From: sundb
@whipfeng Not sure i fully understand what you meant. You mean there won't be two concurrent or 2 threads calling this method at the same time? If not, then there must be a concurrency or threading problem.
like the following example(thread1 and thread2)
thread1:
1) change supplier to s1, then call getSetIfNull()
2) run to `val = supplier.get();` and set val to s1
thread2:
1) change supplier to s2, then call getSetIfNull()
2) run to `val = supplier.get();` and set val to s2
3) update the key in redis with s2.
thread1:
3) update the key in redis with s1.
In fact, the value of the key should be s2, not s1. In any case, as long as there are concurrent reads and writes to the same resource, there is bound to be a race to the bottom, which is why locks are needed.
Comment From: whipfeng
@sundb The concurrency relationship is not so complex,Equivalent to asynchronous cache refresh
thread0:
1) change supplier to s1
2) set(key,'null_str')
thread1~N: //Just take it out for use
1) call getSetIfNull()
You can think that supplier.get() is loaded from the database. thread0: change supplier to s1 is an operation to modify the database
Comment From: sundb
So you are now running into a final inconsistency problem that the database and cache are not consistent?
Comment From: whipfeng
The last example I provided does not have this problem. The previous one will have this problem by using the del command. This is also the reason why I give two examples to disprove.
Comment From: sundb
it semms that if you use DEL j85.get(key) would never equal null_str.
I'm not expert with JAVA, perhaps you can change if ('null_str' != val) to if (NULL != val) when using DEL command.
Comment From: whipfeng
set(key,'null_str')and del(key) are just two examples for comparison,I just want to say that if there is a problem with the way of del(key)
Comment From: sundb
There will be no problem.
Comment From: whipfeng
Let's look at the original example
String getSetIfNull(final String key, final Supplier<String> supplier) {
Jedis j85 = new Jedis("xx.xx.xx.xx",0000);
String val = j85.get(key);
if (null != val) {// Direct use of cache with value
return val;
}
j85.watch(key);
val = j85.get(key);
if (null != val) {// During the watch process, you need to query and judge again
j85.unwatch();
return val;
}
val = supplier.get();//Load data from the business. The key will be deleted when the data is changed concurrently
/**
* At this time, other concurrent users may modify the value of 'supplier.get()' multiple times
* and delete the key(However, deletion does not affect the setting of expired data. Is that OK?It seems that only
* handling expired keys can't solve this problem)
* */
Transaction trans = j85.multi();
trans.set(key, val);
trans.exec();
j85.close();
return val;
}
This is the concurrency relationship
thread0:
1) change supplier to s1
2) del(key)
thread1~N: //Just take it out for use
1) call getSetIfNull()
In this case, final consistency cannot be guaranteed. Is that the problem?
Comment From: sundb
It seems to me that, as in your example above, there are concurrency issues. Because you are calling the set command in both threads.
thread0:
1) change supplier to s1
2) set(key,'null_str')
thread1~N: //Just take it out for use
1) call getSetIfNull()
If it were me, I would use a lock.
thread0:
lock(lockname) {
1) change supplier to s1
2) set(key,'null_str')
}
thread1~N: //Just take it out for use
lock(lockname) {
1) call getSetIfNull()
}
Comment From: whipfeng
We only need final consistency,And it's enough for us to take the atomic operation of set(key,'null_str') as the boundary point of consistency.It is too heavy to use locks. And locks in distributed environments are inherently 'consistency issues'.
We can only discuss the final consistency, because even if you use the lock, thread1~N will be inconsistent in a short period of time after the system scheduling delay.
Comment From: sundb
If you don't wanna use locks, then you have to protect your operations from atomicity. Why would you want to do a del instead of updating the cache directly? Like following:
updateCache() {
getSupplier();
multi();
updateSupplier();
exec();
}
getSetIfNull() {
String val = j85.get(key);
if (null != val) {// Direct use of cache with value
return val;
}
updateCache();
}
This ensures that the operation to cache would be atomic.
Comment From: whipfeng
updateSupplier() In the current thread, I don't understand.
I guess that's what you want
updateCache() {
val = getSupplier();
multi();
set(key,val);
exec();
}
There are still consistency problems.
If there is concurrent behavior before getSupplier()to multi(), and the result of getSupplier() is changing, the last setting to the cache is an uncertain value
Comment From: sundb
Both supplier and cache are read and written by multiple threads at the same time, so locks are unavoidable. Or you can make sure that only one thread can do the update, and that the other threads are read-only.
Comment From: whipfeng
1)The watch command of Redis has the concept of optimistic locking,If you put another lock outside, it's meaningless
2)only one thread can do the update,Similar to this,The update is serial, but multiple updates may be fast.
Comment From: sundb
1)The watch command of Redis has the concept of optimistic locking,If you put another lock outside, it's meaningless
But competing resources also include suppler, how do you ensure that supplier.get() has not been modified by other threads?
Comment From: whipfeng
After supplier.get() is modified by another thread, the cache will be deleted by that thread, and subsequent calls to getSetIfNull() will reload the cache, which is why we want to use the watch command.
In addition,short-time inconsistency in distributed scenarios is tolerable, so only final consistency is generally discussed. If you have to add locks on the outer layer, it is unrealistic for all callers from different businesses to use unified distributed locks,and this is based on the ideal situation of distributed locks(which is an issue in itself).
Take the concurrency relationship as an example:
thread0:
lock(lockname) {
1) change supplier to s1 //'s1' is a variable
2) del(key)
}
thread1~N: //thread1~threadN can come from different processes or distributed nodes
lock(lockname) {
1) val = call getSetIfNull()
2) Business use 'val'
}
3) Business use 'val'
According to you, only 2) Business use 'val'can guarantee absolute consistency. If it is used outside like 3) Business use 'val', there will still be the same problem.
We just need to ensure that the reading after writing is up to date in chronological order.For the caller, the completion of writing is the end of 'thread0' and the start of reading is the start of 'thread1~N'.
Comment From: sundb
A second thought, The code you mentioned in https://github.com/redis/redis/issues/11704#issuecomment-1413363391 looks like it will guarantee final consistency, but I can't finger it out yet, maybe you can provide a more detailed example. On the other hand, you can use the monitor command to verify the process of updating the redis cache.
Comment From: whipfeng
The following is printed by me with the monitor command, and the following comments are printed by the thread I debug
1675399184.811923 [0 10.11.63.39:64804] "DEL" "testbit" //thread0:supplier 0
1675399187.042392 [0 10.11.63.39:64806] "DEL" "testbit" //thread0:supplier 01
1675399189.030541 [0 10.11.63.39:64807] "DEL" "testbit" //thread0:supplier 012
1675399207.603029 [0 10.11.63.39:64802] "GET" "testbit" //thread1:value=null
1675399211.691565 [0 10.11.63.39:64802] "WATCH" "testbit" //thread1:
1675399222.534349 [0 10.11.63.39:64802] "GET" "testbit" //thread1:
1675399233.383902 [0 10.11.63.39:64810] "DEL" "testbit" //thread0:supplier 0123
1675399237.476089 [0 10.11.63.39:64811] "DEL" "testbit" //thread0:supplier 01234
//thread1:call 'supplier.get()'
1675399268.017141 [0 10.11.63.39:64814] "DEL" "testbit" //thread0:supplier 012345
1675399269.577149 [0 10.11.63.39:64815] "DEL" "testbit" //thread0:supplier 0123456
1675399271.148512 [0 10.11.63.39:64816] "DEL" "testbit" //thread0:supplier 01234567
1675399281.877108 [0 10.11.63.39:64802] "MULTI"。 //thread1:
1675399281.877210 [0 10.11.63.39:64802] "SET" "testbit" "01234" //thread1:
1675399281.877383 [0 10.11.63.39:64802] "EXEC" //thread1:getSetIfNull 01234
1675399289.646186 [0 10.11.63.39:64817] "GET" "testbit" //thread1:value=01234
//thread1:getSetIfNull 01234
1675399295.299110 [0 10.11.63.39:64818] "GET" "testbit" //thread1:value=01234
//thread1:getSetIfNull 01234
Comment From: sundb
This is where the cache ends up being inconsistent due to supplier being read and written by multiple threads.
WATCH command doesn't work in this case because DEL a non-existent key will not trigger WATCH.
If you don't use locks, then you may need a more complex update caching strategy, such as a cache failure queue or others.
thread1: watch key
thread1: get supplier 01234
thread2: update supplie to 01234567
thread1: update cache with 01234
Comment From: whipfeng
WATCH command doesn't work in this case because DEL a non-existent key will not trigger WATCH
This is the problem we have been discussing!
As @jerrykcode https://github.com/redis/redis/issues/11704#issuecomment-1383690182 said: so should we change the feature to let the del command also triggers signalModifiedKey even when del failed (nothing in the database altered) ?
So I gave two examples[del(key) and set(key, 'null_str')] to support my idea.
And if we change this feature, it will also simplify the repair logic in https://github.com/redis/redis/pull/9194