I sometimes see CLIENT TRACKING key invalidation messages embedded within an outer multi-bulk reply, which I believe is not supposed to happen.
Push data may be interleaved with any protocol data, but always at the top level, so the client will never find push data in the middle of a Map reply for instance.
To reproduce
Compile and run this code locally. The problem doesn't occur every run but it does more often than not so it shouldn't take any time to reproduce.
The heart of the logic that triggers the problem is as follows:
/* Set some fast expiring keys and immetiately request them back */
execAndDumpCommand(c, r, 0, "PSETEX %s %d %s", "key:0", 10, "A");
execAndDumpCommand(c, r, 0, "GETBIT %s %d", "key:0", 1);
execAndDumpCommand(c, r, 0, "PSETEX %s %d %s", "key:1", 10, "A");
execAndDumpCommand(c, r, 0, "GETBIT %s %d", "key:1", 1);
execAndDumpCommand(c, r, 0, "PSETEX %s %d %s", "key:2", 10, "A");
execAndDumpCommand(c, r, 0, "GETBIT %s %d", "key:2", 1);
/* Invoke MGET against the keys with a short delay between calls.
* At some point Redis will fire the expiration invalidation which
* sometimes is embedded inside the MGET multi-bulk reply. */
execAndDumpCommand(c, r, 1, "MGET %s", "key:0");
usleep(10000);
execAndDumpCommand(c, r, 1, "MGET %s", "key:1");
usleep(10000);
execAndDumpCommand(c, r, 1, "MGET %s", "key:2");
usleep(10000);
Just to make sure I wasn't going crazy I also reproduced the behavior in a docker container.
Expected behavior
Push messages should always exist as top-level replies.
Additional information
redis-cli info tcpflow data showing the raw replies
As always I'm happy to help test any fixes, etc
Cheers!
Comment From: huangzhw
When execute MGET, it uses lookupKeyRead and finds the key is expired, so signalModifiedKey is called, replies are interleaved.
Comment From: huangzhw
@yossigo @oranagra This is something like https://github.com/redis/redis/issues/8206. But I think this one is more severe. For multi key command, the invalidate messeages are interleaved with command reply.
Comment From: madolson
@huangzhw I believe the solution that Yossi called out, "scheduleSignalModifiedKey" would still work in this case. This seems worth trying to solve now.
Comment From: huangzhw
@madolson Yes. If at first we doubt whether we need it, now this is needed.
Comment From: yossigo
I agree, it's definitely broken bad enough to justify a fix now.
Comment From: nmvk
Taking a look
Comment From: oranagra
@nmvk please also follow #8206 so we don't step on each other's toes.
Comment From: michael-grunder
Hi everyone :wave:
I believe I have found a different edge case that can trigger this issue. As far as I can tell the issue still exists in 7.0.4, 6.2.7, and unstable.
Should be easy to reproduce. Simply enable RESP3, client tracking, and then execute a FLUSHDB or FLUSHALL inside of a MULTI..EXEC block.
❯ redis-cli info|grep git
redis_git_sha1:39d216a3
redis_git_dirty:0
❯ redis-cli -3
127.0.0.1:6379> client tracking on
OK
127.0.0.1:6379> multi
OK
127.0.0.1:6379(TX)> flushdb
QUEUED
127.0.0.1:6379(TX)> exec
1) 1) "invalidate"
2) (nil)
tcpflow shows the invalidation message embedded in the *1 EXEC reply:
127.000.000.001.06379-127.000.000.001.51228: +OK # CLIENT TRACKING
127.000.000.001.06379-127.000.000.001.51228: +OK # MULTI
127.000.000.001.06379-127.000.000.001.51228: +QUEUED # FLUSHDB
127.000.000.001.06379-127.000.000.001.51228: *1 # EXEC
>2
$10
invalidate
_
+OK
Comment From: huangzhw
We missed trackingInvalidateKeysOnFlush.
Comment From: huangzhw
I made https://github.com/redis/redis/pull/11038 to solve it.