tl;dr: It seems that enabling Client Side Caching without giving thought to its interaction with key TTL expiration can result in:
1. Behavior that is critically different between a current non-CSC-enabled client and a future CSC-enabled one.
2. Behavior that is non-intuitive / borderline broken as far as client expectations go.
The basic issue:
Consider a client that reads, once per second, a key set to expire in 5 seconds, until the key expires.
- Without CSC
The client will get 5 successful reads of the key; on the 6th attempt the key has already expired on the server and the client gets the appropriate missing-key response.
- With CSC
Upon the first read, the client saves a cached value of the key. From the second read on, the client serves that locally cached value. After 5 seconds, despite the TTL having expired, the client continues to serve the cached value. Only when the server's activeExpireCycle gives the key a statistical chance of being expired is an invalidate message sent to the client. This finally makes the client erase the local cached copy and start returning a missing key.
In other words: there is a non-negotiable, non-deterministic period of time where a CSC-enabled client will return a different result than a non-CSC-enabled one. Furthermore, the CSC-enabled result explicitly violates the TTL set for the key.
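A minimal sketch of the scenario, assuming a Python client (redis-py) and a naive local cache with no local TTL; all names here are illustrative, not part of any client API:

import time
import redis  # assuming the redis-py client is available

r = redis.Redis()
r.set("mykey", "value", ex=5)   # the key expires on the server after 5 seconds
local_cache = {}                # naive CSC-style cache with no local TTL

for second in range(8):
    direct = r.get("mykey")     # without CSC: None from the 6th read on
    if "mykey" not in local_cache:
        local_cache["mykey"] = r.get("mykey")   # with naive CSC: cached on the first read...
    cached = local_cache["mykey"]               # ...and served locally ever after
    print(second, direct, cached)               # the two columns diverge once the TTL elapses
    time.sleep(1)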
To fix the issue, the client must set a local TTL value.
Some options to do so:
1. Regardless of the "real" TTL value on the server, set some sane/aggressive local TTL value.
2. After each GET the client also issues TTL and saves a local cache of the key with the correct "real" TTL (see the sketch after this list).
3. Extend the GET command with an optional TTL field. The optional TTL field will be filled for CSC-enabled clients GET-ing a key that has a TTL.
4. Implement a new push message (like invalidate) that informs of the TTL. TTL messages will be pushed to CSC-enabled clients that recently GET a key that has a TTL. (It could also be sent for keys whose TTL was only changed by an EXPIRE command, instead of an "invalidate" message.)
5. Implement a new "GET_TRACKED_KEY" command that returns both the value and the TTL.
An added bonus of such a new GET command is that we can remove "client tracking on/off". A client using GET_TRACKED_KEY will track just that key, giving much more power/granularity for clients/applications to optimize their usage of the new CSC feature.
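A rough sketch of option 2, assuming redis-py; cached_get and local_cache are illustrative names, not an existing API:

import time
import redis  # assuming the redis-py client is available

r = redis.Redis()
local_cache = {}   # key -> (value, local expiry timestamp or None)

def cached_get(key):
    entry = local_cache.get(key)
    if entry is not None:
        value, expires_at = entry
        if expires_at is None or time.time() < expires_at:
            return value
        del local_cache[key]          # local TTL elapsed: drop the stale copy
    pipe = r.pipeline()
    pipe.get(key)
    pipe.pttl(key)                    # remaining TTL in ms (-1 = no TTL, -2 = missing key)
    value, pttl = pipe.execute()
    if value is not None and pttl != -2:
        expires_at = time.time() + pttl / 1000.0 if pttl >= 0 else None
        local_cache[key] = (value, expires_at)
    return value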
Minor finale: assuming we have implemented any of the above, I can further claim that sending an invalidate message upon key expiration is not needed. At best it will serve as a key memory-management helper for the client (at worst it will waste network and CPU resources).
Comment From: yossigo
Hi @eliblight,
The way I see it, the biggest thing that breaks here is that clients can no longer assume they'll never receive an expired key.
The thing is, I can't think of any reliable way to address that without making the client aware of the TTL (or a derivative of it) and having it do the expire logic itself. This introduces a whole new set of issues, such as:
- The need to relay TTL (or derived) information to the client; Remember it's not just GET but basically any R/O command. That may require some RESP3 magic...
- We'll depend on many different client implementations to be complete and correct. Pushing complex logic to the client may force us to also provide a common library implementation for it.
- Lots of potential consistency issues due to clock drifts, modification of TTL on the server side, etc. Some of these may be inevitable in any client side caching scenario, but I think the surface area for issues gets bigger.
A question for @antirez: why doesn't expireIfNeeded() invalidate the key just like active expire or other key-modifying commands do? It would not solve the root problem here but may greatly reduce the symptoms in many cases.
Comment From: tomer-w
@yossigo, I agree with everything you said, but not letting the client handle the TTL is not any better than letting it do so. Applications depend on proper TTLs and CSC should not break that.
Comment From: antirez
So... I think I don't agree with this issue, for different reasons. To start I'll reply to @yossigo: if we don't send the invalidation on expireIfNeeded(), this is just a bug; I probably thought it would be sent as a side effect of deleting a key when we perform signalModifiedKey(). Are you sure this is not happening? I'll try it, but anyway that would just be a bug.
Then the reasons why I don't agree with @eliblight about all this:
- In general CSC violates certain consistency guarantees, because the tracking channel may have a different delay compared to the data channel. This delta can be reduced, however, by continuously pinging the Pub/Sub channel (or RESP3 connection) in order to detect if the invalidations channel is broken, and in that case flushing all the local client-side data (which is a very bold action). However, there is no general solution for that; it's a tradeoff.
- The deletion notification that we get with CSC for keys with a TTL can be considered a "best effort" mechanism; we can document that clients that want more precise TTL expiration should also fetch the TTL. Moreover, if we don't do it already, we should make sure to also send an invalidation message when the TTL of a key is changed.
- We need to warn client authors that, in general, when implementing this protocol they should set a reasonable TTL for the data anyway; this is already outlined in the draft CSC documentation that we have online.
- The expire mechanism is likely to evolve in the future and become more precise. Work was already in progress but didn't reach maturity for Redis 6; in the future we'll very probably end up with precise TTLs.
About your "GET_TRACKED_KEY" point: not sure if you are aware, but in the CSC specification, before Redis 6 goes GA, there is already a mode where you specify with a command (after enabling tracking) only the keys you want to track, so this is not needed from the POV of fine-grained tracking.
This is what I read in the current design document I have here:
CLIENT TRACKING on OPTIN
CACHING (yes/no?)
GET foo
Basically you send CACHING yes if you want to cache the next value; it flags the client and will track the keys returned in the next script / transaction / command. Note that this is much better from the POV of race conditions compared to saying later "I'm caching xyz".
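As a hedged illustration of that flow, using redis-py's generic execute_command and the draft command names quoted above (the final syntax may differ, and tracking also requires a RESP3 connection or a Pub/Sub redirect, omitted here):

import redis  # assuming the redis-py client is available

r = redis.Redis()
r.execute_command("CLIENT", "TRACKING", "on", "OPTIN")  # enable opt-in tracking
r.execute_command("CACHING", "yes")   # flag only the next command for tracking
tracked = r.get("foo")                # the key read here is tracked
untracked = r.get("bar")              # not tracked: the CACHING flag was already consumed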
I'm quite skeptical about doing any change like the ones proposed. Of this whole proposal, I agree only on a single point: that a GET option to return the PTTL of a key could be nice to have in general, and even more so now.
Comment From: antirez
Please make sure to read this: https://github.com/antirez/redis/issues/6867
Comment From: itamarhaber
WITHTTL solves the problem with GETs alone... what if I want to CSC GETRANGE, or even ZRANGE?
Comment From: antirez
@itamarhaber GET alone is just such a common case that it is worth optimizing; for all the rest there is MULTI/TTL/CMD/EXEC.
Comment From: antirez
@yossigo, I agree with everything you said, but not letting the client handle the TTL is not any better than letting it do so. Applications depend on proper TTLs and CSC should not break that.
CSC does not break it: clients that want to cache keys with a TTL can tag keys with the TTL and expire them upon access. Apps where this is not critical will just set a fixed max TTL for cached values and wait for invalidation messages from Redis.
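A minimal sketch of that "fixed max TTL" approach, in Python; MAX_LOCAL_TTL and the helper names are illustrative, not part of any client library:

import time

MAX_LOCAL_TTL = 5.0   # app-chosen bound in seconds, independent of the server TTL
local_cache = {}      # key -> (value, time the value was cached)

def cache_store(key, value):
    local_cache[key] = (value, time.time())

def cache_lookup(key):
    entry = local_cache.get(key)
    if entry is None:
        return None
    value, cached_at = entry
    if time.time() - cached_at > MAX_LOCAL_TTL:
        del local_cache[key]          # bounded staleness even if no invalidation ever arrives
        return None
    return value

# an invalidation message from Redis would simply call local_cache.pop(key, None)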
Comment From: antirez
@itamarhaber note that in RESP3 we also have attributes, so later, when a client is in tracking mode, if we set a special option like "GETTTL" we can have the attribute for the key TTL returned as we fetch things. But this is an incremental development that we can add if needed; no need to over-design this now.
Comment From: eliblight
If I read this correctly, I see 3 almost unrelated modes of CSC:
- BCAST on/off - implicitly track all keys
- tracking on/off - implicitly track all the keys I GET
client tracking on
GET k1
GET k2
...
client tracking off
- CACHING yes/no - explicitly track only specific keys
client tracking OPTIN on
CACHING yes
GET k1
CACHING yes
GET k2
...
client tracking off
So... for the last option I would ask: why do we need the CLIENT TRACKING on/off at the start and end? In other words: why does the server need pre-warning / pre-registration before serving a would-be cached key? Can't all clients be in CLIENT TRACKING OPTIN mode by default?
Would a:
CACHING yes
GET k1
CACHING yes
GET k2
...
be enough?
actually the "yes" also becomes redundant, if you remove the explicit registration and deregistration because there is no need for CACHING no. so the actual suggestion is:
CACHING
GET k1
CACHING
GET k2
...
... And looping back to my original suggestion: if you add another command to the API anyway, may I suggest that instead of CACHING we implement a CACHING_GET:
CACHING_GET k1
CACHING_GET k2
Is this cleaner? It provides the same functionality with half the commands (and their associated overhead).
AND... if so, we can explicitly have the value and the TTL returned by this specific new CACHING_GET command and thus give clients the option to have "eventual consistency" with the server.
i.e. clients that use the first two options are clients that want caching but for which an accurate TTL is not super important, while CACHING_GET will be used by clients that need local caching and an accurate local TTL.
Comment From: antirez
Hello @eliblight, there was some misunderstanding here. This is how it works: when you set CACHING to yes it is auto-cleared after the next call or transaction is executed. You need both yes and no in order to also support the OPTOUT version.
Comment From: eliblight
@antirez I do get the auto-cleared characteristic of the CACHING command, and I do get that there is the "CLIENT TRACKING OPTIN on" + "CACHING yes" vs. the "CLIENT TRACKING OPTOUT on" + "CACHING no" combo.
So here is a "CLIENT TRACKING OPTOUT on" + "CACHING no" scenario:
GET too_early # not CSC because it was issued before the tracking on command
CLIENT TRACKING OPTOUT on
GET c1
GET c2
CACHING no
GET nc3 # not tracked because "caching no" was issued before
GET c4
CACHING no
GET nc5 # not tracked because "caching no" was issued before
CACHING no
GET nc6 # not tracked because "caching no" was issued before
CLIENT TRACKING off
GET too_late # not tracked because tracking was turned off
And here is the "CLIENT TRACKING OPTIN on" + "CACHING yes" version of the same scenario:
GET too_early # not CSC because it was issued before the tracking on command
CLIENT TRACKING OPTIN on
CACHING yes
GET c1 # tracked because "caching yes" was issued before
CACHING yes
GET c2 # tracked because "caching yes" was issued before
GET nc3
CACHING yes
GET c4 # tracked because "caching yes" was issued before
GET nc5
GET nc6
CLIENT TRACKING off
GET too_late # not tracked because tracking was turned off
In both cases only c1, c2, and c4 are tracked.
Now consider this alternative:
GET too_early
GET c1 EX T # request expiration with value and tracking
GET c2 T # request tracking only
GET nc3 EX # request expiration with value
GET c4 EX T # request expiration with value and tracking
GET nc5
GET nc6
GET too_late
Again, only c1, c2, and c4 are tracked. It also expresses the need for a TTL on c1, nc3, and c4.
P.S. I borrowed EX from the SET command, but we can call it TTL or something.
WDYT?
Comment From: yossigo
I think the auto-clearing property of the CACHING command is a bit confusing, as it introduces a new paradigm which is not used elsewhere. Then there's the matter of efficiency when batch fetching keys.
The GET flags proposed by @eliblight solve this and make things more explicit, but they focus only on strings and ignore other data types.
What if we leverage the existing MULTI state for that purpose? i.e. the opt-in (or opt-out) would be an optional MULTI flag which will affect everything in it.
Comment From: antirez
CACHING uses a pattern already used by the Redis Cluster ASKING command and in other places too, so people should have no issues with it. It is general, does not require any additional modification to commands, and adds no extra round trips since you can always pipeline it, and so forth.
Comment From: antirez
@eliblight what you suggest is an anti-pattern: modifying just single commands. Redis would be a jungle if we went that way :-) I think you have to do some effort to detach yourself from the problem you are solving for your customers and think to everybody in the community. I understand it is very tempting to solve the problem you have at hand in the most efficient way, but in that way we end up with a very good hammer that is efficient only with certain kinds of nails. Later we can also say: GET is so important that we want to specialize it in a similar way to what you suggest, but the first step is to have an orthogonal feature that works for every command. Also note that in practice, there is basically no difference: CACHING is in a pipeline and is not going to take more TCP packets in the average case. The same for the reply. So it only looks more expensive, but is actually only more flexible.
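A hedged sketch of the pipelining point, assuming redis-py and the draft CACHING command name used in this thread (the final syntax may differ):

import redis  # assuming the redis-py client is available

r = redis.Redis()
pipe = r.pipeline(transaction=False)    # plain pipelining, no MULTI/EXEC
pipe.execute_command("CACHING", "yes")  # flags only the next command for tracking
pipe.get("k1")
pipe.execute_command("CACHING", "yes")
pipe.get("k2")
replies = pipe.execute()                # everything travels in a single round trip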
Btw I'm interested in getting feedback regarding the "csc2" branch.
Comment From: eliblight
@eliblight what you suggest is an anti-pattern: modifying just single commands. Redis would be a jungle if we went that way :-)
Now that you phrase it like this I am bound to agree... a single command modification will simply not do... (what are we farmers?)
I think you have to do some effort to detach yourself from the problem you are solving for your customers and think to everybody in the community. I understand it is very tempting to solve the problem you have at hand in the most efficient way, but in that way we end up with a very good hammer that is efficient only with certain kinds of nails.
I like a good hammer... very versatile tool (especially if it has the forked back end)... and I usually try to solve the problem at hand... so I guess I am guilty? (P.S. I thought the customers are the community, have I got it wrong?)
Later we can also say: GET is so important that we want to specialize it in a similar way to what you suggest, but the first step is to have an orthogonal feature that works for every command. Also note that in practice, there is basically no difference: CACHING is in a pipeline and is not going to take more TCP packets in the average case. The same for the reply. So it only looks more expensive, but is actually only more flexible.
As you might have guessed by now... I am the last person to push for early optimizations (raw keys sounds familiar :) )... I truly believe optimizations are super important in very specific places and cases... and everywhere else code should be easy to read / intuitive to use / safe to maintain.
In this case I just felt it's a more intuitive / less error-prone API for the customer (who might or might not be the community...).
Btw I'm interested in getting feedback regarding the "csc2" branch.
Will do.