Redis A better approach for COMMAND INFO for movablekeys commands

The problem

Reading the documentation:

https://redis.io/commands/command#movable-keys https://redis.io/commands/command-getkeys

There is no way today to cache key indexes for commands such as ZUNIONSTORE/EVAL others with "numkeys", since COMMAND INFO returns 0 info for them.

This requires to 'patch' the code for those commands, or to disable caching and resolve each such command's keys with COMMAND GETKEYS if you are planning on a cluster aware redis client.

The solution

All commands that have "numkeys" indexing (most / all of movable keys), should instead of this output:

127.0.0.1:6379> command info eval
1) 1) "eval"
   2) (integer) -3
   3) 1) noscript
      2) movablekeys
   4) (integer) 0
   5) (integer) 0
   6) (integer) 0
   7) 1) @slow
      2) @scripting

Produce this output:

127.0.0.1:6379> command info eval
1) 1) "eval"
   2) (integer) -3
   3) 1) noscript
      2) movablekeys
   4) (integer) 3
   5) (integer) -2
   6) (integer) 1
   7) 1) @slow
      2) @scripting

The main change here is that argument #5 (Last key in argument list) meaning under "movablekeys" is different:

https://redis.io/commands/command#last-key-in-argument-list

It's negative so that people who ignore reading the documentation, will not use it by accident, and it absolute value points to index which holds the numkeys.

Also it's non-conflicting with -1 meaning in the regular case, because numkeys means it's not infinite number of keys, just that index 1 holds how many keys there are.

This will make the clients be able to cache this information for cluster key usage.

Ofcourse a breaking change (such as ACL did) to the output can just add another variable that means (-1 no such numkeys index, or for the numkeys index) instead.

Comment From: yossigo

@tzickel Unfortunately I don't see how this can be done in a complete manner, as theoretically commands with movable keys have a lot of freedom regarding their positions and count. Even if we did hard code that for Redis commands, we'd still be left with the problem of modules with movable keys.

Comment From: tzickel

I think that today's situation is bad and can be improved, most libraries hard code this information which sucks (and then when a new function comes out they need to figure it out to get it supported and release an update, if at all).

Redis server should provide library developers as much information to handle all the stuff it asks of them to support.

In my opinion, functions that can support it (i.e. all of the bulitins today and some of the modules?) should support it in one way or the other (as outlined above, maybe a new index in command info?), and only functions who specifically can't should opt out.

Comment From: yossigo

@tzickel I guess that what you're proposing is basically this: leave the custom getkeys mechanism (movable keys) in place but add additional declarative schemes for commands in addition to the current first/last/step scheme. For example, a num-keys/first-key scheme that could be used by EVAL and others.

I'd be happy to get the perspective on this from other client library maintainers. @gkorland What do you think?

Comment From: oranagra

FYI: This was recently discussed here: https://github.com/redis/redis/pull/7794 AFAIR not all the movablekeys command use this numkeys scheme. One extremely bad example IMHO is the STRALGO command (which i wish would have been broken to some 4 different commands rather than one).

I think that for commands like STRALGO there are only two options, either the client library implements specific support for it, or it uses the GETKEYS command.

I do agree that we should make it easier for client libraries to support the common schemes (like EVAL), however i don't think that re-purposing the lastkey argument is right, and i don't think it's safe to assume that the very next argument after the numkeys argument is the first key. for instance the ZUNIONSTORE have a separate storekey, and only later the numkeys and the input keys, and in theory these may be spaced apart (unlike ZUNION and EVAL).

From what i've seen, some client libraries just need one key (in order to hash it and decide which cluster node to send the command to), so for that purpose they can just use the storekey and don't even need to look into numkeys etc. but i think we wanna build something that can be used to discover all keys, and not just one.

I think we should emit more fields in COMMAND INFO to reflect the more complicated schemes rather than re-purpose existing fields.

Comment From: tzickel

In that PR numkeys is used as this.

And yes, the end goal for this specific request is for cluster routing (i.e. just tell me sufficient information to where I can find any key without asking the server each invocation).

It would be nice if there was a guide on how to properly design commands arguments to be friendly for clients (maybe for other goals instead of just cluster, like knowing if a command can be used in a PIPELINE, can cause BLOCKING, etc... although ACL has improved that), for example XREAD seems to be designed poorly in this regards.

In STRALGO, maybe it's possible to consider STRALGO LCS KEYS as the command (other invocations don't require knowing the key). In ZUNIONSTORE it's easy, the first index is always the first keys, which is enough for cluster routing.

Comment From: trevor211

I think it is not easy to design for every existing commands keeping the purpose of existing fields as they are.

In order to provide a way for clients to cache, we can only use COMMAND INFO instead of COMMAND GETKEYS. Based on current design, we can add an additional field for the output of COMMAND INFO, e.g. numkeys. We could provide a get_command_info_proc handle for each command just like getkeys_proc.

We can handle most of the existing builtin command, and we could even handle most of module command provided that modules got updated using the new mechanism.

We can not handle every command, one example is ZUNIONSTORE. Looking at the command format, FYR https://redis.io/commands/zunionstore, ZUNIONSTORE destination numkeys key [key ...] [WEIGHTS weight [weight ...]] [AGGREGATE SUM|MIN|MAX]. I think we should emit the destination, and key [key ...] parts, but that's not possible even with the added output of numkeys.

One thing I think we should be sure is that the existing fields of the output of COMMAND INFO(first key position, last key position, step count and so on) should be kept as they are.

IMHO re-purpose existing fields is not a good idea. Besides that, do you have any other ideas? @tzickel

Comment From: oranagra

maybe the response should be an array possibly holding more than one key-range spec.

ZUNIONSTORE could have something like this:

1) 1) "zunionstore"
   2) (integer) -4
   3) 1) write
      2) denyoom
      3) movablekeys
   4) (integer) 0
   5) (integer) 0
   6) (integer) 0
   7) 1) @write
      2) @sortedset
      3) @slow
   8) 1) 1) range      <- first spec of type "range"
         2) 1) write   <- flags specifying this is used for write
         3) 1) 1       <- index of first key
            2) 1       <- index of last key
            3) 1       <- key step
      2) 1) keynum     <- second spec of type "keynum"
         2) 1) read    <- flags specifying this is used for read
         3) 1) 2       <- index of keynum
            2) 3       <- first key index
            3) 1       <- key step

for SORT STORE we can have:

1) 1) "sort"
   2) (integer) -2
   3) 1) write
      2) denyoom
      3) movablekeys
   4) (integer) 1
   5) (integer) 1
   6) (integer) 1
   7) 1) @write
      2) @set
      3) @sortedset
      4) @list
      5) @slow
      6) @dangerous
   8) 1) 1) range      <- first spec of type "range"
         2) 1) read    <- flags specifying this is used for write
         3) 1) 1       <- index of first key
            2) 1       <- index of last key
            3) 1       <- key step
      2) 1) keyword    <- first spec of type "keyword"
         2) 1) write   <- flags specifying this is used for read
         3) 1) STORE   <- keyword string
            2) 1       <- key count
            3) 1       <- key step

for XREAD we can have:

1) 1) "xread"
   2) (integer) -4
   3) 1) readonly
      2) movablekeys
   4) (integer) 1
   5) (integer) 1
   6) (integer) 1
   7) 1) @read
      2) @stream
      3) @slow
      4) @blocking
   8) 1) 1) keyword      <- first spec of type "keyword"
         2) 1) read      <- flags specifying this is used for read
         3) 1) STREAMS   <- keyword string
            2) -1        <- key count (-1 indicating till the last argument, -2 one before the last)
            3) -2        <- key step (-2 indicating only half of the remaining args)

The above may also in some way fit STRALGO LCS, but i have a feeling i rather break it into smaller commands and deprecate it.

Comment From: oranagra

I'll try to translate the above set of examples to a specification.

The 8th element of the command info array, if exists, holds an array of key specs. The array may be empty, which indicates the command doesn't take any key arguments, or may contain one or more key-specs, each one may leads to the discovery of 0 or more key arguments.

A client library that doesn't support this key-spec feature will keep using the first,last,step and movablekeys flag which will obviously remain unchanged.

A client that supports this key-specs feature need only to look at the key-specs array. If it finds an unrecognized spec, it must resort to use COMMAND GETKEYS if it wishes to get all key name arguments, but if all it needs is one key in order to know which cluster node to use, then maybe another spec (if the command has several) can supply that, and there's no need to use GETKEYS.

Each spec is an array or arguments, first one is the spec name, second one is an array of flags, and the third is an array containing details about the spec (specific meaning for each spec type) The initial flags we support are "read" and "write" indicating if the keys that this key-spec finds are used for read or for write. clients should ignore ignore any unfamiliar flags.

the initial set of key-specs that covers the entire redis command list is: - "range" - [first_idx, last_idx, step] - "keynum" - [keynum_idx, first_idx, step] - "keyword" - [keyword_str, count, step] the count argument can be negative which key position from the last argument (i.e. -1 means all remaining arguments). the step argument can be negative which indicates the keys are consecutive, but their count is 1/(-count) of the arguments found according to the count option. i.e. -2 means half of the remaining arguments. that's done in order to support XREAD

Most of the existing commands in redis are already covered by the basic "range" spec, the ones that aren't are covered by these command specs combinations: - ZUNIONSTORE / ZINTERSTORE = range + keynum - ZUNION / ZINTER = keynum - SORT = range + keyword - MIGRATE = range + keyword("keys") - EVAL / EVALSHA = keynum - GEORADIUS_RO / GEORADIUSBYMEMBER_RO = range - GEORADIUS / GEORADIUSBYMEMBER = range + keyword("store") + keyword("storedist") - XREAD / XREADGROUP = keyword("streams") - MEMORY = keyword("usage") - STRALGO = keyword("keys")

Comment From: tzickel

@oranagra I like that idea (and modules should have an API for specifying this as well).

As for the read/write flags, I think that information might be informal in commands.json as well ?

Another thought is, can we push this more and somehow get the same info we get in commands.json here about all input arguments ?

https://github.com/redis/redis-doc/blob/master/commands.json

Without the full arguments information, maybe there are some edge cases where you can use something that looks like a keyword string before as something else (if he skipped an unknown spec, or in your SORT example, what happens if in the BY pattern, the pattern is called STORE).

Comment From: oranagra

@tzickel can you describe your use of commands.json? are you in some way bundling it in your client library and use the metadata it provides for argument validation?

I agree that the keyword spec is vulnerable to some other argument carrying the keyword string. the only way i see around it is to let the client validate the executed arguments against the full command argument syntax (like the one included in commands.json, which btw i'm not sure is feasible with it's current form), but that sounds like a big burden on the client.

Maybe one way to reduce the burden for the client would have been to implement this common logic in hiredis, and have client use it to perform that service for them, but that's not applicable for some clients. Also, either way if we go that direction, it means an old client library that was released before some command was added to redis, won't be able to handle it (unlike my proposed spec approach). (i'm assuming clients can execute commands they don't explicitly support and still send them to the right shard and pass back the correct response, as i've seen in some clients).

Comment From: tzickel

Currently I have no use of it, buy it would be nice if it was possible to use it as a template for not writing a boilerplate for all the Redis functions, and auto code generation of the commands.

As for the keyword, maybe we can do, if you see the keyword twice in the input, then try any of them, if the server returns an key error, then ask the server for the key information for this specific instance (which normally should not happen).

But maybe we can focus on getting COMMAND INFO to just give us the minimal information needed for routing, so instead of trying to specify all possible key information for the command, let's try to see if for all current commands, we can return a simple scheme for getting one key index from a given command (for SORT that will be just index 1).