Redis [NEW] Improve RM_Call performance

The problem/use-case that the feature addresses

with the addition of https://github.com/redis/redis/pull/11158 the performance of RM_Call becomes more important (you really want minimal overhead on a dry run call) but in practice this impacts general usage of RM_Call as well.

From observation, it seems that there are 2 main areas of overhead in relation to executing a dry run and an RM_Call in general and they are both due to allocations that have to take place.

i.e. in

moduleAllocTempClient() and moduleCreateArgvFromUserFormat()

one can avoid the overhead of moduleAllocTempClient() by pushing it later in the function, to after the dry run would complete.

However, the overhead of moduleCreateArgvFromUserFormat() is a bit harder to overcome due to the fact that it separates out the "cmd" from the arguments passed to it and then has to do multiple allocations / copies to combine them together.

if one takes a simple example

RM_Call(ctx, cmd, "v", args, arg_count);

which would then call

argv = moduleCreateArgvFromUserFormat(cmd,fmt,&argc,&argv_len,&flags,ap)

one might think, that since we are just passing in a vector of robj's why not just return the same vector (with just populating argv_len). However, as noted, since cmd is separate we have to do multiple allocations and copies.

A simple way to perhaps improve this is to have a new "V" fmt type. which would assert 2 things

1) one can only have one V format type and it has to be before any other format types 2) if one specifies the V format type, cmd has to be NULL (as we will be using args[0] as cmd.

Then, we don't have to reallocate/copy anything, all we do is

1) allocate argv_len 2) incrememnt robj counter on each element 3) assign argv_len[pos] value.

and return the original array as the argv string.

For implementation, we would simply loop over the fmt string up front) (before handling the va args) to see if its a "simple case" or a "complicated case" (simple being only a single V for the args, which would tell us we can avoid allocations/reallocations)

This would also improve the usability of RM_Call for those that use the module filter interface to intercept commands.

i.e. what they probably end up doing is something like

size_t cmdlen;
const char *cmd = RedisModule_StringPtrLen(c->argv[0], &cmdlen);
reply = RedisModule_Call(ctx, cmd, "v", &c->argv[1], c->argc - 1);

With all that said, there are perhaps better ways to improve API ergonomics and performance that I haven't considered. when this could be called simpler with the above proposa as

reply = RedisModule_Call(ctx, NULL, "V", c->argv, c->argc);

Comment From: oranagra

@guybe7 @MeirShpilraien @zuiderkwast maybe you can think of a better solution

Comment From: zuiderkwast

If the "V" format idea is simple to implement and doesn't add complexity to other parts of the code, then why not give it a try? I'd like to see a PR before I decide whether I like it or not.

Comment From: MeirShpilraien

I believe that using an array of RedisModuleString* or char* are the most common way of using RM_Call. I suggested creating 2 new API's:

RM_CallRawRedisModuleString(int flags, RedisModuleString* argv, size_t argc) - will executing the command on argv
RM_CallRawCString(int flags, char** argv, size_t argc) - will executing the command on argv

Notice that is this suggestion we will even improve the performance of parsing the format flag (OOM, NO_WRITES, ...) because we will pass them directly as flags.

Comment From: oranagra

I don't feel comfortable with so many alternative interfaces. one taking flags as string, the other as integer. one taking command name as a separate arg, and the other as part of the array. i could maybe live with one RM_CallRaw that takes an explicit argv array, but note re-invent the flags. however, considering this is just a performance issue, i wonder what impact it has and if we can't optimize without changing the interface.

Comment From: tezc

Flamegraph from a quick test:

**A bit more details for the top 3 paths:****

Screenshot from 2022-08-28 21-57-14

Note: Inside call(), hdr_record_values() seems useless for module clients (just had a quick look, maybe I'm wrong). Leaving this aside for now.

I guess we should focus improving these two: 1- Allocations inside moduleCreateArgvFromUserFormat() 2- Allocating/freeing temp client. (resetClient() frees argv that we allocate inside moduleCreateArgvFromUserFormat()).

Just some ideas: - Maybe it is possible to defer allocating temp client just before we call call(). This is going to help dry-run only. - If V fmt or a new API function gets RedisModuleString** (cmd as the first one in the array), then I guess we don't need to allocate anything (not even argv array itself).
- Drawback: We can modify the array itself inside RM_Call(). e.g module filters can add something to the array. User should be okay with that.