Describe the bug
Starting with Redis 7, the info commandstats output has changed for commands with parent and subcommands structure. The metrics are missing at the parent command level which used to be available prior to Redis 7.
To reproduce
Run commands which has parent and subcommand structure and look at the info commandstats output.
# Output from Redis 7 node
127.0.0.1:6379> cluster myid
(error) ERR This instance has cluster support disabled
127.0.0.1:6379> info commandstats
# Commandstats
cmdstat_cluster|myid:calls=1,usec=5,usec_per_call=5.00,rejected_calls=0,failed_calls=1
# Output from Redis 6 node
127.0.0.1:6381> cluster myid
(error) ERR This instance has cluster support disabled
127.0.0.1:6381> info commandstats
# Commandstats
cmdstat_cluster:calls=1,usec=3,usec_per_call=3.00,rejected_calls=0,failed_calls=1
Expected behavior
Command stats output should have metrics at both parent (aggregated) and sub commands level.
# Output from Redis 7 node
127.0.0.1:6379> cluster myid
(error) ERR This instance has cluster support disabled
127.0.0.1:6379> info commandstats
# Commandstats
cmdstat_cluster:calls=1,usec=3,usec_per_call=3.00,rejected_calls=0,failed_calls=1
cmdstat_cluster|myid:calls=1,usec=5,usec_per_call=5.00,rejected_calls=0,failed_calls=1
Comment From: hpatro
@redis/core-team Thoughts ?
Comment From: madolson
I guess I missed that this was a breaking change. I was looking at the comment from https://github.com/redis/redis/pull/9504:
Command stats now show the stats per subcommand (e.g. instead of stats just
for "config" you will have stats for "config|set", "config|get", etc.)
Which seems to imply that we will have it both at the aggregated level as well as the per command level, but that wasn't the implementation . We did list it in the potentially breaking changes.
* INFO commandstats now shows the stats per sub-command (#9504)
We internally have metrics for aggregating various commands together which was broken by this. So it's definitely a breaking change. Since we have the container object, I don't really see the reason why we introduced this breaking change here?
Comment From: oranagra
I think this was very much on purpose, and i think we did know it's a breaking change and we clearly documented it in the release notes.
We internally have metrics for aggregating various commands together which was broken by this.
what do you mean?
Since we have the container object, I don't really see the reason why we introduced this breaking change here?
you could argue that there's no harm in counting things twice, and the user (application) can deduct things and do some math. but: 1. keep in mind that it can be a little more complicated for commands that can be executed without sub-command (e.g. INFO) 2. in many cases the sub-commands are so unrelated to each other, it's a complete shame to count them together (e.g. CLIENT SETNAME and CLIENT KILL, CONFIG SET and GET).
I see some advantage in un-breaking this, i.e. old applications that just look for the parent command will still find what they're looking for, but i'm not sure changing it now after 7.0 is already out makes much sense. i do think that for any person (not software) looking at the raw INFO COMMANDSTATS, seeing them without aggregation is probably much better.
Comment From: madolson
what do you mean?
We had automation looking for the cluster field field, that info field disappeared. It's possible other folks are looking for that as well.
I see some advantage in un-breaking this, i.e. old applications that just look for the parent command will still find what they're looking for, but i'm not sure changing it now after 7.0 is already out makes much sense.
I don't think the timing is all that consequential, we can re-introduce it.
Comment From: oranagra
One (worse?) case, is that maybe some application (one that complies to multiple versions of Redis) already adapted itself to get the information from both sources (possibly with +), and now that we'll add it in the parent command we break it (again).
Comment From: madolson
I'm not convinced people adopted the new functionality all that quickly. I think it's more likely software just silently stopped working. In our case, our code was looking for the presence of an info field to know if the commands had been called at all. Since now in 7.0, this situation just looks like the "zero" call case from earlier versions. I'm fine committing to this new strategy, but it was a larger breaking change than I realized. @itamarhaber any thoughts?
Comment From: ranshid
I think that applications are capable to do the math externally. but in order to be more backword compatible I would suggest having all commands which have subcommands include the '|' even if it was executed without the subcommand. so for example:
127.0.0.1:6379> CLUSTER BUMPEPOCH
(error) ERR This instance has cluster support disabled
127.0.0.1:6379> CLUSTER BUMPEPOCH
(error) ERR This instance has cluster support disabled
127.0.0.1:6379> CLUSTER
(error) ERR wrong number of arguments for 'cluster' command
127.0.0.1:6379> info Commandstats
# Commandstats
cmdstat_cluster:calls=1,usec=0,usec_per_call=0.00,rejected_calls=2,failed_calls=1
cmdstat_cluster|:calls=0,usec=0,usec_per_call=0.00,rejected_calls=1,failed_calls=0
cmdstat_cluster|bumpepoch:calls=1,usec=8,usec_per_call=8.00,rejected_calls=0,failed_calls=1
cmdstat_cluster|addslots:calls=0,usec=0,usec_per_call=0.00,rejected_calls=1,failed_calls=0
Comment From: oranagra
I'm not sure i understand your suggestion.
what's cmdstat_cluster| (the second line in commandstats output)?
i think that argument about backwards compatibility was valid if it was argued 5 months ago. i'm not certain it's still valid. also, i'd argue that if we were to design Redis 1.0.0 today, i don't think we would be summing things in that table, or include redundant information.
anyway, i might be in a minority opinion, and like other recent fixes, i agree it could be that not many already adopted to this change and that it's possible we'll want to re-break it quickly (we did that for other things). @redis/core-team please share your thoughts.
Comment From: ranshid
I'm not sure i understand your suggestion. what's
cmdstat_cluster|(the second line in commandstats output)?
I meant that in order to distinguish between the aggregated values of the parent command (in my example the first line) and the case where a command was run without providing a subcommand (in my example the second line)
Comment From: itamarhaber
This one's quite a dilemma, but I'm more comfortable rejecting it than accepting it.
Comment From: soloestoy
I'm not strongly agree or disagree it.
On the one hand, the application should accept the new version's change, for example they should be aware of new commands and deprecated commands, they can just take the subcommand as new commands, and some parent commands are deprecated.
On the other hand, backward compatible is always correct, but some parent command can be executed without subcommands, and I'm not comfortable with the cmdstat_cluster| format.
If we have to make a choice, I prefer keep the current status.
Comment From: yossigo
I also slightly prefer to remain with the change as-is. While not backward compatible, it's solid and makes sense on its own, and it is a major version, after all.
Comment From: madolson
I think everyone has weighed in, consensus seems to be to accept the change. Will close unless someone else provides more justification.