Is there any reason why Redis doesn't provide server-side latency histograms per command? I see there are various ways to get samples of latency data for various events, but not a way to get a histogram of latencies over a given period of time. I have client-side latencies available in a histogram, and I'm looking for server-side latencies so I can work out whether the client, the network, or the server is dominating the critical path at a given percentile.
If this feature doesn't exist, was this by design?
Comment From: itamarhaber
If this feature doesn't exist, was this by design?
Yes. No. Yes - the basic Redis design premise, as far as I can tell, is simplicity: we already have the slowlog, so why bother with per-command histograms? No - I suspect it was simply never requested, but your need makes it look like it would be a nice expansion of the current introspection arsenal.
(disclaimer: not a maintainer - the above is personal opinion only)
Comment From: toaler
Thanks @itamarhaber. The slowlog is useful for looking at requests that fall within a certain subset of the overall latency spectrum. However, I would think you'd want to know what your higher latency percentiles are in order to choose a reasonable slowlog threshold.
We are very interested in understanding the latency ranges across the different percentiles.
Do others think having a histogram per operation type would be useful?
Comment From: itamarhaber
(see disclaimer)
Personally, I can't resist any temptation that involves introspection :) That said, besides the coding required, this will slow down the server because there are no free lunches. I'd certainly like to have this as an optional, turn-off-able if not by-default-disabled, feature.
Comment From: danni-m
+1, having better performance metrics in Redis would be awesome, especially for modules or use cases where commands can take longer. Having said that, implementing sliding-window percentiles can be both CPU and memory heavy (there are some algorithms to get an approximation: http://mjambon.com/2016-07-23-moving-percentile/). A simpler and somewhat more efficient approach might be to keep time buckets for each command (e.g. 1ms to 1s).
Redis is very efficient, but sometimes slowness can be triggered by external factors (noisy neighbour, lack of CPU, disk performance when using AOF). Having these metrics will help users better understand their performance and act on actual data rather than guesses.
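For illustration, a minimal sketch of what the "time buckets per command" idea could look like, assuming power-of-two microsecond buckets up to roughly one second (more buckets than the ~10 suggested above, but the same idea); all names here are illustrative and not part of Redis:

#include <stdint.h>
#include <stdio.h>

/* <=1us, <=2us, ..., <=2^20us (~1s), plus one overflow bucket. */
#define LATENCY_BUCKETS 22

typedef struct commandLatencyBuckets {
    uint64_t counts[LATENCY_BUCKETS];
} commandLatencyBuckets;

/* Record one command execution that took `usecs` microseconds:
 * a short comparison loop and one increment, no allocation. */
static void latencyBucketsRecord(commandLatencyBuckets *b, uint64_t usecs) {
    int idx = 0;
    while (idx < LATENCY_BUCKETS - 1 && usecs > ((uint64_t)1 << idx)) idx++;
    b->counts[idx]++;
}

int main(void) {
    commandLatencyBuckets get_buckets = {{0}};
    latencyBucketsRecord(&get_buckets, 3);    /* lands in the <=4us bucket */
    latencyBucketsRecord(&get_buckets, 1500); /* lands in the <=2048us bucket */
    for (int i = 0; i < LATENCY_BUCKETS; i++)
        if (get_buckets.counts[i])
            printf("bucket %d: %llu\n", i, (unsigned long long)get_buckets.counts[i]);
    return 0;
}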
Comment From: antirez
Once I have the Stream data structure internally, I want to add options to track, for instance, memory usage and other things in streams over time, so that this comes for free for everybody running a Redis instance, without any need for a more complex setup with monitoring systems. However, for the latency distribution it does not make much sense, because from within Redis you can easily measure only the command dispatch time. If, for instance, Redis is blocked serving other clients or slow requests, or is busy on some disk I/O or whatever, which is exactly what you want to avoid as latency sources, all those events will not be reflected in the time to execute the command itself. So such data would be meaningless, unfortunately. Instead, what we provide, that is, the probes on the different sources of latency, is a very powerful way to analyze what is wrong when it is possible to see latency from the outside. The only way Redis could monitor its latency in a reasonably reliable way would be to spawn a different thread that pings the server every second or so. That could be much more useful, and could be one of the things to add to the monitoring stream, if Redis is configured to do so.
Comment From: toaler
It feels like the critical path for a Redis request is:
request time = command dispatch time + command exec time
which is
request time = wait time + service time
If we only report on wait time, then we miss latency issues when something influences command exec time (service time) negatively. Does Redis not have a timer encapsulating request time (wait + service time)? It seems there is one for wait time (command dispatch time), but request time and service time timers are absent.
Latency issues can manifest both in command dispatch time and in various events that can influence service time (a CPU-saturated host, critical pages being read from disk, etc.). I appreciate that probes are useful when trying to triage what is influencing higher-level request latencies; however, we want to understand the distribution of request time across the histogram spectrum for various reasons. Understanding how p50 through p99 behave is of interest, especially when we have certain SLAs we need to guarantee to clients.
What is the downside of adding a few extra timers that take nanoseconds to acquire, and nanoseconds to perform the diffs, then streaming out to a histogram off the critical path of the request?
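To make the ask concrete, here is a hedged sketch (not Redis source; all names are hypothetical) of the kind of extra timers described above: take monotonic timestamps at arrival, dispatch, and reply, then push the differences into a histogram sink that is updated off the critical path:

#include <stdint.h>
#include <time.h>

static uint64_t monotonic_usecs(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000ULL + (uint64_t)ts.tv_nsec / 1000;
}

/* Hypothetical histogram sink, e.g. a bucket increment done outside the
 * request's critical path. */
static void record_request_latency(uint64_t request_usecs) { (void)request_usecs; }

/* Hypothetical request handler: request time = wait time + service time. */
static void handle_request(void) {
    uint64_t t_arrival  = monotonic_usecs(); /* request readable / dequeued */
    /* ... wait time: parsing, queueing behind other clients ... */
    uint64_t t_dispatch = monotonic_usecs(); /* command dispatch starts */
    /* ... service time: execute the command ... */
    uint64_t t_done     = monotonic_usecs(); /* reply written to the buffer */

    record_request_latency(t_done - t_arrival); /* request time */
    (void)(t_dispatch - t_arrival);             /* wait time */
    (void)(t_done - t_dispatch);                /* service time */
}

int main(void) { handle_request(); return 0; }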
Comment From: toaler
@danni-m I believe HdrHistogram works around some of the disadvantages that other histogram implementations have.
http://hdrhistogram.github.io/HdrHistogram/ https://github.com/HdrHistogram/HdrHistogram_c
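For reference, a small standalone example of the HdrHistogram_c API linked above (the include path and the value range/precision chosen here are just one possible setup), showing how recording and percentile queries look:

#include <stdio.h>
#include <stdlib.h>
#include <hdr/hdr_histogram.h> /* adjust to wherever HdrHistogram_c is installed */

int main(void) {
    struct hdr_histogram *h = NULL;
    /* Track 1 usec .. 1 sec with 3 significant figures of precision. */
    if (hdr_init(1, 1000000, 3, &h) != 0) return 1;

    /* Record some synthetic command latencies in microseconds. */
    for (int i = 0; i < 100000; i++)
        hdr_record_value(h, 1 + rand() % 50);

    printf("p50=%lld p99=%lld p99.9=%lld\n",
           (long long)hdr_value_at_percentile(h, 50.0),
           (long long)hdr_value_at_percentile(h, 99.0),
           (long long)hdr_value_at_percentile(h, 99.9));

    hdr_close(h);
    return 0;
}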
Comment From: danni-m
I agree with @toaler's point; I think both the round-trip time between client and redis-server and the Redis execution time are interesting. I will say that, from a module developer's side, we are adding some new functionality whose latency is somewhat dependent on the query and the data. Having time buckets in addition to average time will add much more visibility and will not hurt performance, unlike the usage of streams. Also, if we take 10 buckets per command, the overhead will be very minimal.
Comment From: filipecosta90
Core team, I believe this can be a very good addition to the observability features that Redis offers. If you feel that per-command latency histograms are too much overhead, we can try to mimic what MongoDB offers and use a per-command-type histogram (read, write, etc.).
Adding the MongoDB feature that maps to this: https://docs.mongodb.com/manual/reference/method/db.collection.latencyStats/
Their exposition format is a simple one:
histogram: [
  { micros: NumberLong(1), count: NumberLong(10) },
  { micros: NumberLong(2), count: NumberLong(1) },
  { micros: NumberLong(4096), count: NumberLong(1) },
  { micros: NumberLong(16384), count: NumberLong(1000) },
  { micros: NumberLong(49152), count: NumberLong(100) }
]
with the corresponding reads/writes histogram:
> db.countries.latencyStats({histograms:true})
[
  {
    ns: 'test.countries',
    host: '72c68390d933:27017',
    localTime: 2020-12-15T16:25:36.141Z,
    latencyStats: {
      reads: { histogram: [ { micros: 64, count: 2 } ], latency: 171, ops: 2 },
      writes: {
        histogram: [ { micros: 128, count: 2 }, { micros: 256, count: 1 } ],
        latency: 681,
        ops: 3
      },
      commands: {
        histogram: [ { micros: 32768, count: 1 } ],
        latency: 48984,
        ops: 1
      },
      transactions: { histogram: [], latency: 0, ops: 0 }
    }
  }
]
exposition format
There are various ways of exposing this on the wire, but something between what Prometheus and MongoDB do should be a good sweet spot. Here is the histogram text representation exposed by Prometheus:
# HELP request_duration Time for HTTP request.
# TYPE request_duration histogram
request_duration_bucket{le="0.005",} 0.0
request_duration_bucket{le="0.01",} 0.0
request_duration_bucket{le="0.025",} 0.0
request_duration_bucket{le="0.05",} 0.0
request_duration_bucket{le="0.075",} 0.0
request_duration_bucket{le="0.1",} 0.0
request_duration_bucket{le="0.25",} 0.0
request_duration_bucket{le="0.5",} 0.0
request_duration_bucket{le="0.75",} 0.0
request_duration_bucket{le="1.0",} 0.0
request_duration_bucket{le="2.5",} 0.0
request_duration_bucket{le="5.0",} 1.0
request_duration_bucket{le="7.5",} 1.0
request_duration_bucket{le="10.0",} 3.0
request_duration_bucket{le="+Inf",} 3.0
request_duration_count 3.0
request_duration_sum 22.978489699999997
todos
I'm deeply interested in implementing this feature (in whatever variants you think might be interesting, so we can analyze the pros/cons -- per command, per command type, etc.). WDYT?
Comment From: yossigo
@filipecosta90 I agree with you that this can be a very useful feature for Redis, and I wouldn't give up the per-command resolution. The question is how to avoid complexity and maintain zero overhead when this is not in use.
BTW we already have hdr_histogram as a dependency now, so the gap is definitely smaller than it was when the issue was filed.
Comment From: filipecosta90
@filipecosta90 I agree with you that this can be a very useful feature for Redis, and I wouldn't give up the per-command resolution. The question is how to avoid complexity and maintain zero overhead when this is not in use.
BTW we already have hdr_histogram as a dependency now, so the gap is definitely smaller than it was when the issue was filed.
Agree. Should it be an extension to INFO COMMANDSTATS, an extension to LATENCY HISTOGRAM <command name>, or a dedicated new command?
Comment From: madolson
I'm not convinced histogram metrics are really that useful to collect at a per-command level. I agree we can construct cases where they would be useful, but I think the main use case would be setting the slowlog value so that you can figure out the problematic command specifically.
Comment From: filipecosta90
I'm not convinced histogram metrics are really that useful to collect at a per-command level. I agree we can construct cases where they would be useful, but I think the main use case would be setting the slowlog value so that you can figure out the problematic command specifically.
@madolson with regards to per-command histograms, users (devops/providers) can use them:
- to properly define and check SLOs/SLAs
- to find culprits in mixed-command workloads
- to find multi-modal distributions for the same command (slow clients vs fast ones)
- to connect with percentile values higher up the stack (meaning if 99% of my app requests complete within X, and 99% of DB requests are served within Y, we can start correlating them :) )
I believe we can have both (per group and per command) and let the user choose. The histograms are fully mergeable, so we can construct the per-group ones from the per-command ones, and a total from the per-group ones.
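If HdrHistogram ends up being used here, a sketch of what "fully mergeable" means in practice; the wrapper function is hypothetical, while hdr_add is the HdrHistogram_c merge primitive:

#include <hdr/hdr_histogram.h> /* adjust to wherever HdrHistogram_c is installed */

/* Merge an array of per-command histograms into `dst` (e.g. a per-group or
 * total histogram). Returns the number of values dropped during the merge. */
int64_t merge_histograms(struct hdr_histogram *dst,
                         struct hdr_histogram **per_command, int n) {
    int64_t dropped = 0;
    for (int i = 0; i < n; i++)
        dropped += hdr_add(dst, per_command[i]);
    return dropped;
}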
Comment From: madolson
So my perspective here is less that of a Redis maintainer and more from my experience managing ElastiCache. We should be exposing metrics to help users understand what is going on and how Redis is operating; I think this is just throwing data at the user.
- to properly define and check SLOs/SLAs
SLAs are normally defined on end-to-end time for your higher-level services, which isn't really solved here. Redis generally has predictable performance, unlike a lot of traditional databases, so you first need to answer the question, "Was it Redis being slow?"
- to find culprits in mixed-command workloads
You can construct workloads where this is true, but it would also probably show up in the average/total time per operation. You also still need to use slowlog to find the culprit.
- to find multi-modal distributions for the same command (slow clients vs fast ones)
This will solve this problem! My argument was that this isn't a problem we should solve.
- to connect with percentile values higher up the stack (meaning if 99% of my app requests complete within X, and 99% of DB requests are served within Y, we can start correlating them :) )
I don't think the answer here is to have percentiles of the database execution time, but to have observability built into the clients or the application. DB-level telemetry helps you understand roughly what is going on, but can lead you to wrong conclusions.
TL;DR: observability is important, but I'm not sure this is the right incremental step towards getting there.
Comment From: filipecosta90
@yossigo @madolson independently of whether it ends up command-based or command-group-based, WDYT of exposing this via an INFO LATENCYSTATS section, with the following format:
INFO LATENCYSTATS
# Latencystats
latencystat_<CMDNAME>:count=XX,usecs=YY,histogram=[{le=<bucket1micros>;count=<bucket1 count>},{le=<bucket1micros>;count=<bucket2 count>};(...)]
This could then be easily parsed by clients/tools.
Comment From: madolson
I think INFO LATENCYSTATS makes the most sense. I think your suggestion will break client parsing though. The client parser I'm aware of, https://github.com/andymccurdy/redis-py/blob/master/redis/client.py#L119, will mangle it. We've seen in the past all of INFO breaking because one line was mangled, so we should try not to do that again.
We should double check how different clients parse outputs, but probably something like this might work:
histogram=[(micros1:count);(micros2:count2)]
It's an awkward syntax, but we can't use = or ,. I'm not sure the values inside the structure need to be named like in your suggestion; I think a plain tuple would be simpler and more compact.
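As a sanity check that this tuple syntax stays easy to consume, a throwaway client-side parsing snippet (illustrative only, using example numbers):

#include <stdio.h>
#include <string.h>

int main(void) {
    /* Example field using the proposed (micros:count) tuple syntax. */
    const char *field = "histogram=[(2:98618);(4:99388);(8:176599)]";
    const char *p = field;
    long long micros, count;

    while ((p = strchr(p, '(')) != NULL) {
        if (sscanf(p, "(%lld:%lld)", &micros, &count) == 2)
            printf("<= %lld usec: %lld calls\n", micros, count);
        p++; /* move past this '(' and look for the next tuple */
    }
    return 0;
}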
Comment From: filipecosta90
@yossigo @madolson @itamarhaber I started working on a POC on my fork (link) and IMHO @madolson is right that we don't need the per-command histograms. WDYT about using the command categories to track the extended latency metrics? Categories (12 histograms in total):
struct redisLatencyHistogram latencyHistogramsTable[] = {
    {"read",0,NULL},
    {"write",0,NULL},
    {"set",0,NULL},
    {"sortedset",0,NULL},
    {"list",0,NULL},
    {"hash",0,NULL},
    {"string",0,NULL},
    {"bitmap",0,NULL},
    {"hyperloglog",0,NULL},
    {"stream",0,NULL},
    {"pubsub",0,NULL},
    {"geo",0,NULL},
};
Some details:
- We associate the command-category histogram pointers with each command in populateCommandTable()
- Each command should have 0-2 categories, and the reduced number of categories keeps this both CPU and memory efficient.
- With regards to the overhead, I've previously measured ~5-6 nanoseconds per ingestion >>see reference<<, which makes this very low overhead.
- We track from 1 microsecond to 1 second (everything above 1 second is counted as +Inf).
- When we process a command we go through its command-category pointers (please check the POC; I believe this is the fastest way, with the least overhead), as shown in the sketch after this list.
- We have two groups of outputs: latency by percentile distribution and cumulative distribution of latencies. Using the cumulative distribution we can merge stats from different nodes to calculate aggregate metrics (the percentile distribution is not mergeable between nodes).
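A hedged sketch (not the actual POC code; struct, field, and function names here are illustrative) of the recording step mentioned above: after call() measures the command duration, walk the 0-2 category histogram pointers attached to the command and record the duration once per category:

#include <hdr/hdr_histogram.h> /* header path may differ inside the Redis tree */

#define MAX_LATENCY_CATEGORIES 2

/* Per-command latency bookkeeping, filled in populateCommandTable(). */
struct commandLatencyInfo {
    struct hdr_histogram *category_hist[MAX_LATENCY_CATEGORIES];
    int num_categories; /* 0-2 categories per command */
};

/* Called at the end of call() with the measured duration in microseconds. */
static void recordCommandLatency(struct commandLatencyInfo *info,
                                 int64_t duration_usecs) {
    for (int i = 0; i < info->num_categories; i++)
        hdr_record_value(info->category_hist[i], duration_usecs);
}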
latencystats output sample:
$ redis-cli info latencystats
# Latencystats - latency by percentile distribution
latencypercentiles_read:p0=1.000,p50=21.000,p75=35.000,p90=44.000,p95=48.000,p99=76.000,p999=93.000,p100=993.000
latencypercentiles_write:p0=1.000,p50=1.000,p75=2.000,p90=3.000,p95=5.000,p99=11.000,p999=23.000,p100=317.000
latencypercentiles_set:p0=1.000,p50=1.000,p75=2.000,p90=2.000,p95=3.000,p99=8.000,p999=20.000,p100=317.000
latencypercentiles_sortedset:p0=1.000,p50=2.000,p75=3.000,p90=4.000,p95=5.000,p99=9.000,p999=23.000,p100=103.000
latencypercentiles_list:p0=1.000,p50=1.000,p75=22.000,p90=41.000,p95=44.000,p99=66.000,p999=90.000,p100=993.000
latencypercentiles_hash:p0=1.000,p50=1.000,p75=1.000,p90=2.000,p95=3.000,p99=7.000,p999=16.000,p100=223.000
latencypercentiles_string:p0=1.000,p50=1.000,p75=4.000,p90=6.000,p95=9.000,p99=17.000,p999=36.000,p100=296.000
latencypercentiles_stream:p0=1.000,p50=2.000,p75=2.000,p90=2.000,p95=3.000,p99=4.000,p999=21.000,p100=86.000
latencypercentiles_pubsub:p0=1.000,p50=1.000,p75=1.000,p90=1.000,p95=1.000,p99=1.000,p999=2.000,p100=24.000
# Latencystats - cumulative distribution of latencies
latencyhist_read:calls=500000,histogram=[(2:98618);(4:99388);(8:176599);(16:199320);(32:350650);(64:489749);(128:499900);(256:499959);(512:499996);(1024:500000)]
latencyhist_write:calls=2346481,histogram=[(2:2031680);(4:2223108);(8:2318678);(16:2338760);(32:2345720);(64:2346381);(128:2346467);(256:2346478);(512:2346481)]
latencyhist_set:calls=200000,histogram=[(2:180276);(4:194359);(8:198488);(16:199681);(32:199934);(64:199979);(128:199995);(256:199998);(512:200000)]
latencyhist_sortedset:calls=200000,histogram=[(2:111917);(4:189847);(8:197913);(16:199494);(32:199934);(64:199996);(128:200000)]
latencyhist_list:calls=900000,histogram=[(2:490273);(4:497317);(8:575807);(16:599027);(32:750614);(64:889743);(128:899899);(256:899959);(512:899996);(1024:900000)]
latencyhist_hash:calls=100000,histogram=[(2:93436);(4:97723);(8:99365);(16:99926);(32:99988);(64:99996);(128:99999);(256:100000)]
latencyhist_string:calls=400000,histogram=[(2:289589);(4:306316);(8:379993);(16:395360);(32:399464);(64:399939);(128:399993);(256:399999);(512:400000)]
latencyhist_stream:calls=1046481,histogram=[(2:964807);(4:1036934);(8:1043711);(16:1044592);(32:1046436);(64:1046477);(128:1046481)]
latencyhist_pubsub:calls=100000,histogram=[(2:99941);(4:99954);(8:99962);(16:99967);(32:100000)]
Comment From: yossigo
@filipecosta90 I haven't looked at the code yet, but some questions:
- Why do you prefer command categories? I always find that very awkward because not all Redis commands map cleanly, plus this kind of information can probably be very valuable to troubleshoot cases where command resolution is needed.
- Maybe histogram data should be dumped in a more native mbulk format to ease parsing on the client side? If we do that, we can compensate users by adding parsing and basic visualization to redis-cli.
Comment From: filipecosta90
@filipecosta90 I haven't looked at the code yet, but some questions:
1. Why do you prefer command categories? I always find that very awkward because not all Redis commands map cleanly, plus this kind of information can probably be very valuable to troubleshoot cases where command resolution is needed.
@yossigo just to reduce the memory overhead of the histograms. But I believe that can be sorted out ( and improved ).
There is still one thing that makes me question this level of per-command granularity, which is the fact that, when tracking the time within call(), a big portion of the requests are very fast, as seen for example in the following stats line (p0=p50=p75):
latencypercentiles_hash:p0=1.000,p50=1.000,p75=1.000,p90=2.000,p95=3.000,p99=7.000,p999=16.000,p100=223.000
Do you think we should track latencystats as the internal command execution time, or add a latency metric that tracks from first byte arrival to last byte sent, so this level of granularity is more useful (and use histograms for that while keeping commandstats for the internal time)?
Comment From: yossigo
@filipecosta90 Tracking the arrival of the first byte is non-trivial, because we only know something about the command once it has been fully parsed (which could happen a long time after the first byte was read into the query buffer).
Comment From: filipecosta90
@yossigo @madolson @oranagra I've opened a POC PR to start discussing this more deeply: https://github.com/redis/redis/pull/9462