Put simply, any use case that heavily relies on double-to-string conversion (i.e. taking a double-precision floating-point number like 1.5 and returning a string like "1.5") could benefit from a performance boost by replacing snprintf(buf,len,"%.17g",value) with the equivalent fpconv_dtoa, or any other algorithm that guarantees 100% coverage of the conversion.
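For illustration, here is a minimal sketch of the two approaches side by side. The fpconv_dtoa contract assumed here (int fpconv_dtoa(double, char*), writing up to 24 characters, returning the count, and not null-terminating) is taken from the fpconv library's README; treat this as a sketch, not the final patch:

```c
#include <stdio.h>
#include "fpconv.h" /* assumed header name from the fpconv library */

/* Current approach: %.17g always round-trips, but is comparatively slow. */
int d2string_snprintf(char *buf, size_t len, double value) {
    return snprintf(buf, len, "%.17g", value);
}

/* Candidate approach: fpconv_dtoa writes the shortest round-trip
 * representation (up to 24 characters) and returns the number of
 * characters written; it does not null-terminate, so the caller
 * must pass at least 25 bytes and terminate the string itself. */
int d2string_fpconv(char *buf, double value) {
    int n = fpconv_dtoa(value, buf);
    buf[n] = '\0';
    return n;
}
```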
This is a well-studied topic, and projects like MongoDB, RedPanda, and PyTorch leverage libraries (e.g. fmtlib) that use optimized double-to-string conversion underneath.
The positive impact can be substantial for use cases that heavily rely on double-to-string conversion, as showcased at https://github.com/fmtlib/dtoa-benchmark#results:
Notice that the speedup is bounded by the share of CPU time spent in snprintf. Meaning, if we spend 8% of CPU time in snprintf for double-to-string conversion, then in the best-case scenario we would see an improvement of 8% − (8% / 7.46) ≈ 6.93% from the fpconv alternative (ballpark, ceiling numbers; see the sketch after the table).
| Function | Time (ns) | Speedup |
|---|---|---|
| ostringstream | 1,187.735 | 0.75x |
| ostrstream | 1,048.512 | 0.85x |
| sprintf | 887.735 | 1.00x |
| fpconv | 119.024 | 7.46x |
| grisu2 | 101.082 | 8.78x |
| doubleconv | 84.359 | 10.52x |
| milo | 64.100 | 13.85x |
| ryu | 43.541 | 20.39x |
| fmt | 40.712 | 21.81x |
| null | 1.200 | 739.78x |
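To make the ceiling explicit, this is just Amdahl's law: if a fraction f of total CPU time goes to snprintf-based conversion and the replacement is s times faster per call, the whole-program gain is at most f − f/s. A quick sketch of that arithmetic, using the numbers from above:

```c
#include <stdio.h>

int main(void) {
    double f = 0.08;  /* assumed fraction of CPU time in snprintf dtoa */
    double s = 7.46;  /* per-call speedup of fpconv vs sprintf (table above) */
    double gain = f - f / s;  /* Amdahl-style ceiling on overall improvement */
    printf("best-case overall improvement: %.2f%%\n", gain * 100.0);
    /* prints: best-case overall improvement: 6.93% */
    return 0;
}
```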
Real application use-case
We've applied the fpconv function to a Redis module (RedisTimeSeries) and noticed a performance bump of ~13% in achievable ops/sec on one of our read workflows, as detailed here: https://github.com/RedisTimeSeries/RedisTimeSeries/pull/671. For the Redis use case I believe we should use the fast pure-C implementation.
Description of the feature
Leverage the fastest C header-only library and address any non-covered case with a snprintf fallback to ensure 100% coverage.
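One possible shape for that fallback, as a sketch: the fpconv_dtoa call and its buffer contract are assumptions from the library's README, and the strtod round-trip check is purely illustrative (the real patch may define "non-covered cases" differently):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "fpconv.h" /* assumed header name */

/* Sketch: take the fast path via fpconv_dtoa, and fall back to snprintf
 * for anything it does not cover. Here "covered" is checked by parsing
 * the output back and comparing exactly (NaN always falls back, since
 * NaN != NaN, which is harmless: snprintf handles it correctly). */
int d2string_fast(char *buf, size_t len, double value) {
    char tmp[32]; /* fpconv writes up to 24 chars; leave room to terminate */
    int n = fpconv_dtoa(value, tmp);
    tmp[n] = '\0';
    if ((size_t)n < len && strtod(tmp, NULL) == value) {
        memcpy(buf, tmp, (size_t)n + 1); /* copy string plus terminator */
        return n;
    }
    /* Fallback: the current, fully general implementation. */
    return snprintf(buf, len, "%.17g", value);
}
```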
Comment From: slavak
I ran a small benchmark by modifying redis-benchmark to generate random decimal numbers instead of integers, and running it with the ZADD test. I compared the results of this when using the current snprintf implementation with a test version that uses the linked fpconv_dtoa, and did not see any performance improvement. In fact, the snprintf version performed slightly better.
Specifically, I ran the tweaked redis-benchmark with:
redis-benchmark -q -t ZADD -r 1000000 -n 100000
I executed this 5 times with a FLUSHALL before each execution so it was not affected by previous runs. The results were:
Current implementation:
ZADD: 90826.52 requests per second, p50=0.287 msec
ZADD: 83752.09 requests per second, p50=0.311 msec
ZADD: 84459.46 requests per second, p50=0.303 msec
ZADD: 82644.62 requests per second, p50=0.311 msec
ZADD: 82508.25 requests per second, p50=0.311 msec
(Average 84,838.19 rps, p50=0.305 msec)
fpconv_dtoa implementation:
ZADD: 2444.09 requests per second, p50=0.295 msec
ZADD: 74460.16 requests per second, p50=0.335 msec
ZADD: 55897.15 requests per second, p50=0.423 msec
ZADD: 79113.92 requests per second, p50=0.319 msec
ZADD: 80906.16 requests per second, p50=0.319 msec
(Average 78,783.61 rps, p50=0.327 msec)
Not sure if I'm doing anything wrong here, or how this use-case is different from that of RedisTimeSeries. Here's the patch file with all the changes necessary to replicate this test.
Comment From: filipecosta90
@slavak as stated in the title and description of this issue, and of the PR that addresses it (https://github.com/redis/redis/pull/10587), the change focuses on double-to-string conversion and not the opposite. The ZADD reply is of integer type and, as expected, should neither benefit nor degrade from the PR. An example command that you can use to test this out is: zrange zset:100 0 1 byscore withscores
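For instance (an illustrative redis-cli session with a hypothetical member name; the score in the reply comes back as a bulk string, which is exactly where the double-to-string conversion is exercised):

```
127.0.0.1:6379> ZADD zset:100 0.5 member1
(integer) 1
127.0.0.1:6379> ZRANGE zset:100 0 1 BYSCORE WITHSCORES
1) "member1"
2) "0.5"
```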
Comment From: slavak
The ZADD code path also hits the double-to-string conversion, specifically the line you quoted in your original comment:
```c
int d2string(char *buf, size_t len, double value) {
    ...
    snprintf(buf,len,"%.17g",value)
```
This happens via zaddCommand → zaddGenericCommand → zsetAdd → zzlInsert → zzlInsertAt → d2string.
Embarrassingly I completely missed your PR from April, sorry about that.