Put simply, any use case that heavily relies on double-to-string conversion (i.e. taking a double-precision floating-point number like 1.5 and returning a string like "1.5") could benefit from a performance boost by replacing snprintf(buf,len,"%.17g",value) with the equivalent fpconv_dtoa, or any other algorithm that guarantees 100% coverage of the conversion.
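For illustration, here is a minimal sketch of the two approaches side by side. The fpconv_dtoa contract assumed here (int fpconv_dtoa(double, char*), writing up to 24 characters, returning the count, and not null-terminating) is taken from the fpconv library's README; treat this as a sketch, not the final patch:

```c
#include <stdio.h>
#include "fpconv.h" /* assumed header name from the fpconv library */

/* Current approach: %.17g always round-trips, but is comparatively slow. */
int d2string_snprintf(char *buf, size_t len, double value) {
    return snprintf(buf, len, "%.17g", value);
}

/* Candidate approach: fpconv_dtoa writes the shortest round-trip
 * representation (up to 24 characters) and returns the number of
 * characters written; it does not null-terminate, so the caller
 * must pass at least 25 bytes and terminate the string itself. */
int d2string_fpconv(char *buf, double value) {
    int n = fpconv_dtoa(value, buf);
    buf[n] = '\0';
    return n;
}
```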
This is a well-studied topic, and projects like MongoDB, RedPanda, and PyTorch leverage libraries (e.g. fmtlib) that use optimized double-to-string conversion underneath.
The positive impact can be substantial for use cases that heavily rely on double-to-string conversion, as showcased at https://github.com/fmtlib/dtoa-benchmark#results:
Notice that the speedup is bounded by the share of CPU time spent in snprintf. Meaning, if we spend 8% of CPU time in snprintf for double-to-string conversion, then in the best-case scenario we would see an improvement of 8% − (8% / 7.46) ≈ 6.93% from the fpconv alternative (ballpark, ceiling numbers; see the sketch after the table).
| Function | Time (ns) | Speedup |
|---|---|---|
| ostringstream | 1,187.735 | 0.75x |
| ostrstream | 1,048.512 | 0.85x |
| sprintf | 887.735 | 1.00x |
| fpconv | 119.024 | 7.46x |
| grisu2 | 101.082 | 8.78x |
| doubleconv | 84.359 | 10.52x |
| milo | 64.100 | 13.85x |
| ryu | 43.541 | 20.39x |
| fmt | 40.712 | 21.81x |
| null | 1.200 | 739.78x |
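To make the ceiling explicit, this is just Amdahl's law: if a fraction f of total CPU time goes to snprintf-based conversion and the replacement is s times faster per call, the whole-program gain is at most f − f/s. A quick sketch of that arithmetic, using the numbers from above:

```c
#include <stdio.h>

int main(void) {
    double f = 0.08;  /* assumed fraction of CPU time in snprintf dtoa */
    double s = 7.46;  /* per-call speedup of fpconv vs sprintf (table above) */
    double gain = f - f / s;  /* Amdahl-style ceiling on overall improvement */
    printf("best-case overall improvement: %.2f%%\n", gain * 100.0);
    /* prints: best-case overall improvement: 6.93% */
    return 0;
}
```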
Real application use-case
We've applied the fpconv function to a Redis module (RedisTimeSeries) and noticed a performance bump of ~13% in achievable ops/sec on one of our read workflows, as detailed here: https://github.com/RedisTimeSeries/RedisTimeSeries/pull/671. For the Redis use case I believe we should use the fast pure-C implementation.
Description of the feature
Leverage the fastest C header-only library and address any non-covered case with a snprintf fallback to ensure 100% coverage.
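One possible shape for that fallback, as a sketch: the fpconv_dtoa call and its buffer contract are assumptions from the library's README, and the strtod round-trip check is purely illustrative (the real patch may define "non-covered cases" differently):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "fpconv.h" /* assumed header name */

/* Sketch: take the fast path via fpconv_dtoa, and fall back to snprintf
 * for anything it does not cover. Here "covered" is checked by parsing
 * the output back and comparing exactly (NaN always falls back, since
 * NaN != NaN, which is harmless: snprintf handles it correctly). */
int d2string_fast(char *buf, size_t len, double value) {
    char tmp[32]; /* fpconv writes up to 24 chars; leave room to terminate */
    int n = fpconv_dtoa(value, tmp);
    tmp[n] = '\0';
    if ((size_t)n < len && strtod(tmp, NULL) == value) {
        memcpy(buf, tmp, (size_t)n + 1); /* copy string plus terminator */
        return n;
    }
    /* Fallback: the current, fully general implementation. */
    return snprintf(buf, len, "%.17g", value);
}
```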
Comment From: slavak
I ran a small benchmark by modifying redis-benchmark to generate random decimal numbers instead of integers, and running it with the ZADD test. I compared the results of this when using the current snprintf implementation with a test version that uses the linked fpconv_dtoa, and did not see any performance improvement. In fact, the snprintf version performed slightly better.
Specifically, I ran the tweaked redis-benchmark with:
redis-benchmark -q -t ZADD -r 1000000 -n 100000
I executed this 5 times with a FLUSHALL before each execution so it was not affected by previous runs. The results were:
Current implementation:
ZADD: 90826.52 requests per second, p50=0.287 msec
ZADD: 83752.09 requests per second, p50=0.311 msec
ZADD: 84459.46 requests per second, p50=0.303 msec
ZADD: 82644.62 requests per second, p50=0.311 msec
ZADD: 82508.25 requests per second, p50=0.311 msec
(Average 84,838.19 rps, p50=0.305 msec)
fpconv_dtoa implementation:
ZADD: 2444.09 requests per second, p50=0.295 msec
ZADD: 74460.16 requests per second, p50=0.335 msec
ZADD: 55897.15 requests per second, p50=0.423 msec
ZADD: 79113.92 requests per second, p50=0.319 msec
ZADD: 80906.16 requests per second, p50=0.319 msec
(Average 78,783.61 rps, p50=0.327 msec)
Not sure if I'm doing anything wrong here, or how this use-case is different from that of RedisTimeSeries. Here's the patch file with all the changes necessary to replicate this test.
Comment From: filipecosta90
@slavak as stated in the title and description of this issue, and of the PR that addresses it (https://github.com/redis/redis/pull/10587), the change focuses on double-to-string conversion and not the opposite. The ZADD reply is of integer type and, as expected, should neither benefit nor degrade from the PR. An example command that you can use to test this out is: zrange zset:100 0 1 byscore withscores
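For instance (an illustrative redis-cli session with a hypothetical member name; the score in the reply comes back as a bulk string, which is exactly where the double-to-string conversion is exercised):

```
127.0.0.1:6379> ZADD zset:100 0.5 member1
(integer) 1
127.0.0.1:6379> ZRANGE zset:100 0 1 BYSCORE WITHSCORES
1) "member1"
2) "0.5"
```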
Comment From: slavak
The ZADD code path also hits the double-to-string conversion, specifically the line you quoted in your original comment:
```c
int d2string(char *buf, size_t len, double value) {
    ...
    snprintf(buf,len,"%.17g",value)
```
This happens via zaddCommand → zaddGenericCommand → zsetAdd → zzlInsert → zzlInsertAt → d2string.
Embarrassingly I completely missed your PR from April, sorry about that.