The performance pain

Currently, the use of prepareClientToWrite can cost up to 14% of CPU time.

Description of the performance issue

The prepareClientToWrite function has multiple conditions to validate the server's state. The function is called for any copy of a char or memcpy of a string that accumulates to a high cost.

Alternatives you've considered

  1. Move prepareClientToWrite as far back into the hot path to reduce the use of it

  2. Optimize prepareClientToWrite internally.

  3. Add client.writeStatus parameter to the client structure of type enum Client_Write_Status with states

  4. Client_Write_Uninit - used as condition whether prepareClientToWrite has to be called.
  5. Client_Write_Ok - write status is C_OK.
  6. Client_Write_Err - write status is C_ERR.

When prepareClientToWrite is called, it checks the enum status. For Client_Write_Ok and Client_Write_Err, it immediately returns. For Client_Write_Uninit, it will execute the checks.

This requires resetting the flags when a change occurs in other procedures of the server The client status should be reset to Client_Write_Uninit. A possible name for the function is clientResetBufferState.

Comment From: ashtul

A flamechart for calling HGETALL for a hash with 50 field-value pairs using mem-tier.

prepareClientToWrite

@filipecosta90 #flamecharts.

Comment From: oranagra

i think the first alternative (moving prepareClientToWrite) is not feasible.

looking at the code inside prepareClientToWrite i don't see why it would take that long, other than many cache misses. the main thing it does again and again is check a bunch of flags, as well as c->bufpos!=NULL and listLength(c->reply).

@ashtul which version did you test? maybe it's before #10697? maybe we also need to move replstate or make sure only to test it when the flags indicate that the client is a replica? i invite you to experiment with these.

the 3rd solution you suggested would be reasonable if prepareClientToWrite would actually be doing any real work, but considering that all it does is check a ton of flags, it may mean that we need to invalidate the new flag in a ton of places. i think there's a good chance we can make the function more cache friendly and that would be it.

Comment From: oranagra

p.s. please mention which workload and version you measured.