The performance pain

Currently, the use of lpAssertValidEntry can cost 10% of CPU time.

Description of the performance issue

The lpAssertValidEntry function was added to ensure the validity of the data. The issue behavior was reproduced by lifting the deep_sanitization checks and placing the corrupted data directly into the server.

The call is costly since it sits on the Critical Path of the listpack's First, Next, Prev functions and therefore has a significant impact that should be addressed.

Alternatives you've considered

  1. Keep the current behavior and validate the data on each step.
  2. Change the assert to a debugAssert which will run only in debug mode.
  3. Run the assert periodically. For example, run it once every 256 calls. This will keep some level of sanitization at a minimal cost.
  4. Not have one behavior but introduce a server parameter that will specify the protection level.

Additional information

@filipecosta90 is working on retrieving precise statistics on the impact of the lpAssertValidEntry function.

Related to #11273 where testing for the validity of the listpack was moved from the READ functions to the WRITE function.

Comment From: ashtul

A flamechart for calling HGETALL for a hash with 50 field-value pairs using mem-tier.

lpAssertValidEntry

@filipecosta90 #flamecharts

Comment From: zuiderkwast

In #11273, you removed lpAssertValidEntry from lpFirst, lpNext and lpPrev. The assert was the last thing that happened in those functions.

What if we remove the asserts, but call lpAssertValidEntry explicitly after each lpFirst, lpNext and lpPrev call in the RDB code? That would cover RESTORE and other scenarios where we want to detect corrupt data. In rdb.c there is a lot of code which more or less duplicates logic for all key types.