it could be nice if XTRIM had a mode in which it trims all entries with ID < minimal-wanted-id... for example:

127.0.0.1:6379> XRANGE x - +
1) 1) "1575363001970-0"
   2) 1) "f1"
      2) "v1"
2) 1) "1575363004144-0"
   2) 1) "f1"
      2) "v1"
3) 1) "1575363004833-0"
   2) 1) "f1"
      2) "v1"
4) 1) "1575363005448-0"
   2) 1) "f1"
      2) "v1"
5) 1) "1575363006264-0"
   2) 1) "f1"
      2) "v1"
127.0.0.1:6379> XTRIM x MINID 1575363005439-0
(integer) 3
127.0.0.1:6379> XRANGE x - +
1) 1) "1575363005448-0"
   2) 1) "f1"
      2) "v1"
2) 1) "1575363006264-0"
   2) 1) "f1"
      2) "v1"

it could even work with the ~ modifier, trimming only full rax nodes where the last entry has ID < minimal-wanted-id

Comment From: guybe7

@antirez let me know if you think it's a good idea, i can implement it and PR

Comment From: itamarhaber

@guybe7 I'd love to have it.

This would provide an adequate (O(N)-ish) resolution to the community's requests for "time-based trimming". Perhaps it is also worth considering adding the following sugar toppings:

  • BEFORE [~] <milliseconds> - removes messages that have ids less than the server's current time minus the argument.
  • RETAIN [~] <milliseconds> - removes messages that have ids less than the stream's max id minus the argument

Lastly, one should remember that the MAXLEN subcommand is shared with XADD. Thus, MINID and possibly the others need similar consideration.

References: #4450, #5951 and https://stackoverflow.com/questions/51168273/redis-stream-managing-a-time-frame

Comment From: gkorland

@antirez it feels like an important feature, that will really help the users have better control of the streams' memory usage

Comment From: antirez

Hey, the reason so far we don't have:

  1. Time bound eviction.
  2. By ID eviction.

Is to avoid masking O(N) operations as normal operations. We need some more better design about this, or this will be the KEYS of Redis Streams. I can't do an analysis before RC1, so we have to postpone this things for 2020, but if you want an hint, one possible idea (not sure if it is good enough, so don't take my words as a good solution) is to have always a "max effort" parameter in the arguments (that is, basically, the number of whole macro-nodes a single operation can remove), so that whatever index, or time, you specify, it will never block for more than 100 nodes deallocations or so forth. Pick a sensible default for this parameter, and then have the non-default value of 0 that means "FORCE" the deletion of the offsets / times I specified.

Comment From: gkorland

  1. By ID eviction

Doesn't it have the same masking O(N) complexity as the current api XTRIM mystream MAXLEN count API? Calling with MAXLEN 10 might cause eviction of 100000 items

Comment From: antirez

@gkorland the difference is that when you do a pattern like XADD + XTRIM, the state of the stream never changes, so if you are capping the stream to 1000 items, even if time passes or alike, you'll still continue to have an O(1) operation. You can still create problems by updating the app and changing the setting from 10000000 to 1000 without any check, but unfortunately there is no way to avoid that. At least in the normal intended pattern, O(N) operations are rare. With time based eviction instead, at time passes because of a delay, the operation may take more and more time. When specifying an ID, given that the ID is bound with time, the most obvious operation that you can perform with it is to remove a range of time, that has the same problem, or to do two queries and then remove a range of IDs that represent some logical interval, without knowing how many entries there are inside. So in general deletion is problematic, but range and time based deletion is more problematic, and if we can do something to make the interface not just harder to mis-use, but also helpful for people that don't want to introduce big latencies, this will be a big win. For instance it is very useful if the API is also able to tell us if the whole range / time was evicted or not when we specified a maximum effort, so that the application layer can call the command again and again, maybe with a delay in between, in order to eventually evict completely the interval.

Comment From: yossigo

@antirez so basically to avoid KEYS-like problems with streams you're suggesting to take a more SCAN-like approach, right?

I think that may apply to XTRIM as it is now and with the enhancements proposed by @guybe7 in practically the same way.

Comment From: ostcar

Would it be possible for XTRIM to have a mode that trims the lowest ids with ID < minimal-wanted-id but at most 20 (or another number) of macro-nodes?

The XADD + XTRIM pattern would run in constant time and eventually trim all requested elements.

Since XTRIM returns the number of the deleted entries, a client could run XTRIM multible times until it returns 0. This would be SCAN-like.

while xtrim(...);

This could also be done in a LUA-Script and therefore be KEYSlike.

I am no C developer and are not able to create a pull request. But I would like if it would be possible to add a time-based or id-based mode for XTRIM.

Comment From: guybe7

@itamarhaber the PR implements the MINID strategy (i.e. trim all entries with id < )

are we sure we also want BEFORE/RETAIN?

so far we managed to avoid assuming that stream ids are actually timestamps, and this might create a dangerous precedence... especially that the user can easily use MINID to achieve the effect of BEFORE/MAXAGE. RETAIN is also very simple but requires a round-trip or Lua