Hi @antirez , these codes below make me confused:
/* Technically speaking PFCOUNT may change the key since it changes the
* final bytes in the HyperLogLog representation. However in this case
* we claim that the representation, even if accessible, is an internal
* affair, and the command is semantically read only. */
{"pfcount",pfcountCommand,-2,
"read-only @hyperloglog",
0,NULL,1,-1,1,0,0,0},
And
/* This is not considered a read-only command even if the
* data structure is not modified, since the cached value
* may be modified and given that the HLL is a Redis string
* we need to propagate the change. */
signalModifiedKey(c->db,c->argv[1]);
server.dirty++;
Normally considering PFCOUNT as read-only command doesn't matter anything, but in lua script it may lead to data inconsistency I think.
Comment From: oranagra
this is indeed odd, but considering the effect of that read-only command are observable by other commands, it must be replicated (otherwise there's a risk that some followup command will fail or produce a different effect on the replica). however allowing a read only command on read-only replica can cause a similar issue. maybe the only thing that can be done at this point is to mark the command as a write command.
it seems that the proper solution is to give HLL a type of its own, so that the changes PFCOUNT makes are not observable by any other command (except for DUMP, DEBUG, and MEMORY). doing that will cause some backwards compatibility issues though.
It should be relatively easy to recognize the old HLL strings by their header signature, these may be present in the rdb and also in a SET command (which may come from some tool that reads an RDB and converts it to commands). But beyond that, maybe some user use cases use the GET command for some reason (instead of dump), or rely on the output of TYPE.
Comment From: soloestoy
fixed in #8170 by introducing may-replicate flag
Comment From: chendq8
@oranagra @soloestoy I don't quite understand why we need to replicate pfcount? The cache of card is to speed up pfcount request but replicate to slave will let slave slower on exec pfcount from master.
This replication behavior causes the command to be executed only on the master node, if we send request to slave to execute, the value will also different between master and slave.
Comment From: soloestoy
I remember we discussed it so many times, and we all agree using a simple string to store hyperloglog is not a good choice, the best thing we can fix it is adding a new data type for hyperloglog, and it may be a breaking change, so let's put it into the 7.2 backlog.
Comment From: chendq8
LGTM
Comment From: QuChen88
Ping on this. Any plans to fix this issue properly with HLL data type in 7.2?
Comment From: madolson
I think we moved it to this issue, https://github.com/redis/redis/issues/11249, so keeping this item closed.