* s: command not allowed in scripts.
* R: random command. Command is not deterministic, that is, the same command
* with the same arguments, with the same key space, may have different
* results. For instance SPOP and RANDOMKEY are two random commands.
As we know, in case of inconsistency some commands have s or R flags. But there are still some other commands we lost the flags:
-
hgetallcommand should be added flagR, it's a random command.At first, I wanna add flag
S, but the reply ofhgetallis a dict and we put the dict to lua as an array, then thetable.sortget a wrong result, so flagRis better. -
bgrewriteaofshould not be allowed in scripts, just see the case below:127.0.0.1:6379> del a (integer) 1 127.0.0.1:6379> config set appendonly yes OK 127.0.0.1:6379> eval "redis.call('incr','a') redis.call('bgrewriteaof')" 0 (nil) 127.0.0.1:6379> get a "1" 127.0.0.1:6379> debug loadaof OK 127.0.0.1:6379> get a "2"We can see that after reload aof, key
ahas different value, that's because commandincr ain lua script has already been executed and thenbgrewriteaofwill persist the key-value pair to AOF file, and then the whole script will be stored in AOF file too, it means thatincrcommand has been stored twice. -
configshould not be allowed in scripts, becauseconfig set appendonly yesmay trigger AOF rewrite. -
bgsaveshould not be allowed in scripts, because it may get different reply if there are child process saving RDB/AOF or not. -
shutdownshould not be allowed in scripts, becauseEXECmay be lost in AOF file if we useredis.replicate_commands(). -
ttl,pttl,dump,objectthese command are random, for example:a) We have a master instance with
RDB_VERSION8, and a slave instance withRDB_VERSION9, so the dump command will get different replies. b) The same object may be encoded in different type between master and slave,object encodingmay get different replies. c)ttlmay be different between master and slave.- master
127.0.0.1:6379> multi OK 127.0.0.1:6379> set a b QUEUED 127.0.0.1:6379> expire a 10 QUEUED 127.0.0.1:6379> debug sleep 5 QUEUED 127.0.0.1:6379> eval "local ttl = redis.call('ttl','a') redis.call('set','attl',ttl)" 0 QUEUED 127.0.0.1:6379> exec 1) OK 2) (integer) 1 3) OK 4) (nil) (5.00s) 127.0.0.1:6379> get attl "5" - slave
127.0.0.1:7777> get attl "10"
- master
-
slowlogshould not be allowed in scripts,commandandmemorycommand may get different replies.Take
commandandmemoryas random command, and forbidslowlogin script. -
migrateshould not be allowed in scripts, lua script should not interact with other redis instance directly.
PR #4835 , please check it @antirez
Comment From: soloestoy
BTW, about the config option proto-max-bulk-len introduce some problems.
-
If proto-max-bulk-len is too small, we can never execute any commands:
127.0.0.1:6379> config get proto-max-bulk-len 1) "proto-max-bulk-len" 2) "536870912" 127.0.0.1:6379> config set proto-max-bulk-len 1 OK 127.0.0.1:6379> ping (error) ERR Protocol error: invalid bulk length 127.0.0.1:6379> config set proto-max-bulk-len 512mb (error) ERR Protocol error: invalid bulk lengthSo I think
proto-max-bulk-lenmust be 512mb or greater. -
If slave's
proto-max-bulk-lenis smaller than master's, protocol error happens:We have a master instance starts like this:
./redis-serverAnd a slave instance starts like this:
./redis-server --port 7777 --slaveof 127.0.0.1 6379 --proto-max-bulk-len 5 --loglevel verboseThen just send
set foo barbarto master, and protocol error happens on slave:28567:S 11 Apr 11:16:39.235 # == CRITICAL == This slave is sending an error to its master: 'Protocol error: invalid bulk length' after processing the command 'ping' 28567:S 11 Apr 11:16:39.235 - Protocol error (invalid bulk length) from client: id=3 addr=127.0.0.1:6379 fd=8 name= age=10 idle=0 flags=M db=0 sub=0 psub=0 multi=-1 qbuf=57 qbuf-free=32711 obl=0 oll=0 omem=0 events=r cmd=ping. Query buffer during protocol error: '*2..$6..SELECT..$1..0..*3..$3..set..$3..foo..$6..barbar..' 28567:S 11 Apr 11:16:39.839 - DB 0: 1 keys (0 volatile) in 4 slots HT. 28567:S 11 Apr 11:16:39.839 - 1 clients connected (0 slaves), 1857768 bytes in use 28567:S 11 Apr 11:16:39.839 # Connection with master lost. 28567:S 11 Apr 11:16:40.839 * Connecting to MASTER 127.0.0.1:6379 28567:S 11 Apr 11:16:40.839 * MASTER <-> SLAVE sync started 28567:S 11 Apr 11:16:40.840 * Non blocking connect for SYNC fired the event. 28567:S 11 Apr 11:16:40.840 * Master replied to PING, replication can continue... 28567:S 11 Apr 11:16:40.840 * Partial resynchronization not possible (no cached master)To fix it I think master client should ignore
proto-max-bulk-len.
Fix it in another PR #4633.
Comment From: tw-bert
Nice observations, good summary. I agree with almost all.
But, IMHO, I would be very careful to forbid too much. You never know how endusers use the Lua scripts, even with a "use at your own risk" remark. I'd scratch slowlog, config, migrate from your list, for this reason. But, it might be a good idea to mark them as random.
As an example (but also as a requirement for the time being), we (our company) need slowlog in scripts. We do call the following Lua script from a custom metrics collector (python daemon running on same host as monitored redis instances, long lived redis connection, sending its metrics to influxdb):
local tTime = redis.call('time')
local tSlowlog = redis.call('slowlog','get', redis.call('slowlog','len'))
local bResetOK = false
if ARGV[1] == 'slowlog_execute_reset' then
bResetOK = redis.call('slowlog','reset')
end
return cmsgpack.pack({tTime, bResetOK, tSlowlog})
To create graphs like these:
Naturally, we have automated alerts in place.
As far as we know, there is no other way to collect this data atomically. If we split this up into separate client calls, we would loose slowlog data, because something could get in between the slowlog get and the slowlog reset.
Ofcourse, the SLOWLOG command could be enhanced to do the above in one atomic call from a direct client call, but that would require more work for the redis team, and could be regarded as too specific.
Cheers, TW
edit:
Now I think about it, the needed atomicity can be enforced by a simple multi exec in this scenario (so Lua could be taken out of the loop). Still, it's only an example -> we'd have to scan our Lua code base to find more examples.
Comment From: tw-bert
Ah, I see antirez also made some comments at the PR. slowlog is already out of the loop it seems. If I have more remarks I'll post them there.
Comment From: soloestoy
Hi, @tw-bert thanks to your comment.
slowlog is safe and I think just take it as a random command is OK.
About config, uh... config set is dangerous maybe we can just allow config get.
About migrate I need to check carefully.
Comment From: tw-bert
Thanks for your response, good news.
Indeed config get should be enough (marked as 'random'). I'm interested in what you find on migrate.
Good news as well that you found (and are fixing) these vulnerabilities.
Comment From: oranagra
@soloestoy can that one be closed (since #4835 was merged)?