We have around 10 GB of data on an EC2 instance running a native Redis server (5.0.5), which we use as persistent storage for some of our real-time data. Recently the server has started crashing. It used to happen at exactly 2 pm UTC, but that pattern is not reliable: once it happened at around 3:30 pm UTC. We have noticed that before each crash the number of connections spikes to about 20x the normal level, resulting in an increasing number of commands being fired. This is causing some data corruption and deletion. We initially suspected an external attack but later found that the server crashed by itself. FYI, the traffic comes to the instance from an ELB.
We need to understand how to find out what is causing the crash, and the best way to fix it, since this is a production issue.
Please find the logs below for more details:
Application Logs:
[ERROR] - unable to flush queues {"errorName":"Error","errorMessage":"Connection is closed.","stackTrace":"Error: Connection is closed.\n at Redis.connectionCloseHandler (/var/task/node_modules/ioredis/built/redis/index.js:367:24)\n at Object.onceWrapper (events.js:421:26)\n at Redis.emit (events.js:314:20)\n at Redis.EventEmitter.emit (domain.js:483:12)\n at processTicksAndRejections (internal/process/task_queues.js:79:11)"}, [ERROR] - Error: Connection is closed.
Redis server logs at the time of the crash:
- Background saving terminated with success
- DB saved on disk
- User requested shutdown...
- Removing the pid file.
- Redis is now ready to exit, bye bye...
OS:
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
REDIS: Version: 5.0.5
Comment From: sundb
@AbhijitAexonic Since the log you provided contains "User requested shutdown...", Redis did not crash; it was shut down on request.
Maybe you should check whether the machine has been hacked, or restrict access to the SHUTDOWN command.
Comment From: AbhijitAexonic
@sundb Yes, we suspected that, so we created a new instance and changed all the connection credentials. We even use a random hostname for the new connection, and we added some extra security on the instance. This was done a couple of days ago, so it's very unlikely to be an external attack. FYI, the traffic comes to the instance from an ELB.
Comment From: sundb
@AbhijitAexonic Maybe you can try rename-command SHUTDOWN "" to prevent Redis from being shut down through the SHUTDOWN command.
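As far as I know, rename-command is a redis.conf directive that is read at startup, so it only takes effect once it is in the config file the server actually loads and the server has been restarted. It also does nothing against signals sent to the process. A minimal sketch for checking a config file's contents follows; the helper names `renamed_commands` and `shutdown_disabled` are hypothetical, and the parsing assumes the usual one-directive-per-line redis.conf layout.

```python
import shlex


def renamed_commands(conf_text):
    """Map each renamed command (upper-cased) to its new name.

    An empty new name ("") means the command is disabled.
    Assumes one directive per line and '#' for full-line comments.
    """
    renamed = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        try:
            parts = shlex.split(line)
        except ValueError:
            # Unbalanced quotes in some unrelated directive: skip that line.
            continue
        if len(parts) == 3 and parts[0].lower() == "rename-command":
            renamed[parts[1].upper()] = parts[2]
    return renamed


def shutdown_disabled(conf_text):
    """True if the config text disables SHUTDOWN via rename-command."""
    return renamed_commands(conf_text).get("SHUTDOWN") == ""
```

To use it, read the redis.conf that the running server was started with and pass its text to `shutdown_disabled`.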
Comment From: AbhijitAexonic
> @AbhijitAexonic Maybe you can try to use rename-command SHUTDOWN "" to avoid redis being shut down from internal.

This doesn't seem to be working.
Comment From: sundb
@AbhijitAexonic Have you checked the command history?
The log shows that Redis was shut down normally, so if it was not shut down by the SHUTDOWN command, it can only have been killed manually.
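One way to tell the two cases apart is to look at what the Redis log says just before "User requested shutdown...". To my knowledge, Redis 5.x writes a "Received SIGTERM scheduling shutdown..." (or SIGINT) line when the shutdown is triggered by a signal, and no such line when it comes from the SHUTDOWN command. The sketch below relies on that assumption about the message text; the function name `classify_shutdowns` and the `window` parameter are hypothetical.

```python
import re

# Assumed message text for signal-triggered shutdowns in Redis 5.x logs.
SIGNAL_RE = re.compile(r"Received (SIGTERM|SIGINT) scheduling shutdown")
SHUTDOWN_MARK = "User requested shutdown"


def classify_shutdowns(log_lines, window=5):
    """Return (line_index, signal_name_or_None) for every shutdown in the log.

    For each line containing "User requested shutdown", look back up to
    `window` lines for a signal message. None means no signal was logged,
    which points towards the SHUTDOWN command rather than a kill.
    """
    lines = list(log_lines)
    results = []
    for i, line in enumerate(lines):
        if SHUTDOWN_MARK not in line:
            continue
        cause = None
        for prev in lines[max(0, i - window):i]:
            match = SIGNAL_RE.search(prev)
            if match:
                cause = match.group(1)
        results.append((i, cause))
    return results
```

If every shutdown comes back with a signal name, something on the host (a cron job, a service manager, an operator) is stopping the process; if none do, a client is most likely issuing SHUTDOWN.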