Hello Experts,

We were running performance load tests on Redis 7.0.6 using the redis-benchmark utility. As part of this test, we first populate Redis memory with expiring and non-expiring data. Then we run the following redis-benchmark command: redis-benchmark -h -p 6379 -c 1000 -n 1700000 -t get -d 18000 -P 300 -l

We keep repeating the above command with different pipeline numbers, and the test runs for 30 minutes for each pipeline value. However, we are noticing that a few minutes into the test, the redis-benchmark command stops with the following error:

99.38% <= 743 milliseconds
99.48% <= 744 milliseconds
99.56% <= 745 millisecondCould not connect to Redis at keng03-dev01-ins01-dmc39-app-1676268068-1.int.dev.mykronos.com:6379: No address associated with hostname
s
99.67% <= 746 milliseconds
99.71% <= 803 milliseconds
99.78% <= 804 milliseconds
99.89% <= 805 milliseconds
100.00% <= 805 milliseconds
87130.34 requests per second

This keeps happening in every test.

Can you please let us know: What is the reason behind this error? Is this a bug? What can be done to fix it?

Comment From: ranshid

I suspect the error you are getting is the result of redis-benchmark calling getaddrinfo for IPv6 after failing to resolve the IPv4 address in some cases (for example, when the IPv4 lookup fails with EAI_AGAIN). We can probably fix redis-benchmark to handle EAI_AGAIN better and only fall back to IPv6 in case of an EAI_ADDRFAMILY error.
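To make the suspected failure mode concrete, here is a minimal Python sketch of an unconditional IPv4-then-IPv6 fallback. It is a simplified model of the behavior described above, not the actual redis-benchmark or hiredis code; the function name and the injectable `lookup` parameter are hypothetical and exist only so the logic can be exercised without a real DNS server.

```python
import socket


def resolve_with_unconditional_fallback(host, port, lookup=socket.getaddrinfo):
    """Try IPv4 first; on ANY IPv4 failure, try IPv6 instead.

    If the IPv6 lookup also fails, only the IPv6 error reaches the caller.
    """
    try:
        return lookup(host, port, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        # The IPv4 error (possibly a transient EAI_AGAIN) is dropped here,
        # so a host without an IPv6 address ends up reporting the IPv6
        # failure rather than the original, temporary one.
        return lookup(host, port, socket.AF_INET6, socket.SOCK_STREAM)
```

Under this model, a temporary IPv4 resolver failure on a host that has no IPv6 address would surface as the IPv6 lookup's error, which could explain seeing "No address associated with hostname" for a name that normally resolves fine.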

Comment From: ranshid

@geekthread as a workaround, can you check if adding the mapped ip address to the hosts file helps to mitigate the issue?

Comment From: geekthread

@ranshid - which configuration directive would be ideal for this? (bind?)

Comment From: ranshid

I am not sure we have any configuration to support it, but maybe this is an issue of an overloaded DNS server. I suggest you manually set the host --> IP mapping in the /etc/hosts file (in case this is a Linux machine) and retry, to see whether my assumption is correct and whether this helps mitigate the issue.
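One way to check the overloaded-DNS assumption, before and after adding the /etc/hosts entry, is to resolve the name many times while the load test is running and count the failures. The following is a rough Python sketch; `probe_resolution` and its parameters are hypothetical names, and port 6379 is taken from the benchmark command.

```python
import collections
import socket


def probe_resolution(host, attempts=100, lookup=socket.getaddrinfo):
    """Resolve `host` repeatedly and tally outcomes.

    Successful lookups are counted under "ok"; failures are counted under
    the getaddrinfo error code (for example socket.EAI_AGAIN).
    """
    tally = collections.Counter()
    for _ in range(attempts):
        try:
            lookup(host, 6379, socket.AF_INET, socket.SOCK_STREAM)
            tally["ok"] += 1
        except socket.gaierror as err:
            tally[err.args[0]] += 1
    return tally
```

If the tally shows occasional EAI_AGAIN entries during the test and none once the mapping is in /etc/hosts, that would support the DNS explanation.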

Comment From: geekthread

sure @ranshid , we will check this and get back to you :)

Comment From: geekthread

We increased maxclients from the default 10k to 20k, and the error no longer occurs. Not sure why. Any thoughts, @ranshid?

Comment From: geekthread

hello team , pls share your valuable inputs.

Comment From: geekthread

@oranagra / team - can you please take a look into this

Comment From: ranshid

@geekthread I can't think of why changing maxclients on the server side would mitigate this issue. I still think this is related to address resolution on the redis-benchmark side. I tried reproducing by following the general steps in your top comment but was unable to. If you can provide a way for me to reproduce it, or use GDB to find where redis-benchmark fails, that would be helpful.

PS. Did you check whether adding the host mapping in /etc/hosts helps?

Comment From: geekthread

Hello @ranshid ,

The setup for replicating this issue is as follows: system total memory: 64 GB; max memory: 52 GB, filled (31.2 GB non-expiring data, 20.8 GB expiring data with varying TTLs).

Test: nohup redis-benchmark -h -p 6379 -c 1000 -r 1700000 -n 1700000 -t get -d 18000 -P 500 -l

Issue :

99.56% <= 745 millisecondCould not connect to Redis at :6379: No address associated with hostname

Comment From: geekthread

Will share findings for the host mapping in some time.

Comment From: ranshid

Thank you @geekthread. Also, if you have the option to provide a tcpdump capture, we might be able to understand the issue by exploring the dump file.

Comment From: oranagra

Maybe strace can help here to reveal something we're missing, but I agree with Ran: this looks like a DNS resolution issue and is not in any way related to maxclients. Please double check that reducing maxclients reintroduces the problem (it doesn't make sense to me).

Comment From: ranshid

@oranagra there is still a potential issue with the way hiredis handles EAI_AGAIN. Currently it will fall back to IPv6, which can result in the case we see now. I suspect we can improve hiredis to handle EAI_AGAIN better and not exit. The main concern is that, AFAIK, it is not completely defined whether EAI_AGAIN is always a temporary condition, so retrying could cause the benchmark to spin.
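A rough Python sketch of what such handling could look like: retry the IPv4 lookup a bounded number of times on EAI_AGAIN, so it cannot spin forever, and fall back to IPv6 only on EAI_ADDRFAMILY. This is an illustration of the idea, not a proposed hiredis patch; the function name, the `retries` and `delay` defaults, and the injectable `lookup` parameter are all assumptions.

```python
import socket
import time


def resolve_ipv4_first(host, port, lookup=socket.getaddrinfo,
                       retries=3, delay=0.05):
    """Resolve over IPv4, retrying transient failures a bounded number of times.

    IPv6 is tried only when the IPv4 error says the address family itself
    is the problem. Any other failure is raised unchanged.
    """
    # EAI_ADDRFAMILY is not defined on every platform, hence getattr.
    addrfamily = getattr(socket, "EAI_ADDRFAMILY", None)
    for attempt in range(retries + 1):
        try:
            return lookup(host, port, socket.AF_INET, socket.SOCK_STREAM)
        except socket.gaierror as err:
            code = err.args[0]
            if code == socket.EAI_AGAIN and attempt < retries:
                time.sleep(delay)  # transient resolver failure: retry IPv4
                continue
            if addrfamily is not None and code == addrfamily:
                return lookup(host, port, socket.AF_INET6, socket.SOCK_STREAM)
            raise  # surface the original IPv4 error instead of masking it
```

The retry cap addresses the spinning concern: if EAI_AGAIN turns out to be persistent, the caller still gets an error after `retries` additional attempts, and that error is the EAI_AGAIN itself rather than an unrelated IPv6 failure.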

Comment From: oranagra

Let's ping @michael-grunder about hiredis, but we also need to confirm it's a DNS issue and either rule out maxclients or understand how it affects this.

Comment From: geekthread

@ranshid, I re-tested by providing the IP address instead of the hostname and reverting maxclients to 10k. Initial results look good so far. We will be testing a couple more times.

We are using the below config in sentinel.conf: sentinel resolve-hostnames yes

Could this be a potential root cause ?