Redis Implement a new Redis policy to limit the valid duration of auth sessions

[Edit]: instead consider implementing CLIENT KILL AGE, see: https://github.com/redis/redis/issues/12096#issuecomment-1529387591

Proposed Feature:

Introduce a new mutable configuration called max-auth-age, specified in seconds.
Set the default value to 0, meaning OFF.
Record the last successful AUTH time for each client.
Periodically check the auth age for each client; if it exceeds max-auth-age, disconnect the client.

Feedback:

          @PingXie it looks like your PR is implementing a completely different feature.

this one was about dropping useless connections (consume resources) if they failed to authenticate in time. yours is to require authenticated clients to re-authenticate periodically or else their auth will be revoked. i don't think we would want such a feature...

regarding point 4: i think the concept is "serverAuthRequired" (not "userAuthRequired"). i.e. if the default user has no password, then anyone can connect to the server (no AUTH is required by the server), as soon as the default user has a password or is disabled, then some form or AUTH / HELLO is required. that's what we aim to achieve there.

Originally posted by @oranagra in https://github.com/redis/redis/issues/11146#issuecomment-1518977893

Comment From: PingXie

@oranagra @madolson @zuiderkwast @yossigo

Continuing the discussion:

Rationale for introducing a feature to limit the valid duration of authentication sessions

At a high level, authentication is about secrets, and secrets need to be periodically rotated. With the implementation of #11659, we can now tie authentication to something much more secure than plaintext passwords stored in redis.conf. In my view, rotation is a natural progression.

ServerAuth vs. UserAuth

ServerAuth made sense before the introduction of the ACL feature. ACL however allows for much more granular access control. Customers can retain the default user but limit its access to specific keys, channels, or commands, which can be much narrower than what is allowed by default. Simultaneously, customers can introduce users with more privileges. I believe my proposed changes are backward compatible with the current behavior while also supporting a more generic RBAC model.

WDYT?

Comment From: oranagra

regarding password rotation, i think what we would wanna do is disconnect the client when the password it was using is no longer valid, i.e. #11986. i don't see a point to randomly disconnect or de-authenticate the client after some time when the password it's using is still valid, and require a re-connection or re-authenticate with the same password.

regarding the authRequired() function, the purpose it serves is to know if the server should interact with completely unauthenticated connections, or does it require some sort of AUTH / HELLO first. Your PR attempts to do something different with it, and that's why you think it has a bug.

Comment From: yossigo

At a high level, authentication is about secrets, and secrets need to be periodically rotated.

Secrets are rotated to protect their use, but once a session is established, the secret is no longer used, so there's no value in rotating it. If we're concerned about session hijacking, TLS is the solution. If we're concerned about stale sessions being abused, there's timeout.

My argument is this adds no new security value and will negatively impact clients with connection pools by introducing unexpected reconnects and, as a result, higher latency.

Comment From: PingXie

Sorry for the late response.

regarding password rotation, i think what we would wanna do is disconnect the client when the password it was using is no longer valid, i.e. https://github.com/redis/redis/issues/11986. i don't see a point to randomly disconnect or de-authenticate the client after some time when the password it's using is still valid, and require a re-connection or re-authenticate with the same password.

I agree that de-auth/re-auth is not compatible with the current client expectation (and I have updated my PR to always drop the connection). Also to clarify, the scenario that I have in mind always involves password rotation. Basically, there is an org policy that states, after certain time, the password used to authenticate a Redis client needs to be changed and the existing connections need to be dropped (yes, to reduce the security risk in case they get compromised). Note that in our case the auth is done outside the core engine using a module(similar to #11659). Hence it is different than what #11986 is trying to address.

If we're concerned about stale sessions being abused, there's timeout.

I think timeout is close except that it track a different state, the last interaction, which means that as long as the connection is being used, it will not get dropped at all. A compromised connection can stay alive indefinitely this way. This is the reason why I proposed to track the life time of a connection, irrespective of the activity on it.

My argument is this adds no new security value and will negatively impact clients with connection pools by introducing unexpected reconnects and, as a result, higher latency.

I view this as a policy decision that each Redis user should be free to make. The default setting could be no auth-timeout at all but if the user has a security policy demanding the drop of existing connections after rotating the secret (which occurs periodically), I think Redis should allow/support it too.

Comment From: oranagra

@PingXie since my previous comment here, the aim in #11986 was changed not to drop clients when their password changes, and instead leave that to be done with CLIENT KILL. You did say in your last comment that the org policy is to rotate the password and drop clients, so isn't that enough.

Since you say you changed your PR to drop the connection instead of de-auth, then the client age (shown in CLIENT LIST) is probably sufficient, maybe all we need is an extra AGE option to CLIENT KILL?

Comment From: madolson

Since you say you changed your PR to drop the connection instead of de-auth, then the client age (shown in CLIENT LIST) is probably sufficient, maybe all we need is an extra AGE option to CLIENT KILL?

We've heard this request, having an "age" option to the client kill command. I agree that this is the right solution to the problem outlined. I'm also aware that there are some enterprise users who like weird things, but I don't think it's our responsibility to cover every edge case. A module could also implement this logic, so maybe if someone wants an enterprise security module they could built that there.

Comment From: PingXie

but I don't think it's our responsibility to cover every edge case.

I don't think this is an edge case for any non-trivial use of authentication. Implementing this in a module is one option, but I see this as a common pattern that could be elevated out of the module and supported in a more consistent/neutral way. Imagine the difference between CLIENT AGE KILL and module_foo.kill_a_client_that_is_out_of_policy age .... This is not to say that my first choice is CLIENT AGE KILL but between the two I prefer a "platform" solution that provides a high level consistency for a common problem.

maybe all we need is an extra AGE option to CLIENT KILL?

I'm looking at this from an end-user's perspective. I imagine that I already have a busy life writing complex business logic in my applications under tight deadlines. It would be really helpful if I could offload security compliance-related requirements to the service I depend on. It is a lot easier for me to express that my connections can't stay alive for 12 hours (a checkbox for me if I can set some Redis config). But it will be a lot more work if I have to track the clients in my applications and remember calling CLIENT KILL or CLIENT AGE KILL. Even worse, dealing with the possibility of these calls failing and my applications not being compliant as a result. Thoughts?

Comment From: oranagra

we discussed this again in a core-team meeting. we agreed that we don't think that feature belongs in redis, and that instead we should have a CLIENT KILL AGE (possibly being able to do intersection with other rules)

Comment From: slavak

Implemented CLIENT KILL AGE in PR #12299.