Spring Security Add option to prefetch jwks before first request, and refresh it in background

Expected Behavior

JWKS fetching should not affect response times on the resource server. The JWKS should be fetched immediately on startup and refreshed in the background without affecting requests (the exception being when the cached JWKS has no matching key).
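
A minimal sketch of the behavior described above, in plain Java and not tied to any Spring Security or Nimbus API. The class name PrefetchingJwkSetCache, the Supplier-based fetcher, and the choice to keep serving the previous JWK set when a refresh fails are all assumptions made for illustration.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of Spring Security or Nimbus.
final class PrefetchingJwkSetCache implements AutoCloseable {

    private final Supplier<String> fetcher;   // stand-in for an HTTP GET on the jwks-uri
    private final ScheduledExecutorService scheduler;
    private volatile String jwkSetJson;

    PrefetchingJwkSetCache(Supplier<String> fetcher, Duration refreshInterval) {
        this.fetcher = fetcher;
        // Prefetch: fails fast at construction if the authorization server is unreachable.
        this.jwkSetJson = fetcher.get();
        this.scheduler = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "jwks-refresh");
            thread.setDaemon(true);
            return thread;
        });
        long millis = refreshInterval.toMillis();   // must be greater than zero
        this.scheduler.scheduleWithFixedDelay(this::refreshQuietly, millis, millis, TimeUnit.MILLISECONDS);
    }

    /** Never blocks: returns the most recently fetched JWK set. */
    String get() {
        return this.jwkSetJson;
    }

    /** Forced refresh, for the case where no cached key matches the token. */
    synchronized String refreshNow() {
        this.jwkSetJson = this.fetcher.get();
        return this.jwkSetJson;
    }

    private void refreshQuietly() {
        try {
            refreshNow();
        } catch (RuntimeException ex) {
            // Keep serving the previous JWK set; the next scheduled run tries again.
            // Swallowing here also keeps the scheduler from cancelling the task.
        }
    }

    @Override
    public void close() {
        this.scheduler.shutdownNow();
    }
}
```

Request handling would call get() and fall back to refreshNow() only when no cached key matches.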

Current Behavior

The JWKS (for jwks-uri) is only fetched once it is needed (Nimbus[Reactive]JwtDecoder). This delays the first request(s). Also, when the cached JWKS expires, all requests are blocked until a new one is fetched.

Context

This is similar to (but different from) #9560. While I agree that for WebClient this may be considered application-specific and left out of the framework (though I would definitely welcome it as an option), I think the case for the server side is much stronger, as server performance and delays are more impactful, and this is causing intermittent delays for all servers configured with jwks-uri.

Envoy proxy has the same issue, and they reviewed it favorably as a valid requirement for production environments: https://github.com/envoyproxy/envoy/issues/14556, https://github.com/envoyproxy/envoy/issues/14557

I think it could be a reasonable default to have it enabled (with an opt-out API), but an opt-in API would also be OK.

There is also the possibility of tying a health check to the JWKS status, but that is maybe a separate topic.

Comment From: piotrplazienski

/cc @jzheaux as requested.

Comment From: jzheaux

Thanks for the suggestion, @piotrplazienski.

NimbusJwtDecoder allows for a Cache to be specified, have you already tried using it? The nice thing about using a Cache is it gives you access to all of the Spring Cache building blocks.

Then, you can do:

JwtDecoder decoder = NimbusJwtDecoder.withJwkSetUri(uri).cache(cache).build();

Or, if you are more comfortable with the native Nimbus API, you can do:

JwtDecoder decoder = NimbusJwtDecoder.withJwkSetUri(uri)
    .jwtProcessorCustomizer((processor) -> {
        // jwsAlgs and jwkSource are supplied by the application
        JWSVerificationKeySelector<SecurityContext> selector =
            new JWSVerificationKeySelector<>(jwsAlgs, jwkSource);
        // The customizer is a Consumer, so nothing is returned
        processor.setJWSKeySelector(selector);
    })
    .build();

where jwkSource is any custom JWKSource that you have created.

While the reactive side does not support Spring Cache, you are able to use NimbusReactiveJwtDecoder.withJwkSource, as I believe you stated in #9560.

Changing the decoders to do a prefetch would also change their start-up expectations, which wouldn't be ideal. Doing a prefetch would now require the authorization server to be up for decoder construction to complete.

That said, JwtDecoders and ReactiveJwtDecoders already require the authorization server to be up. So, doing a pre-fetch in JwtDecoders and ReactiveJwtDecoders may be reasonable. In fact, they are already doing that somewhat when they retrieve the JWK set for the signature algorithms. With a slight adjustment, I imagine that they could both share their cache with the decoder they are constructing.

Would you be interested in submitting a PR to update JwtDecoders and ReactiveJwtDecoders to do a pre-fetch?

As for doing things in the background, my concern would be providing the appropriate configuration hooks to the resulting API without being duplicative with other parts of the Spring Framework. Additionally, it would add a level of complexity to Spring Security that doesn't already exist and that other parts of the Spring Framework are likely a better fit for. For these reasons, I'd recommend leaving it out of Spring Security.

Comment From: piotrplazienski

Thank you for the answers.

> NimbusJwtDecoder allows for a Cache to be specified, have you already tried using it? The nice thing about using a Cache is it gives you access to all of the Spring Cache building blocks.

So you are suggesting that I could prepare a custom cache that would prefetch and refresh in the background, and inject it into the decoder? That is an interesting idea.

> Changing the decoders to do a prefetch would also change their start-up expectations, which wouldn't be ideal. Doing a prefetch would now require the authorization server to be up for decoder construction to complete.

If this is a concern, it can be done another way: construct immediately, then trigger a background refresh. If the refresh finishes before the first request, all good. Otherwise, block until it is done. Maybe also expose readiness. A bit complex, maybe. That being said, if a JWT is used, the first request will be blocked anyway, so I agree that blocking construction until the JWKS is fetched is reasonable. How should errors be handled then, for example when the auth server is not available? Block indefinitely and retry? Retry for some time, then fail?
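
One way the "construct immediately, then load in the background" option could look, as a sketch in plain Java. It picks "retry for some time, then fail" as the error policy purely for illustration; EagerJwkSetLoader, maxAttempts and backoff are hypothetical names, not an existing API.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of Spring Security or Nimbus.
final class EagerJwkSetLoader {

    private final CompletableFuture<String> initialLoad;

    EagerJwkSetLoader(Supplier<String> fetcher, int maxAttempts, Duration backoff) {
        // Returns immediately; the fetch (with retries) runs on the common pool.
        this.initialLoad = CompletableFuture.supplyAsync(
                () -> fetchWithRetry(fetcher, maxAttempts, backoff));
    }

    private static String fetchWithRetry(Supplier<String> fetcher, int maxAttempts, Duration backoff) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetcher.get();
            } catch (RuntimeException ex) {
                last = ex;
                if (attempt < maxAttempts) {
                    pause(backoff);
                }
            }
        }
        throw new IllegalStateException("JWK set unavailable after " + maxAttempts + " attempts", last);
    }

    private static void pause(Duration backoff) {
        try {
            Thread.sleep(backoff.toMillis());
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while backing off", ex);
        }
    }

    /** Could back a readiness probe: true only once the initial load succeeded. */
    boolean isReady() {
        return this.initialLoad.isDone() && !this.initialLoad.isCompletedExceptionally();
    }

    /** Blocks only while the initial load is still running, up to the given timeout. */
    String await(Duration timeout) {
        try {
            return this.initialLoad.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the JWK set", ex);
        } catch (ExecutionException | TimeoutException ex) {
            throw new IllegalStateException("JWK set not available", ex);
        }
    }
}
```

If the load finishes before the first request, await returns without blocking; otherwise the first request waits for it, which matches the flow described above.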

> Would you be interested in submitting a PR to update JwtDecoders and ReactiveJwtDecoders to do a pre-fetch?

Absolutely, but there are two things: I need to get through my company's approval process for OSS contributions, which may take some time, and I would need some guidance, especially in terms of API. Should it be the default behavior? An opt-in API? Can you point me to where the prefetch is done for "retrieve the JWK set for the signature algorithms"?

As for background refresh: if possible, I would be happy to do it elsewhere. Any pointers? I believe this is very important for performant servers; otherwise an unnecessary delay in the response is introduced every time the cache expires. I have not analysed this for the non-reactive case, but for reactive it would be reasonably easy: trigger ReactiveRemoteJWKSource::getJWKSet() periodically, maybe using Flux.interval. Or maybe it would be best to make a request to Reactor to have Mono.cache() with soft expiry, where the cached value would be returned immediately and the original publisher would be subscribed to again to refresh in the background?
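
The "soft expiry" idea can be sketched without Reactor, in plain Java: once the value is older than a soft TTL, the caller still gets the cached value immediately and a single reload is handed to an executor. SoftExpiryCache and its constructor parameters are hypothetical and only illustrate the idea; this is not an existing Reactor or Spring operator.

```java
import java.time.Duration;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not an existing Reactor or Spring API.
final class SoftExpiryCache<T> {

    private final Supplier<T> loader;
    private final long softTtlNanos;
    private final LongSupplier nanoClock;      // e.g. System::nanoTime
    private final Executor refreshExecutor;    // runs reloads off the request path
    private final AtomicBoolean refreshing = new AtomicBoolean();
    private volatile T value;
    private volatile long loadedAtNanos;

    SoftExpiryCache(Supplier<T> loader, Duration softTtl, LongSupplier nanoClock, Executor refreshExecutor) {
        this.loader = loader;
        this.softTtlNanos = softTtl.toNanos();
        this.nanoClock = nanoClock;
        this.refreshExecutor = refreshExecutor;
        this.value = loader.get();             // the initial load is synchronous
        this.loadedAtNanos = nanoClock.getAsLong();
    }

    /** Always returns immediately; a stale value triggers at most one background reload. */
    T get() {
        boolean stale = this.nanoClock.getAsLong() - this.loadedAtNanos >= this.softTtlNanos;
        if (stale && this.refreshing.compareAndSet(false, true)) {
            try {
                this.refreshExecutor.execute(this::reload);
            } catch (RejectedExecutionException ex) {
                this.refreshing.set(false);    // allow a later call to try again
            }
        }
        return this.value;
    }

    private void reload() {
        try {
            this.value = this.loader.get();
            this.loadedAtNanos = this.nanoClock.getAsLong();
        } catch (RuntimeException ex) {
            // Keep the stale value; the next get() schedules another reload.
        } finally {
            this.refreshing.set(false);
        }
    }
}
```

A hard expiry (after which callers do block) could be layered on top; it is left out to keep the sketch short.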

The question would be whether an API is needed to stop it on demand. But I believe that if this were an opt-in API like:

// Proposed API: refreshPeriodically does not exist today
ReactiveJwtDecoder decoder = NimbusReactiveJwtDecoder
    .withJwkSetUri(uri)
    .refreshPeriodically(Duration.ofMinutes(5))
    .build();

It might be OK not to expose an API to stop it. For NimbusJwtDecoder it should be possible to do a similar thing, probably using a background thread. Do you think that might be OK?

Comment From: ttddyy

Hi @piotrplazienski

I faced a similar issue and have implemented an async JWKS retriever (https://github.com/spring-projects/spring-security/issues/9728). You can check the implementation here.

With my implementation, and if you are using Spring Boot, you can do the JWKS prefetch at application start with the following bean.

@Bean
public ApplicationRunner initializeAsyncResourceRetriever(AsyncResourceRetriever retriever) {
  return args -> {
    String jwksEndpoint = ...
    retriever.updateJwkSet(new URL(jwksEndpoint), false);  // false indicates synchronous call
  };
}

Since the bean is an ApplicationRunner, it is guaranteed to run before the readiness probe becomes ready.

For periodic background refresh, you can create a scheduled task (@Scheduled, scheduler, etc.), inject the retriever, and simply call the retriever#updateJwkSet method.

The actual JWKS retrieval is done by implementing the ReactorJwkSetRetriever interface, which requires a Mono<String> retrieve(URL url) method.

This can be as simple as the following with WebClient:

public class MyJwksRetriever implements ReactorJwkSetRetriever {

    private final WebClient webClient;

    public MyJwksRetriever(WebClient webClient) {
        this.webClient = webClient;
    }

    @Override
    public Mono<String> retrieve(URL url) {
        URI uri;
        try {
            uri = url.toURI();
        }
        catch (URISyntaxException ex) {
            throw new RuntimeException(ex);
        }
        return this.webClient.get().uri(uri).retrieve().bodyToMono(String.class);
    }

}

Since it uses Reactor, you can even add more sophisticated operators such as cache, retry, etc.

Comment From: piotrplazienski

Hi @ttddyy, thank you for sharing. This is more or less how I imagined the implementation, and we will implement something like this if we are unable to get the change into Spring Security. What is the license of this code, in case we want to reuse it? Are you OK with us reusing it?

But I think the most important input is that this is a generic and valid issue, not just my problem :). So maybe there is a chance to get this into spring-security as a feature.

Comment From: ttddyy

Hi @piotrplazienski,

Since I'd like to donate the code, I have updated the gist to include a copyright header with the Apache 2 license. I also added a sample configuration class that sets this custom retriever on the JwtDecoder bean.

Ideally, we want to have this functionality in spring-security, so that anybody who faces this issue would benefit.

Comment From: skjolber

Hi guys, I filed those Envoy issues, and my (blocking) example implementation was actually for Java with a Spring Boot starter, so it is interesting to see this initiative.

So the biggest complaints I have had from developers (users) concerning this functionality were:

* getting JWK keys fails once in 14 days and logs exceptions, which must be investigated and consumes resources! This was solved with
  * retry-once on transient IO exceptions, with low log level
* server is not ready when load testing begins. This was solved with
  * health provider which refreshes JWKs when empty or in failed state
  * tied to readiness health group
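
The two fixes in the list above can be sketched together in plain Java. This is not the implementation referred to in this comment; RetryOnceJwkSetSource and its nested Fetcher interface are hypothetical, and treating every IOException as transient is a simplifying assumption.

```java
import java.io.IOException;

// Hypothetical helper for illustration; not the implementation mentioned above.
final class RetryOnceJwkSetSource {

    /** Stand-in for the HTTP call to the JWKS endpoint. */
    interface Fetcher {
        String fetch() throws IOException;
    }

    private final Fetcher fetcher;
    private volatile String jwkSetJson;   // null until the first successful fetch
    private volatile boolean failed;      // true after a refresh failed twice in a row

    RetryOnceJwkSetSource(Fetcher fetcher) {
        this.fetcher = fetcher;
    }

    /** Fetches the JWK set, retrying exactly once on an IOException. */
    String refresh() throws IOException {
        try {
            String fresh;
            try {
                fresh = this.fetcher.fetch();
            } catch (IOException first) {
                // A real implementation would log this at a low level (e.g. DEBUG).
                fresh = this.fetcher.fetch();
            }
            this.jwkSetJson = fresh;
            this.failed = false;
            return fresh;
        } catch (IOException second) {
            this.failed = true;
            throw second;
        }
    }

    /** Readiness check: refreshes when the cache is empty or in a failed state. */
    boolean isReady() {
        if (this.jwkSetJson != null && !this.failed) {
            return true;
        }
        try {
            refresh();
            return true;
        } catch (IOException ex) {
            return false;
        }
    }
}
```

A health indicator in the readiness group could delegate to isReady(), so the instance only takes traffic once a JWK set has been loaded.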

I think the "before first request" goes hand in hand with the health check, and a simple retry should be trivial to implement, so I wonder if it would be possible to include those two features as well. :innocent:

Comment From: hypr2771

Hi, adding my two cents here (forgive my poor English).

I noticed that if the authorization server is stressed at the same time as the application, the application stresses it even more due to the lack of this feature.

Indeed, in an auto-scaling scenario triggered by heavy load, the application boots and immediately takes traffic. Yet this fresh application needs to fetch the JWKS because the cache is not yet populated. Therefore, every request tries to call the underlying authorization server to obtain it. This stress makes the authorization server even slower, creating a vicious circle with a bottleneck on the authorization server, exacerbated by the lack of eager synchronous JWKS loading.
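
Besides eager loading, one generic mitigation for this kind of stampede is to let concurrent callers share a single in-flight fetch. The sketch below is plain Java and assumes the fetch can be started asynchronously; SingleFlightJwkSetLoader is a hypothetical name, expiry is deliberately left out, and nothing here describes how the existing decoders behave.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of Spring Security or Nimbus.
final class SingleFlightJwkSetLoader {

    // Starts an asynchronous fetch; it must return quickly because it is called under a lock.
    private final Supplier<CompletableFuture<String>> fetcher;
    private CompletableFuture<String> current;   // guarded by "this"

    SingleFlightJwkSetLoader(Supplier<CompletableFuture<String>> fetcher) {
        this.fetcher = fetcher;
    }

    /**
     * Returns the shared future. A new fetch is started only when there is none yet
     * or when the previous one failed, so N concurrent requests cause one remote call.
     */
    synchronized CompletableFuture<String> load() {
        if (this.current == null || this.current.isCompletedExceptionally()) {
            this.current = this.fetcher.get();
        }
        return this.current;
    }
}
```

With this shape, a burst of requests hitting a freshly started instance results in one call to the authorization server instead of one per request.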

This feature is a must-have in such scenarios.

Comment From: skjolber

FYI, a lot of this functionality has been added to nimbus-jose-jwt; the release is pending a few adjustments.

Comment From: chicobento

Hi, this is just a heads-up that nimbus-jose-jwt 9.28 has just been released with the improvements made by @skjolber (see changelog). Looking forward to seeing the improvements reach oauth2-resource-server, such as the one proposed by this issue and also support for health endpoints.

Comment From: skjolber

The added features are documented here. @jzheaux and friends:

* Which JWK Set features are actually desirable / still missing (regardless of implementation)?
* How well do these match up against the new nimbus-jose-jwt improvements?
* Is having a corresponding reactive implementation (with feature parity) a must?