Production incidents because of invalid certificates are common issues in the industry. SslInfoContributor and SslHealthIndicator in this PR can help to mitigate them, they:
- Provide extra information about the used certificates (issuer, subject, validity, etc.)
- Indicate if a certificate is invalid and the reason of it
- Indicate if a certificate will be invalid within a configurable threshold (e.g.: in a week)
Example /info and /health outputs:
/info of a VALID cert (click here to expand)
```json
{
"ssl": {
"bundles": [
{
"name": "ssldemo",
"certificateChains": [
{
"alias": "spring-boot",
"certificates": [
{
"version": "V3",
"signatureAlgorithmName": "SHA256withRSA",
"validityStarts": "2024-06-21T21:17:02Z",
"validityEnds": "2024-07-05T21:17:02Z",
"validity": {
"status": "VALID"
},
"serialNumber": "9fbcacfed83af328",
"issuer": "CN=localhost,OU=Spring,O=VMware,L=Palo Alto,ST=California,C=US",
"subject": "CN=localhost,OU=Spring,O=VMware,L=Palo Alto,ST=California,C=US"
}
]
}
]
}
]
}
}
```
/health of a VALID cert (click here to expand)
```json
{
"status": "UP",
"components": {
"ping": {
"status": "UP"
},
"ssl": {
"status": "UP"
}
}
}
```
/info of an EXPIRED cert (click here to expand)
```json
{
"ssl": {
"bundles": [
{
"name": "ssldemo",
"certificateChains": [
{
"alias": "spring-boot-ssl-sample",
"certificates": [
{
"version": "V3",
"validity": {
"status": "EXPIRED",
"message": "Not valid after 2014-10-21T19:48:43Z"
},
"validityStarts": "2014-07-23T19:48:43Z",
"validityEnds": "2014-10-21T19:48:43Z",
"signatureAlgorithmName": "SHA256withRSA",
"serialNumber": "7207ee6e",
"issuer": "CN=localhost,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown",
"subject": "CN=localhost,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown"
}
]
}
]
}
]
}
}
```
/health of an EXPIRED cert (click here to expand)
```json
{
"status": "OUT_OF_SERVICE",
"components": {
"ping": {
"status": "UP"
},
"ssl": {
"status": "OUT_OF_SERVICE",
"details": {
"certificateChains": [
{
"alias": "spring-boot-ssl-sample",
"certificates": [
{
"version": "V3",
"validityEnds": "2014-10-21T19:48:43Z",
"validityStarts": "2014-07-23T19:48:43Z",
"signatureAlgorithmName": "SHA256withRSA",
"validity": {
"status": "EXPIRED",
"message": "Not valid after 2014-10-21T19:48:43Z"
},
"serialNumber": "7207ee6e",
"issuer": "CN=localhost,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown",
"subject": "CN=localhost,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown"
}
]
}
]
}
}
}
}
```
/info of a cert that WILL_EXPIRE_SOON (click here to expand)
```json
{
"ssl": {
"bundles": [
{
"name": "ssldemo",
"certificateChains": [
{
"alias": "spring-boot",
"certificates": [
{
"version": "V3",
"validityEnds": "2024-06-22T21:32:22Z",
"validityStarts": "2024-06-21T21:32:22Z",
"signatureAlgorithmName": "SHA256withRSA",
"validity": {
"status": "WILL_EXPIRE_SOON",
"message": "Certificate will expire within threshold (PT168H) at 2024-06-22T21:32:22Z"
},
"subject": "CN=localhost,OU=Spring,O=VMware,L=Palo Alto,ST=California,C=US",
"serialNumber": "64d019d1dd94eee0",
"issuer": "CN=localhost,OU=Spring,O=VMware,L=Palo Alto,ST=California,C=US"
}
]
}
]
}
]
}
}
```
/health of a cert that WILL_EXPIRE_SOON (click here to expand)
```json
{
"status": "UP",
"components": {
"ping": {
"status": "UP"
},
"ssl": {
"status": "WARNING",
"details": {
"certificateChains": [
{
"alias": "spring-boot-ssl-sample",
"certificates": [
{
"version": "V3",
"validityEnds": "2024-06-22T21:32:22Z",
"validityStarts": "2024-06-21T21:32:22Z",
"signatureAlgorithmName": "SHA256withRSA",
"validity": {
"status": "WILL_EXPIRE_SOON",
"message": "Certificate will expire within threshold (PT168H) at 2024-06-22T21:32:22Z"
},
"subject": "CN=localhost,OU=Spring,O=VMware,L=Palo Alto,ST=California,C=US",
"serialNumber": "64d019d1dd94eee0",
"issuer": "CN=localhost,OU=Spring,O=VMware,L=Palo Alto,ST=California,C=US"
}
]
}
]
}
}
}
}
```
If you want to play with it, start spring-boot-smoke-test-tomcat-ssl, the cert in resources/sample.jks is already EXPIRED, you can generate a VALID one via
keytool -genkeypair -storepass secret -keypass password -keystore sample.jks -storetype JKS -dname "CN=localhost, OU=Spring, O=VMware, L=Palo Alto, ST=California, C=US" -validity 14 -alias spring-boot -keyalg RSA -ext "SAN=DNS:localhost,IP:::1,IP:127.0.0.1"
or one that WILL_EXPIRE_SOON via:
keytool -genkeypair -storepass secret -keypass password -keystore sample.jks -storetype JKS -dname "CN=localhost, OU=Spring, O=VMware, L=Palo Alto, ST=California, C=US" -validity 1 -alias spring-boot -keyalg RSA -ext "SAN=DNS:localhost,IP:::1,IP:127.0.0.1"
Comment From: mhalbritter
Hey @jonatan-ivanov, that's quite a cool feature, thank you! The implementation looks good, too.
This works for all SSL bundles, and not only for the one used by the webserver, right?
Comment From: jonatan-ivanov
Great to hear! The main focus is the webserver but I think this should work for all SSL bundles (not tested yet) since using an expired cert can cause issues in every place they are used.
I can go ahead and work on the TODO items/docs/tests, in the meantime, can I get some feedback on two important items?
1. Can/should we introduce a new (common) health status: WARNING (returns 200 but indicates that something is not right)? Or should we keep the current custom status in the PR (WILL_EXPIRE_SOON_STATUS)?
2. Is calling WebServerSslBundle.get multiple times ok? See in the PR, in AbstractConfigurableWebServerFactory, and in NettyRSocketServerFactory or should there be just one instance (i.e.: a @Bean)?
Comment From: scottfrederick
Is calling
WebServerSslBundle.getmultiple times ok?
I think this is fine. However, I'm not sure we need to use WebServerSslBundle here.
That class adapts the discrete server.ssl.* properties to SslBundle design. If we ever decide to deprecate and remove the discrete server.ssl.* properties and only support bundles, then that class would go away.
We could just focus the InfoContributor on SSL bundles and document that you would need to convert from server.ssl.* properties to bundle properties to get the benefit of the InfoContributor.
Comment From: jonatan-ivanov
That class adapts the discrete
server.ssl.*properties toSslBundledesign. If we ever decide to deprecate and remove the discreteserver.ssl.*properties and only support bundles, then that class would go away.
ππΌ That's the exact same use-case I'm using WebServerSslBundle in this PR for: to be backwards compatible with the server.ssl.* properties. If only bundles will be supported and WebServerSslBundle will be removed in the future, the small block of code in SslInfo where WebServerSslBundle is used should also be removed since there won't be any properties to be backward compatible with.
We could just focus the InfoContributor on SSL bundles and document that you would need to convert from server.ssl.* properties to bundle properties to get the benefit of the InfoContributor.
That would simplify the changes in the PR a bit but also complicate the life of the users: simply enabling the feature might not do anything for them if they are not using bundles or it would work just half-way for them if they use bundles just not for the webserver. If it is not a big issue to use WebServerSslBundle, I would prefer supporting the server.ssl.* properties unless you have plans to deprecate them (and I guess eventually remove them).
Comment From: scottfrederick
Along with server.ssl.* (and management.server.ssl.*), we have spring.rsocket.server.ssl.*, spring.rabbitmq.ssl.*, and spring.kafka.*.ssl.* properties where users can currently choose to use discrete keystore and truststore properties or use a bundle. The web servers are the only place where we have an adapter like WebServerSslBundle that can be used to easily implement support for the discrete properties in the InfoContributor. We could choose to treat the discrete web server SSL properties specially as you've done by using the adapter (they are almost certainly the most commonly used SSL properties), or only support bundles so everything is treated equally and document that only bundles are supported. Let's see what others think about this.
In either case, I don't think it's necessary to make WebServerSslBundle a bean.
Comment From: jonatan-ivanov
Treating the discrete web server SSL properties specially somewhat makes sense to me since that's what WebServerSslBundle is also doing but having more "non-bundle" properties (I forgot about rabbitmq, kafka, and rsocket) drives me to the other direction: even if it would be nice to have backward-compatible support for the web server properties, it might make more sense to only support bundles and treat everything equally. :)
Comment From: jonatan-ivanov
I removed support for server.ssl.* properties (WebServerSslBundle) and added configprops support to set the warning threshold. Please let me know if this seems approximately ok and I can go ahed to write tests and docs.
Comment From: mhalbritter
We need to discuss if we want to have this warning status and if we do, to which http code to map it.
Comment From: jonatan-ivanov
My two cents about having a dedicated WARNING status:
- I imagine this status as something that means "Right now it's fine and not immediately actionable but if you don't do something, it might turn into an error sometime in the future".
- Within this context, I think the appropriate HTTP status code is 200 since at the time of the response everything is still successfully working and load balancers should not remove the instance out of rotation, letting traffic flowing is totally fine.
- Now if you ask me about the reason phrase... π I think there is some wiggle room there since in the HTTP status line, the reason phrase is optional and OK is "only" recommended (not mandated) for 200. So theoretically, 200 Warning can also be returned but 200 OK would describe a situation just fine: things are still OK. (But if you really want to make Spring Boot cool, my personal recommendation is 200 π§¨π§π€π€¨ππ¬π±. π)
Other than the feature introduced in this PR (the certificate will expire soon), I think there could be a few more use-cases where a WARNING status can be useful. For example for the disk space health indicator, when the disk is close to be full (within threshold), it's status can be WARNING. Another example can be updating local cache from an external resource periodically. If the update job fails the local cache might be stale. This could be fine for a while but can cause errors in the future.
Comment From: wilkinsona
We discussed this today and a couple of things came up:
- We're considering merging this without the custom status and then possibly adding that more broadly in a later milestone.
- It should be possible to see details of the certificates even when none of them have expired.
Comment From: jonatan-ivanov
What do you mean by the custom status? Do mean what is in the PR right now (WILL_EXPIRE_SOON) or the proposed WARNING status? If he former, what should we use instead (UP)?
You can see all of the certs on the info endpoint, do you also want the health endpoint to show all of them under a ~valid/not-valid key?
Comment From: wilkinsona
Sorry, I quickly wrote that comment in the meeting and it was obviously too terse.
What do you mean by the custom status? Do mean what is in the PR right now (
WILL_EXPIRE_SOON) or the proposedWARNINGstatus? If the former, what should we use instead (UP)?
We're not keen on an indicator-specific status such as WILL_EXPIRE_SOON but quite like that idea of a more general purpose WARNING status. We'd like to give ourselves a bit of time to confirm that it will indeed be general purpose.
In the meantime, I think UP would be the status to use. Moritz is going to take a look at this.
You can see all of the certs on the info endpoint, do you also want the health endpoint to show all of them under a ~valid/not-valid key?
We'd like it to show something as additional details (so they'll only appear when details are visible) but we're not yet 100% sure exactly what it should be. Moritz is going to take a look at this too.
Comment From: jonatan-ivanov
[..] but quite like that idea of a more general purpose
WARNINGstatus. We'd like to give ourselves a bit of time to confirm that it will indeed be general purpose.
Great to hear! Btw since this is targeted to a milestone release would it make sense to use new Status("WARNING") temporarily only in the health indicator without adding it to the Status class and see if we get any user feedback? Based on it (if any) we can introduce a general WARNING status in the Status class if it makes sense or change the health indicator to report UP before the RC. /cc @mhalbritter
@mhalbritter fyi: I modified the health indicator a bit so that now it returns the whole chain instead of the separate certs since if a cert is invalid in a chain, the whole chain is invalid. I think this also helps troubleshooting if there is an invalid cert (I updated the examples in the description).
Comment From: mhalbritter
I'd remove the warning bit completely for M2. Then we can take our time and come up with a WARNING status design, and if that's ready, we'll add the warning back to the ssl indicator (and possible to more health indicators). Does that sound good?
Comment From: jonatan-ivanov
Makes sense, I changed it to UP also rebased it on main and squashed the changes.
Comment From: mhalbritter
Thanks @jonatan-ivanov !
The health endpoint now always adds details about invalid and valid chains.