Problem

I'm using Spring Boot 2.2.9. The change introduced by https://github.com/spring-projects/spring-boot/pull/19999 considers any keys contains "uri", "uris", "address" or "addresses" are "comma separated URLs". This is not always the right assumption. It will try to remove password from those URLs, however if it's not a URL format, it will return the original content.

Expected Behavior

If that key is not a URL format, it should return as ******. Or at least allow developers to configure whether they want sanitize URLs or not.

Reproducer

Sanitizer sanitizer = new Sanitizer();
System.out.println(sanitizer.sanitize("uris", "[amqp://foo:bar@host/]"));
System.out.println(sanitizer.sanitize("uris", "amqp://foo:bar@host/"));

The output is:

[amqp://foo:bar@host/]
amqp://foo:******@host/

Comment From: bclozel

Hi, this is a first-timers-only issue. This means we've worked to make it more legible to folks who either haven't contributed to our codebase before, or even folks who haven't contributed to open source before.

If that's you, we're interested in helping you take the first step and can answer questions and help you out as you do. Note that we're especially interested in contributions from people from groups underrepresented in free and open source software!

If you have contributed before, consider leaving this one for someone new, and looking through our general ideal-for-contribution issues. Thanks!

Problem

The Actuator org.springframework.boot.actuate.endpoint.Sanitizer takes care of sanitizing sensitive information when exposing information on the env and configprops Actuator endpoints.

When the configuration key matches one of the configured pattern, the entire configuration value is sanitized. There is a specific case for URI-like keys ("uri", "uris", "address", "addresses"), where the Sanitizer tries to sanitize only the password part of URLs:

something.uri=https://emily:springboot@spring.io/ -> something.uri=https://emily:******@spring.io/
other.uri=https://spring.io/ -> other.uri=https://spring.io/

The first problem is about documentation. The docs imply that all configuration keys are processed the same way.

If any of the keys to sanitize are URI format (i.e. ://:@:/), only the password part is sanitized.

But in fact user info keys ("uri", "uris", "address", "addresses") and others are treated differently:

  • user info keys ("uri", "uris", "address", "addresses"): only hide the password part in URLs, or return the value unchanged
  • other keys ("password", "secret", "key", "token", "vcap_services", "sun.java.command","credentials") are completely sanitized

The second problem is that the sanitizer is used against serialized configuration values - whenever a List is serialized as JSON, it is serialized as [https://emily:springboot@spring.io/,https://spring.io] and not https://emily:springboot@spring.io/,https://spring.io. The problem is that if the current URI_USERINFO_PATTERN doesn't support this case and the original value is returned, nothing is sanitized.

Solution

First, the reference documentation (located in spring-boot-project/spring-boot-docs/src/docs/asciidoc/howto.adoc) should be improved to better explain the way keys are processed (some entirely sanitized in all cases, others only the password part if present).

Also, the URI_USERINFO_PATTERN pattern in the Sanitizer should be improved to support the list case. Maybe something like "\\[?[A-Za-z]+://.+:(.*)@.+$" would work? All tests in the SanitizerTests should be still green after that change, and we should have a new test case checking that:

"[http://user1:password1@localhost:8080,http://user2@localhost:8082,http://localhost:8083]" -> "[http://user1:******@localhost:8080,http://user2@localhost:8082,http://localhost:8083]";

If possible, the PR should be done against the 2.2.x branch - if master is more convenient for the contributor, project maintainers will rebase the PR.

Steps to Fix

  • [x] Claim this issue with a comment below and ask any clarifying questions you need
  • [x] Set up a repository locally following the Contributing Guidelines
  • [ ] Try to fix the issue following the steps above
  • [ ] Commit your changes and start a pull request.

Comment From: helloworldless

This looks reasonable for my first contribution! Taking a look...

Comment From: Davio

In this case, the part inside the square brackets is still a normal URL which would be properly sanitized if parsed without the square brackets. Wouldn't it make more sense to make the regex more flexible so it can detect embedded URLs?

For instance, changing it to .*[A-Za-z]+:/.+:(.*)@.+ ?

Comment From: helloworldless

That solution definitely makes sense as well. I just implemented the fix in this way per @bclozel's suggestion which I hope I've understood correctly:

sanitize the password section only if the value matches the user info pattern, or completely sanitize the value and return **

I was able to justify this solution to myself as follows...

There are probably other patterns out there too, like the URI surrounded by curly braces instead of square brackets, or some other format we can't even imagine. If we accept that it's unlikely that we would ever be able to account for every possible pattern, we can make things simple, and just sanitize anything that doesn't match the most common pattern. That will undoubtedly cause us to sanitize values which do not contain passwords, but it also will prevent us from accidentally exposing a password which is contained in an unhandled pattern.

Comment From: Davio

I certainly understand the route you have chosen. It's better to err on the side of caution especially when security is concerned. Do you know the purpose behind the square brackets for AMQP addresses?

We also use AMQP and specify the username and password separately so it is not embedded in the URL.

Comment From: bclozel

@helloworldless Sorry my first comment was wrong - we've discussed that during a team call and this issue was more subtle than we thought. I've updated the issue description and the steps to follow. Hopefully now everything is clear.

Given the unfortunate back and forth on this issue, feel free to request help or to leave out part of the work - we'll happily complete it before merging.

Thanks!

Comment From: helloworldless

No worries. I'll take another look!

Comment From: helloworldless

After a bit of experimentation, it looks like it's going to be very tricky to come up with a single regex pattern that works for lists and individual values. So what if we handle an individual value and a list differently? An individual value would be handled basically like it is currently, and a list (identified by surrounding square brackets) would be handled something like this:

  1. Strip the square brackets
  2. Split the elements on commas
  3. Apply regex and replacement to each element
  4. Rejoin the elements with commas and surrounding square brackets

Comment From: bclozel

@helloworldless wouldn't the regexp proposed in the issue description (i.e. "\\[?[A-Za-z]+://.+:(.*)@.+$") work in this case? If I'm not mistaken we already split the value on commas and then match against the pattern. An optional [ in the pattern should cover our missing case?

Comment From: helloworldless

I spent some more time on this, and here's the pattern I came up with:

String proposedPattern = "(\\[?[A-Za-z]+?://.+?:)(.*?)(@.+?)";

I ended up needing the three capture groups to be able reconstruct the values. I also had to make the matchers lazy ("ungreedy").

I've demonstrated this here: https://repl.it/@helloworldless/spring-boot-issue-23037. This is also useful to see the regex dissected: https://regex101.com/r/sr6uFN/1.

Shall I proceed with this?

Comment From: bclozel

I'm probably missing something here. We don't run the pattern on the whole string, since we're splitting on "," in the first place.

I've locally changed the pattern to Pattern.compile("\\[?[A-Za-z]+://.+:(.*)@.+$"); and ran the following test successfully:

    @ParameterizedTest(name = "key = {0}")
    @MethodSource("matchingUriUserInfoKeys")
    void uriAsListWithPasswordShouldHaveThoseSanitized(String key) {
        Sanitizer sanitizer = new Sanitizer();
        assertThat(sanitizer.sanitize(key,
                "[http://user1:password1@localhost:8080,http://user2@localhost:8082,http://localhost:8083]")).isEqualTo(
                "[http://user1:******@localhost:8080,http://user2@localhost:8082,http://localhost:8083]");
    }

Am I missing a test case that would invalidate this approach?

Comment From: helloworldless

My bad. I had not properly investigated the code.

I have now used the pattern you suggested and updated the tests. I also modified Sanitizer#getPattern to handle indexed property keys like uris[1]. See diff here: https://github.com/spring-projects/spring-boot/compare/2.2.x...helloworldless:23037-sanitize-uris.

I do have one question about the change to Sanitizer#getPattern. My change alters the behavior for all sensitive keys, not just the URI/address ones. So for example, it will now match "my.password[0]", but before that would not have been a match. Is this acceptable or should I do a bit of refactoring to split out the sensitive keys (e.g. "token", "password") from the URI/address keys (e.g. "uri", "address"), so that they can be treated differently in Sanitizer#getPattern?

I also wanted to give a bit more detail based on my current understanding of the requirements to make sure we're on the same page...

There are actually two property list scenarios which are now handled: "Native" Properties List and User-Provided List Literals.

"Native" Properties List

uris[0]=http://user1:password1@localhost:8080
uris[1]=http://user2:password2@localhost:8082
uris:
  - http://user1:password1@localhost:8080
  - http://user2:password2@localhost:8082

Sanitizer#sanitize is called twice with these parameters:

  1. key = "uris[0]", value = "http://user1:password1@localhost:8080"
  2. key = "uris[1]", value = "http://user2:password2@localhost:8082"

Previously, this was not handled properly due to the trailing value + $ in this pattern in Sanitizer#getPattern:

Pattern.compile(".*" + value + "$", Pattern.CASE_INSENSITIVE)

To fix this I've modified that pattern to match an optional index enclosed in square brackets:

Pattern.compile(".*" + value + "(?:\\[.+])?$", Pattern.CASE_INSENSITIVE)

User-Provided List Literals

I'm not sure why this would be used, but I suppose some libraries may be expecting this format.

uris=[http://user1:password1@localhost:8080,http://user2:password2@localhost:8082]

Note: Quotes here are required or it's parsed as a "native" list

uris: "[http://user1:password1@localhost:8080,http://user2:password2@localhost:8082]"

For either of these, Sanitizer#sanitize is called once with these parameters:

  1. key = "uris", value = "[http://user1:password1@localhost:8080,http://user2:password2@localhost:8082]"

As you pointed out, Sanitizer#sanitizeUris splits on ",". So Sanitizer#sanitizeUri is called twice:

  1. "[http://user1:password1@localhost:8080"
    1. This is now matched as a URI because of the change to Sanitizer#URI_USERINFO_PATTERN
  2. "http://user2:password2@localhost:8082]"
    1. This was never a problem; it was already being matched by the existing Sanitizer#URI_USERINFO_PATTERN

Comment From: bclozel

Thanks for your first contribution @helloworldless - I took your commit and just simplified it a bit.

I think this first-timer issue was more subtle than expected, but the fix looks right. I ended up removing the more complex parts about matching array-like keys, because I think there was a confusion about that. The Sanitizer is only involved in the Actuator endpoints, when iterating on the parsed PropertySource instances; so the Sanitizer is never looking at the properties/yaml syntax, but rather the parsed representation of keys ("[http://user1:password1@localhost:8080,http://user2:password2@localhost:8082]" being the string representation of a list here).

This is now merged in 2.2.x and merged-forward in later branches.