Rossen Stoyanchev opened SPR-17039 and commented
Historically UriComponents
has always encoded only characters that are illegal in a given part of the URI (e.g. "/" is illegal in a path segment), and that does not include characters that are legal but have some other reserved meaning (e.g. ";" in a path segment, or also "+" in a query param).
UriComponents
has also always relied on expanding URI variables first, and then encoding the expanded String, which makes it impossible to apply stricter encoding to URI variable values which is usually what's expected intuitively, because once expanded it's impossible to tell the values apart from the rest of the template. Typically the expectation is that expanded values will have by fully encoded.
While the RestTemplate and WebClient can be configured with a UriBuilderFactory
that supports different encoding mode strategy, currently there is really no answer when using UriComponents
directly.
Affects: 5.0.7
Issue Links: - #21565 HtmlUnitRequestBuilder decodes plus sign in query parameter ("is depended on by") - #22161 UriComponentsBuilder.toUriString() is broken - #21399 Spring is inconsistent in the encoding/decoding of URLs ("supersedes") - #20750 Encoding of URI Variables on RestTemplate ("supersedes") - #21259 UriComponentsBuilder does not encode "+" properly ("supersedes")
1 votes, 9 watchers
Comment From: spring-projects-issues
Rossen Stoyanchev commented
This is now ready, see the updated "URI Encoding" in the docs. The short version is, use the new UriComponentsBuilder#encode
method and not the existing one in UriComponents
, i.e. invoke encode before and not after expanding URI variables.
Please give this a try with 5.0.8 or 5.1 snapshots to confirm how it works in your application.
Comment From: spring-projects-issues
Christophe Levesque commented
Thanks Rossen Stoyanchev! The only downside is that it requires that extra UriComponentsBuilder#encode
call.
UriComponentBuilder.fromHttpUrl(url).queryParam("foo", foo).toUriString(); // <= this would still not work, needs to add new encode() after toUriString
Is there a way to change the toUriString
method in a way that would have the previous code work as is?
PS: Also, unrelated request: can there be a toUri()
shorthand method in UriComponentsBuilder
the same way there is a toUriString()
? :)
Comment From: spring-projects-issues
Rossen Stoyanchev commented
The key to understand this, is that different degrees of encoding are applied to the URI template vs URI variables. In other words given:
http://example.com/a/{b}/c?q={q}&p={p}
The URI template is everything except for the URI variable placeholders. However the code snippet you showed only builds a URI literal without any variables, so the level encoding is the same, only illegal characters, no matter which method is used.
So it would also have to be something like:
.queryParam("foo", "{foo}").buildAndExpand(foo)
By the time .toUriString()
is called, the expand would have happened and at that point it's too late to encode URI variables more strictly. Unfortunately we cannot switch the default mode of encoding in UriComponentsBuilder at this stage since that's the only way toUriString()
could work without a call to encode()
.
That said UriComponentsBuilder
does have a buildAndExpand shortcut to URI:
URI uri = UriComponentBuilder.fromHttpUrl(url).queryParam("foo", "{foo}").encode().buildAndExpand("a+b");
Comment From: spring-projects-issues
Rossen Stoyanchev commented
Come to think of it, this works and it's almost identical length as what you had:
URI uri = UriComponentsBuilder
.fromHttpUrl(url).queryParam("foo", "{foo}").build(foo);
Or include it in the URI template:
URI uri = UriComponentsBuilder
.fromHttpUrl(url + "?foo={foo}").build(foo);
Explanation: the build(Object...)
and build(Map<String, Object>)
methods had to be implemented as part of UriBuilder
(new in 5.0) but that contract is more likely to be used through DefaultUriBuilderFactory
, if at all, and is overall quite new. So I've switched those methods internally to do encode().buildAndExpand().toUri()
.
For toUriString()
a switch from build() + encode()
to encode() + build()
would make no difference, because no URI variables are expanded. We could add a toUri()
as well but that would also have no effect on encoding, i.e. same as toUriString()
.
Comment From: spring-projects-issues
Rossen Stoyanchev commented
Resolving but now is a good time to try this.
Comment From: spring-projects-issues
Michal Domagala commented
I use autoconfigured WebTestClient
in my integration test. I was satisfied with EncodingMode.URI_COMPONENT
because I could easy test trimming spaces in request argument - I can just add a space to my request .queryParam("foo", " bar"))
and verify space is trimmed on server side.
Could you point me how to elegant undo #21577 for autoconfigured WebTestClient
? The only way I found is drop autoconfigured one and create custom client.
Comment From: spring-projects-issues
Rossen Stoyanchev commented
Michal Domagala, I've create a ticket in Boot. I've also included at least one example there of how it can be done. There may be better ways though so watch for updates on the ticket.
Comment From: spring-projects-issues
Leonard Brünings commented
Rossen Stoyanchev please advice on how to solve it with generic UriTemplate
// templates are populated via Spring Boot Configuration Properties
private Map<String, UriTemplate> templates = new HashMap<>();
public URI getLink(String linkType, Map<String, String> templateParams) {
return templates.get(linkType).expand(templateParams);
}
conf:
links:
configSite: "https://example.com/config?email={email}"
At this point we don't know what variable we have, and if they are query or path variables.
URI redirectUri = getLink("configSite", Collections.singletonMap("email", "service+bar@gmail.com"));
// render https://example.com/config?email=service+bar@gmail.com
// instead of https://example.com/config?email=service%2Bbar%40gmail.com
This will provide a valid URI, however it won't encode the +
and @
, and then the plus will get decoded to a space on the receiving site. The problem is that we can't even use URLEncoder.encode
manually on the calling site, as this will cause double encoding.
Comment From: spring-projects-issues
Leonard Brünings commented
I managed to get the desired result with:
public URI getLink(String linkType, Map<String, String> templateParams) {
return UriComponentsBuilder.fromUriString(templates.get(linkType).toString())
.encode()
.buildAndExpand(templateParams).toUri();
}
However, I must say this is quite ugly. Is there any way we could get UriTemplate.encode().expand()
?
Comment From: spring-projects-issues
Rossen Stoyanchev commented
Not really, UriTemplate uses UriComponentsBuilder internally, and provides just one way of doing it. You could shorten your example to:
public URI getLink(String linkType, Map<String, String> templateParams) {
return UriComponentsBuilder.fromUriString(templates.get(linkType).toString())
.build(templateParams);
}
Comment From: HermanBovens
I ended up here after doing quite a bit of searching when investigating a bug that only appeared when users had entered a string with a +
in it, which was saved into the DB and later used as a URI parameter. Although in the end it's possible to have this encoded properly, I don't agree that the current solution is developer friendly. At the very least, the javadoc is lacking: javadoc for UriComponentsBuilder#queryParam
does not say that certain legal but special characters, such as +
, will not be encoded.
I believe that a lot of the confusion around this issue is related to the "String obsession" anti-pattern. If we have separate types for URI components with different levels of encoding applied, that might help to make conversion from one form to another more explicit.
Comment From: rstoyanchev
UriComponentsBuilder does have different URI components. The issue here is that there are legitimate reasons to want to encode +
or to leave it as is, and the fact that there was a change of behavior after a long time.
We did revamp the section on URI encoding but adding something specifically about the treatment of +
in the Javadoc of UriComponentsBuilder is a good idea. I will do so.
Comment From: rstoyanchev
adding something specifically about the treatment of + in the Javadoc of UriComponentsBuilder is a good idea. I will do so
https://github.com/spring-projects/spring-framework/commit/261956fd08fec9f35abbd2a1667e4acbbacdfc31
Comment From: HermanBovens
Thanks for the docs update.
I wonder if there is actually a use case for the behavior of queryParam() where the value is assumed to be a String that may be a combination of illegal characters, which must be encoded, and + characters with special meaning that must be preserved.
Maybe IDEs will be able to issue a warning when not using template variables with queryParam(), because it leads to bugs that only occur with values that happen to contain a +, which may not be the case during testing.
Op vr 20 dec. 2019 12:05 schreef Rossen Stoyanchev <notifications@github.com
:
adding something specifically about the treatment of + in the Javadoc of UriComponentsBuilder is a good idea. I will do so
261956f https://github.com/spring-projects/spring-framework/commit/261956fd08fec9f35abbd2a1667e4acbbacdfc31
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/spring-projects/spring-framework/issues/21577?email_source=notifications&email_token=AAOVHA444POWXLDEZQZEJ3TQZSRF7A5CNFSM4J4MJQ5KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHMUMRY#issuecomment-567887431, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOVHAYS7E7TEIIUNCXNT3DQZSRF7ANCNFSM4J4MJQ5A .
Comment From: ravenblackdusk
Did I understand this properly? Is the simplest way to encode "=" in query params something like this:
UriComponentsBuilder.fromHttpUrl("http://localhost").queryParam("foo", "{1}").queryParam("bar", "{2}").build(mapOf(
"1" to foo,
"2" to bar
))
Comment From: rstoyanchev
No, this works too:
UriComponentsBuilder.fromHttpUrl("http://localhost").queryParam("foo", "=").build().encode().toUri()
Comment From: HermanBovens
What if you need to add a parameter that can have multiple values, for example using a MultiValueMap<String, String>
?
Will this properly encode everything (including +
of course) ?
MultiValueMap<String, String> map = ...
...
webClient.get(builder -> builder.path(...)
.queryParams(map)
.build())
If not then what is the suggested way to get this functionality?
Comment From: HermanBovens
It seems the above does not fully encode, it's the same as using .queryParam(...)
without template variable.
I continue to find bugs as a result of the current API, and expect to be doing so for a long time. I've consulted RFC 3986, and although I did not see a section that mandates such a confusing API, I did find several passages that indicate the opposite.
In Section 2.2 "Reserved Characters":
If data for a URI component would conflict with a reserved
character's purpose as a delimiter, then the conflicting data must be
percent-encoded before the URI is formed.
--> I think the parameters after the first one in the queryParam
method qualify as data for a URI component
. Apparently it was decided that only template variable contents qualify, but the spec certainly doesn't dictate that.
Thus, characters in the reserved
set are protected from normalization and are therefore safe to be
used by scheme-specific and producer-specific algorithms for
delimiting data subcomponents within a URI.
--> So yes characters like '+' should be preserved when used as a delimiter, not otherwise.
Section 2.4 "When to Encode or Decode":
Under normal circumstances, the only time when octets within a URI
are percent-encoded is during the process of producing the URI from
its component parts. This is when an implementation determines which
of the reserved characters are to be used as subcomponent delimiters
and which can be safely used as data.
--> This decision is important, and always treating reserved characters in query parameter data as delimiters (except when used with template variables) does not seem right, especially when there is no other option than to use template variables.
When a URI is dereferenced, the components and subcomponents
significant to the scheme-specific dereferencing process (if any)
must be parsed and separated before the percent-encoded octets within
those components can be safely decoded, as otherwise the data may be
mistaken for component delimiters.
--> This is the reverse process and warns again about the possibility to mix up data and delimiters.
So, it's all about API design, the spec doesn't say +
characters should not be encoded, on the contrary.
As I see it, there are only 2 kind of Strings that could be supplied as values for the queryParam()
method:
-
Plain Strings that need to be fully encoded because they must be reconstructed when dereferencing the URI and may contain illegal or reserved characters (e.g. user input).
-
Strings that contain delimiters and percent-encoded data. For these, no encoding should be done at all (otherwise it would be double-encoded).
A query parameter string that would consist of plain strings (non-encoded user input) which are joined with '+' characters to be transmitted as delimiters, should not exist. It could result in a String that contains some '+' characters that need to be percent-encoded and others that do not. However, it is precisely this kind of String that is assumed to be passed into queryParam()
, as illegal characters will be encoded but '+' will not.
Since parameters values to queryParam
are of type Object
, maybe a type OpaqueString
(constructable using a static factory method opaque(String)
) could be provided, whose contents are considered to be of the 1ste type and need full encoding if encoding is enabled at all. This would also solve the case where the parameter contains {
and }
characters which must be interpreted as data and not template variable pre- and postfixes. When well-documented, I think this is preferable to template variables whose values are specified at the end of the build chain and must be in the exact order.
Comment From: ravenblackdusk
No, this works too:
UriComponentsBuilder.fromHttpUrl("http://localhost").queryParam("foo", "=").build().encode().toUri()
@rstoyanchev this works for '=' but not for '/' or '+'. so is the following the simplest way to encode '/', '+' and similar characters?
UriComponentsBuilder.fromHttpUrl("http://localhost").queryParam("foo", "{1}").queryParam("bar", "{2}").build(mapOf(
"1" to foo,
"2" to bar
))
Comment From: HermanBovens
@ravenblackdusk It seems so. Maybe the simplest solution would be to add a rawQueryParam()
method where the parameters are assumed to not contain any template variables and must be fully encoded (including special characters, including +
).