Affects: Spring Web 4.3.9 / 5.2.8

UriComponentsBuilder/UriComponents do not encode ";" in query parameter values.

For example, a parameter value might be a User-Agent string such as:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36

but ";" is not encoded:

UriComponentsBuilder builder = UriComponentsBuilder.fromUriString("http://domain.net/q?param0=aaa");

// param1 is single-value parameter with a complex value
builder.queryParam("param1", "bb;%& b");

// param2 is multi-value parameter with a complex value
builder.queryParam("param2", "cc;%& c");
builder.queryParam("param2", "dd;%& d");

UriComponents components = builder.build(false).encode();

System.out.println(components.getQuery());
// param0=aaa&param1=bb;%25%26%20b&param2=cc;%25%26%20c&param2=dd;%25%26%20d

org.springframework.web.util.UriUtils#encodeQueryParam behaves the same way:

System.out.println(UriUtils.encodeQueryParam("bb;%& b", StandardCharsets.UTF_8));
// bb;%25%26%20b
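For comparison, here is a minimal stdlib-only sketch of what a stricter encoding of a single query value could look like. `StrictQueryEncoder` is a hypothetical helper, not a Spring API; it assumes that every byte outside the RFC 3986 "unreserved" set should be percent-encoded.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Spring: percent-encodes every UTF-8 byte
// that is not an RFC 3986 "unreserved" character (ALPHA / DIGIT / "-" / "." / "_" / "~").
final class StrictQueryEncoder {

    private static final String UNRESERVED =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    static String encode(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            char c = (char) (b & 0xFF);
            if (UNRESERVED.indexOf(c) >= 0) {
                out.append(c); // safe everywhere in a URI, keep as-is
            } else {
                out.append('%').append(String.format("%02X", b & 0xFF));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("param1=" + encode("bb;%& b"));
        // param1=bb%3B%25%26%20b
    }
}
```

With this kind of encoding the ";" becomes %3B, so a server that splits on ";" no longer sees a separator inside the value.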

Unfortunately, the W3C recommends that servers accept ";" in place of "&", which makes this URL unusable with a large number of web servers.

https://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2

We recommend that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner.

Python, for example, cannot parse such a query string correctly:

$ python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from urllib.parse import parse_qs
>>> parse_qs("param0=aaa&param1=bb;%25%26%20b&param2=cc;%25%26%20c&param2=dd;%25%26%20d")
{'param0': ['aaa'], 'param1': ['bb'], 'param2': ['cc', 'dd']}
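The same truncation can be reproduced in Java with a small sketch of a parser that follows the HTML 4.01 recommendation. `LegacyQueryParser` is a hypothetical class written for illustration; it assumes that both "&" and ";" separate pairs and that fragments without "=" are dropped, which matches the Python session above.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for illustration: treats both "&" and ";" as pair separators.
final class LegacyQueryParser {

    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String pair : query.split("[&;]")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // a fragment without "=" is dropped
            }
            String name = URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8);
            String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            result.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        // ";" left as-is: the values are cut at the semicolon
        System.out.println(parse(
                "param0=aaa&param1=bb;%25%26%20b&param2=cc;%25%26%20c&param2=dd;%25%26%20d"));
        // {param0=[aaa], param1=[bb], param2=[cc, dd]}

        // ";" encoded as %3B: the value survives intact
        System.out.println(parse("param1=bb%3B%25%26%20b"));
        // {param1=[bb;%& b]}
    }
}
```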

Comment From: mazurkin

Maybe it would be worth complementing encode with an extended version such as encodeStrict. Please also check how curl encodes values when --data-urlencode is specified.

Comment From: Birthright50

The problem is not only with this character, but also with + and :. For example:

        String baseUrl = "http://item-service/api/external/items";
        final URI uri = UriComponentsBuilder.fromHttpUrl(baseUrl)
                .queryParam("from", OffsetDateTime.now().toString())
                .encode().build().toUri();
        System.out.println(uri);

Output: http://item-service/api/external/items?from=2021-08-30T21:20:38.669121955+03:00

Expected: http://item-service/api/external/items?from=2021-08-30T21%3A20%3A38.669121955%2B03%3A00
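To illustrate why the unencoded "+" matters, here is a small stdlib-only sketch. It assumes the receiving side decodes the value with form-style rules (as java.net.URLDecoder does), where a raw "+" means a space. `TimestampParamDemo` and its methods are names made up for this example.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustration only: shows what a form-style decoder does with a raw "+" in a value.
final class TimestampParamDemo {

    // URLEncoder emits "+" for spaces; rewrite those to "%20" so the result
    // does not depend on form-style decoding of "+".
    static String encodeValue(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    static String formDecode(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String from = "2021-08-30T21:20:38.669121955+03:00";

        // Sent unencoded, the offset sign is lost on a form-style decoder
        System.out.println(formDecode(from));
        // 2021-08-30T21:20:38.669121955 03:00

        // Encoded first, the value survives the round trip
        System.out.println(encodeValue(from));
        // 2021-08-30T21%3A20%3A38.669121955%2B03%3A00
        System.out.println(formDecode(encodeValue(from)).equals(from));
        // true
    }
}
```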

And there is no solution to this issue; the documentation does not say anything about it.

Comment From: rstoyanchev

@Birthright50 that documentation link specifically discusses characters like ";" which are legal and what to do if you want to have them encoded to suppress that meaning. Please have another look at that section and the given examples.

Comment From: Birthright50

According to this documentation, I should write something like the following code:

String baseUrl = "http://item-service/api/external/items";
final LinkedMultiValueMap<String, String> queryParams = new LinkedMultiValueMap<>(Map.of("from", List.of(OffsetDateTime.now().toString())));
final UriComponentsBuilder uriComponentsBuilder = UriComponentsBuilder.fromHttpUrl(baseUrl);
queryParams.forEach((key, val) -> uriComponentsBuilder.queryParam(key, "{" + key + "}"));
final URI uri = uriComponentsBuilder
            .encode()
            .buildAndExpand(queryParams.values())
            .toUri();
System.out.println(uri);

Honestly, this solution is so-so. Why not add an encodeStrict function, as the original poster suggested?

Comment From: rstoyanchev

@Birthright50, for the URI template (the literal parts) we assume that characters there are used according to their meaning. For example a ";" in a path segment to separate path variables. We only apply strict encoding to dynamic parts (URI variables).

I agree the above code sample is not ideal. There is a method on UriUtils that helps to encode a map of variables, so you could do this:

Map<String, String> queryParams = UriUtils.encodeUriVariables(Map.of("from", OffsetDateTime.now().toString()));

UriComponentsBuilder uriComponentsBuilder = UriComponentsBuilder.fromHttpUrl("http://item-service/api/external/items");
queryParams.forEach(uriComponentsBuilder::queryParam);
URI uri = uriComponentsBuilder.build(true).toUri();

System.out.println(uri);

Comment From: mazurkin

Ok, long story short: never ever use builder.queryParam, instead use builder.uriVariables()

But I see another bummer (Spring 5.3.24):

UriComponentsBuilder builder = UriComponentsBuilder
    .fromUriString("http://domain.net/q?param0=aaa&param4={token}");

builder.uriVariables(Map.of("token", "ee;%& e"));

// leads to java.lang.IllegalArgumentException: Invalid character '{' for QUERY_PARAM in "{token}"
UriComponents components = builder
    .encode()     // I encoded the builder, didn't I?
    .build(true); // So here I set `encoded=true`

// only with `encoded=false` it finally produces the expected string
//      http://domain.net/q?param0=aaa&param4=ee%3B%25%26%20e
System.out.println(components.toUriString());

Comment From: rstoyanchev

The encode method on UriComponentsBuilder (as the Javadoc says) encodes the template only, with URI variables to be encoded when expanded. This separation allows respecting reserved characters in the URI template, while using a stricter encoding for URI variable values.
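The separation between template and variables can be sketched without Spring roughly as follows. This is not Spring's implementation: `TemplateExpansionSketch` is a hypothetical class that leaves the literal template untouched (Spring additionally encodes illegal characters in the template) and, as an assumption, uses java.net.URLEncoder with "+" rewritten to "%20" as the strict encoder for variable values.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: literal template parts keep their reserved characters,
// only the expanded {variable} values are strictly encoded.
final class TemplateExpansionSketch {

    private static final Pattern VARIABLE = Pattern.compile("\\{([^{}]+)\\}");

    static String expand(String template, Map<String, String> variables) {
        return VARIABLE.matcher(template).replaceAll(match -> {
            String name = match.group(1);
            String value = variables.get(name);
            if (value == null) {
                throw new IllegalArgumentException("No value for variable '" + name + "'");
            }
            String encoded = URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
            return Matcher.quoteReplacement(encoded);
        });
    }

    public static void main(String[] args) {
        System.out.println(expand(
                "http://domain.net/q?param0=aaa&param4={token}",
                Map.of("token", "ee;%& e")));
        // http://domain.net/q?param0=aaa&param4=ee%3B%25%26%20e

        // A ";" written in the template itself keeps its meaning
        System.out.println(expand("http://domain.net/a;v=1/q?x={x}", Map.of("x", "1;2")));
        // http://domain.net/a;v=1/q?x=1%3B2
    }
}
```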