In spring-web 5.3.13, UriComponentsBuilder.fromUri / UriComponentsBuilder.uri silently drops some RFC 2396 3.2.1 Registry-based Naming Authorities even when they are accepted as RFC 3986 3.2 authorities.

    // I tested java 17.0.1-zulu and 8.0.312-zulu - with the same results.

    @Test
    public void testUriComponentBuilder() throws URISyntaxException {
        // Generic scheme to avoid any assumptions around HTTP
        String scheme = "scheme";

        // Matches
        // - RFC 2396 3.2.1. "reg_name" Registry-based Naming Authority
        // - RFC 3986 3.2.2 "host" Host
        // NOT an RFC 2396 3.2.2. "server" Server-based Naming Authority
        String authority = "rfc_2396_authority";


        String uriString = scheme + ":" + "//" + authority ;

        URI testUri = new URI(uriString);

        UriComponents uriComponentsFromUriString = UriComponentsBuilder.fromUriString(uriString).build(true);
        UriComponents uriComponentsFromUri = UriComponentsBuilder.fromUri(testUri).build(true);

        // Demonstrate that the URI is well formed according to RFC 2396 and parses as expected
        assertAll("testUriFromString should have a RFC 2396 Registry-based Naming Authority (3.2.1.)",
                () -> assertEquals(authority, testUriFromString.getAuthority()),
                () -> assertNull(testUriFromString.getUserInfo()),
                () -> assertNull(testUriFromString.getHost()),
                () -> assertEquals(-1, testUriFromString.getPort())
        );

        // Demonstrate that the URI is parseable under RFC 3986 from string form
        assertAll("uriComponentsFromUriString should be populated ",
                () -> assertEquals(authority, uriComponentsFromUriString.getHost())
        );

        // Demonstrate that the parsed URI is equivalent (according to java.net.URI)
        assertEquals(testUriFromString, uriComponentsFromUriString.toUri());

        // Unexpectedly fails as getHost() return a null even though testUri has an RFC 3986 host
        assertAll("uriComponentsFromUri should have populated host ",
                () -> assertEquals(authority, uriComponentsFromUri.getHost())
        );

    }

When java.net.URI encounters an RFC 2396 3.2.1 authority (identified as any correct authority not parseable under RFC 2396 3.2.2), it returns null values for the server-based naming authority components, causing UriComponentBuilder to assume they are unset and silently dropping the data. However, this data is available via URI.getAuthority().

I would expect that UriComponentBuilder would attempt to reparse the authority from Uri.getAuthority under the RFC 3986 3.2 specification.

The fact that these classes are written against different versions of the spec is obviously a complicating factor. The most straightforward option is to reparse the URI from scratch under the RFC 3986 spec, though I think that it would be preferable to only parse the authority component to limit the blast radius of this change. I am happy to contribute a PR, but would appreciate thumbs up on the approach before investing the time as this would substantially change existing behavior if UriComponentBuilder.uri is being used after the authority components have been set via another method.

Comment From: rstoyanchev

I'm not quite sure what your are proposing or what you mean with re-parsing.

If understand, when the source is a URI, we could mirror the properties of java.net.URI, adding a new getAuthority method, and returning null from getHost. I'm less clear on what happens when parsing from a String. Our regex seems to pick up the given authority as the host, but the host in that case should probably be null.

Overall, If the current spec doesn't support the concept of a non-server authority, I'm not sure we need to support it UriComponentsBuilder.

Comment From: spring-projects-issues

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

Comment From: fedorka

This is certainly complicated! Thank you for being patient while I try to explain my understanding.

  1. I believe that scheme://rfc_2396_authority is a valid RFC 3986 URI with host = "rfc_2396_authority" (which matches reg-name = *( unreserved / pct-encoded / sub-delims )).
  2. Given (1), I believe that UriComponentsBuilder::fromUriString handles this case correctly as it populates the host field of the builder (and ultimately the HierarchicalUriComponents).
  3. Also given (1), I believe that UriComponentsBuilder::uri handles this case incorrectly as it does not populate the host field.

My analysis is that the uri method assumes that RFC 2396 hosts are equivalent to RFC 3986 hosts, but that does not align with the RFCs. The RFC 2396 host (as defined in RFC 2396 3.2.2) is only valid in a subset of RFC 2396 authorities and so leaves the RFC 2396 3.2.1 case uncovered.

I attempted to demonstrate this through showing how fromUriString parses the URI, as that appears to be correctly aligned with RFC 3986 and I believes demonstrates the correct parsing of the URI under the newer spec.

As far as "re-parsing", I was simply suggesting a solution. If you agree that the current behavior is unexpected then I can write a PR and we can more directly discuss the possible solution.

adding a new getAuthority method This would be a useful (to me) enhancement to UriComponents and I'd be happy to contribute it, but I feel it is decoupled from the above behavior.

Comment From: rstoyanchev

It is surprising that URI#getHost returns null for what should be a valid host under RFC 3986 syntax, and that there is not a mention of it in the Javadoc. We could consider an improvement in UriComponentsBuilder#uri to catch this case and attempt to align with UriComponentsBuilder#fromUriString.

Closing this one but feel free to submit a PR and reference this issue.