Spring UriUtils.decode alters unicode character

I believe I found an issue with UriUtils.decode(String source, Charset charset).

When the source string contains the character ’ (Right Single Quotation Mark, U+2019) together with percent-encoded sequences (which trigger the rewrite of the string), the character is changed to the control character U+0019 (End of Medium).

Here is a sample test in Kotlin that highlights the behaviour:

@Test
fun test() {
    val c = '\u2019'
    val s = "%20$c"
    val expected = " $c"

    val d = UriUtils.decode(s, Charsets.UTF_8)

    assertThat(d).isEqualTo(expected)
}

Here is the difference shown from the failing test:

" ’"
" "

I am using Spring 6.2.0.
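
The change from U+2019 to U+0019 is consistent with a 16-bit char being narrowed to a single byte somewhere in the decode path. The sketch below reproduces that effect with the JDK alone. It is an assumption about the cause, not a confirmed reading of the Spring source, and the names TruncationDemo and narrowToBytes are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class TruncationDemo {

    // Hypothetical helper: writes each char of the input to a byte stream,
    // as a decoder that assumes ASCII-only input might do.
    // ByteArrayOutputStream.write(int) keeps only the low 8 bits of its argument.
    static String narrowToBytes(String source) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream(source.length());
        for (int i = 0; i < source.length(); i++) {
            baos.write(source.charAt(i)); // 0x2019 is narrowed to 0x19 here
        }
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String result = narrowToBytes(" \u2019");
        // Print code points so the invisible control character can be seen.
        result.chars().forEach(c -> System.out.printf("U+%04X%n", c));
    }
}
```

If this is what happens, any character above U+00FF that sits next to a percent-encoded sequence would be affected, not only U+2019.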

Comment From: sdeleuze

I suspect an inconsistency between the provided UTF-16 hexadecimal value and the UTF-8 charset specified, see https://www.fileformat.info/info/unicode/char/2019/index.htm.

Also, if you check the Javadoc of the underlying StringUtils#uriDecode method, I am not sure that what you are trying to do is supported ("For all other characters (including those already decoded), the output is undefined").

Any thoughts?

Comment From: nosan

I expect StringUtils.uriDecode to behave similarly to URLDecoder, except that the + sign is treated as a literal + instead of a space (' ').

@Test
void uriDecode() {
    assertThat(URLDecoder.decode("%20\u2019", StandardCharsets.UTF_8)).isEqualTo(" ’"); // success
    assertThat(URLDecoder.decode("\u2019", StandardCharsets.UTF_8)).isEqualTo("’"); // success
    assertThat(StringUtils.uriDecode("\u2019", StandardCharsets.UTF_8)).isEqualTo("’"); // success
    assertThat(StringUtils.uriDecode("%20\u2019", StandardCharsets.UTF_8)).isEqualTo(" ’"); // fail
}
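
For comparison, here is a minimal sketch of a decoder with the behaviour described above: percent-encoded sequences are collected as bytes and decoded with the given charset, while every other character, including + and non-ASCII characters, is copied through unchanged. The class UriDecodeSketch and its methods are hypothetical and are not the Spring implementation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UriDecodeSketch {

    // Hypothetical decoder: only %XX sequences go through the byte buffer.
    static String decode(String source, Charset charset) {
        int length = source.length();
        StringBuilder out = new StringBuilder(length);
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        int i = 0;
        while (i < length) {
            char ch = source.charAt(i);
            if (ch == '%') {
                if (i + 2 >= length) {
                    throw new IllegalArgumentException("Incomplete sequence at index " + i);
                }
                int hi = Character.digit(source.charAt(i + 1), 16);
                int lo = Character.digit(source.charAt(i + 2), 16);
                if (hi == -1 || lo == -1) {
                    throw new IllegalArgumentException("Invalid sequence at index " + i);
                }
                pending.write((hi << 4) | lo); // one decoded byte
                i += 3;
            } else {
                flush(pending, out, charset);
                out.append(ch); // kept as a char, never narrowed to a byte
                i++;
            }
        }
        flush(pending, out, charset);
        return out.toString();
    }

    // Decodes any buffered bytes with the charset and appends the result.
    private static void flush(ByteArrayOutputStream pending, StringBuilder out, Charset charset) {
        if (pending.size() > 0) {
            out.append(new String(pending.toByteArray(), charset));
            pending.reset();
        }
    }

    public static void main(String[] args) {
        String decoded = decode("%20\u2019", StandardCharsets.UTF_8);
        decoded.chars().forEach(c -> System.out.printf("U+%04X%n", c));
    }
}
```

Buffering consecutive %XX bytes before decoding keeps multi-byte sequences such as %E2%80%99 intact, since each of those bytes is meaningless on its own in UTF-8.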