I believe I found an issue with UriUtils.decode(String source, Charset charset). When the source string contains the character ’ (Right Single Quotation Mark, U+2019) along with escaped URI characters such as %20 (which trigger a rewrite of the string), the character is changed to U+0019 (End of Medium).
Here is a sample test in Kotlin that highlights the behaviour:
@Test
fun test() {
    val c = '\u2019'
    val s = "%20$c"
    val expected = " $c"
    val d = UriUtils.decode(s, Charsets.UTF_8)
    assertThat(d).isEqualTo(expected)
}
Here is the difference shown from the failing test:
" ’"
" "
I am using Spring 6.2.0.
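One possible explanation for the U+2019 to U+0019 change, offered as an assumption rather than a confirmed diagnosis of the Spring code: if a decoder copies each unescaped char into a byte buffer, only the low-order 8 bits survive. The sketch below uses only the JDK; `TruncationDemo` and `copyThroughByteBuffer` are hypothetical names made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class TruncationDemo {

    // Hypothetical helper (not a Spring API): copies every char into a byte
    // buffer, then decodes the buffer with UTF-8.
    static String copyThroughByteBuffer(String source) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream(source.length());
        for (int i = 0; i < source.length(); i++) {
            // write(int) keeps only the low-order 8 bits, so 0x2019 is stored as 0x19
            baos.write(source.charAt(i));
        }
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String result = copyThroughByteBuffer(" \u2019");
        System.out.println(Integer.toHexString(result.charAt(1))); // prints 19
    }
}
```

If something like this happens inside the decoder, it would only show up when the string is actually rewritten, which matches the observation that an escape such as %20 is needed to trigger it.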
Comment From: sdeleuze
I suspect an inconsistency between the provided UTF-16 hexadecimal value and the UTF-8 charset specified, see https://www.fileformat.info/info/unicode/char/2019/index.htm.
Also, if you check the Javadoc of the underlying StringUtils#uriDecode method, I am not sure that what you are trying to do is supported ("For all other characters (including those already decoded), the output is undefined").
Any thoughts?
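For reference, here is a small JDK-only sketch of the distinction between the code point value and its UTF-8 form: U+2019 is encoded in UTF-8 as the three bytes E2 80 99, so its fully escaped form is %E2%80%99. `Utf8EscapeDemo` and `percentEncodeAllBytes` are illustrative names, not part of Spring.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class Utf8EscapeDemo {

    // Illustrative helper: percent-encodes every UTF-8 byte of the input,
    // including bytes that would not normally need escaping.
    static String percentEncodeAllBytes(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append('%').append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String escaped = percentEncodeAllBytes("\u2019");
        System.out.println(escaped); // prints %E2%80%99

        // A fully escaped input decodes back to the original character.
        String decoded = URLDecoder.decode("%20" + escaped, StandardCharsets.UTF_8);
        System.out.println(decoded.equals(" \u2019")); // prints true
    }
}
```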
Comment From: nosan
I expect StringUtils.uriDecode to behave similarly to URLDecoder, except that the + sign is treated as a literal + instead of a space (' ').
@Test
void uriDecode() {
    assertThat(URLDecoder.decode("%20\u2019", StandardCharsets.UTF_8)).isEqualTo(" ’"); // success
    assertThat(URLDecoder.decode("\u2019", StandardCharsets.UTF_8)).isEqualTo("’"); // success
    assertThat(StringUtils.uriDecode("\u2019", StandardCharsets.UTF_8)).isEqualTo("’"); // success
    assertThat(StringUtils.uriDecode("%20\u2019", StandardCharsets.UTF_8)).isEqualTo(" ’"); // fail
}
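The expectation in this comment could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Spring implementation and not a proposed patch: `LenientUriDecoder` and its methods are hypothetical names, and the choice to reject malformed escapes with an IllegalArgumentException is an assumption. Only %xx sequences are collected as bytes and decoded with the charset; every other char, including + and non-ASCII characters, is appended unchanged.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LenientUriDecoder {

    // Hypothetical decoder: decodes %xx escapes, leaves all other chars as-is.
    static String decode(String source, Charset charset) {
        if (source.indexOf('%') < 0) {
            return source; // nothing to decode
        }
        StringBuilder out = new StringBuilder(source.length());
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        int i = 0;
        while (i < source.length()) {
            char ch = source.charAt(i);
            if (ch == '%') {
                if (i + 2 >= source.length()) {
                    throw new IllegalArgumentException("Incomplete escape at index " + i);
                }
                int hi = Character.digit(source.charAt(i + 1), 16);
                int lo = Character.digit(source.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Invalid escape at index " + i);
                }
                // Collect consecutive escaped bytes so multi-byte sequences decode together.
                pending.write((hi << 4) | lo);
                i += 3;
            } else {
                flush(pending, out, charset);
                out.append(ch); // appended as a char, never narrowed to a byte
                i++;
            }
        }
        flush(pending, out, charset);
        return out.toString();
    }

    // Decodes any collected bytes with the charset and appends them to the output.
    private static void flush(ByteArrayOutputStream pending, StringBuilder out, Charset charset) {
        if (pending.size() > 0) {
            out.append(new String(pending.toByteArray(), charset));
            pending.reset();
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("%20\u2019", StandardCharsets.UTF_8).equals(" \u2019")); // prints true
        System.out.println(decode("a+b%20c", StandardCharsets.UTF_8)); // prints a+b c
    }
}
```

Keeping decoded bytes and literal chars in separate buffers is the design choice that matters here: literal characters are never routed through a byte buffer, so characters above U+00FF cannot be truncated.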