Affects: 6.1.2
The class `org.springframework.http.ContentDisposition` should allow setting different encodings for the `filename` and `filename*` parts. According to MDN, the `filename` part exists for compatibility with user agents that don't support "complex" encodings.
We have observed that when the encoding is set to UTF-8, browsers parse the filename correctly (likely from `filename*`), but other clients, for example `curl -OJ`, don't decode the UTF-8 encoded `filename` part and instead write a file with a name like `=?UTF-8?Q?myFile.txt?=`, which is not a nice filename.
Example:
    @Test
    public void testContentDispositionUTF8() {
        var disposition = ContentDisposition.builder("attachment")
                .filename("myFile.txt", StandardCharsets.UTF_8)
                .build();
        assertEquals("attachment; filename=\"=?UTF-8?Q?myFile.txt?=\"; filename*=UTF-8''myFile.txt",
                disposition.toString());
    }
It would be great if it were possible to specify that `filename*` should be encoded as UTF-8 while `filename` is encoded in an ASCII-safe way (for example, by discarding all characters > 255).
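To make the request concrete, here is a minimal sketch of a header value that carries a plain ASCII `filename` next to a percent-encoded UTF-8 `filename*`. It uses only the JDK and is not Spring's API; the class and method names (`DualFilenameHeader`, `asciiFallback`, `encodeRfc5987`, `contentDisposition`) are made up for illustration, and replacing unsupported characters with `_` is just one possible fallback policy.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Spring Framework.
class DualFilenameHeader {

    // Non-alphanumeric characters that RFC 5987 allows unescaped (attr-char).
    private static final String ATTR_CHARS = "!#$&+-.^_`|~";

    /** Percent-encodes the UTF-8 bytes of the input, leaving attr-chars untouched. */
    static String encodeRfc5987(String input) {
        StringBuilder sb = new StringBuilder();
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || ATTR_CHARS.indexOf(c) >= 0;
            if (safe) {
                sb.append((char) c);
            } else {
                sb.append('%').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    /** Replaces anything outside printable ASCII, plus '"' and '\', with '_'. */
    static String asciiFallback(String input) {
        StringBuilder sb = new StringBuilder();
        input.codePoints().forEach(cp -> {
            boolean ok = cp >= 0x20 && cp < 0x7F && cp != '"' && cp != '\\';
            sb.append(ok ? (char) cp : '_');
        });
        return sb.toString();
    }

    /** Builds the header value with both parameters, without any RFC 2047 encoded-word. */
    static String contentDisposition(String type, String filename) {
        return type + "; filename=\"" + asciiFallback(filename) + "\""
                + "; filename*=UTF-8''" + encodeRfc5987(filename);
    }
}
```

With this sketch, `contentDisposition("attachment", "myFile.txt")` yields `attachment; filename="myFile.txt"; filename*=UTF-8''myFile.txt`, so a client that only reads `filename` sees a usable name.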
When choosing the ASCII charset and using `ContentDisposition` with a filename that contains characters outside the ASCII range, Tomcat will raise an error.
For example:
```java
response.setHeader(HttpHeaders.CONTENT_DISPOSITION, ContentDisposition.builder("attachment").filename("。.txt").build().toString());
```
Then Tomcat would eventually tell us:
```
java.lang.IllegalArgumentException: The Unicode character [。] at code point [12,290] cannot be encoded as it is outside the permitted range of 0 to 255
	at org.apache.tomcat.util.buf.MessageBytes.toBytesSimple(MessageBytes.java:310) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
	at org.apache.tomcat.util.buf.MessageBytes.toBytes(MessageBytes.java:283) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
	at org.apache.coyote.http11.Http11OutputBuffer.write(Http11OutputBuffer.java:389) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
	at org.apache.coyote.http11.Http11OutputBuffer.sendHeader(Http11OutputBuffer.java:368) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
```
Because of this, it is not sufficient to allow `filename` to be encoded as ASCII while `filename*` is encoded as UTF-8. Instead, we need a way to strip non-safe characters from `filename` (or encode it in a better-supported format?).
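As a rough illustration of what "stripping non-safe characters" could mean, the sketch below decomposes the name with `java.text.Normalizer`, drops combining marks so accented Latin letters keep their base letter, and replaces whatever is still outside printable ASCII with `_`. It also includes a guard that checks whether a value fits the 0 to 255 range from the Tomcat error. The class and method names (`AsciiFilenameFallback`, `toAscii`, `fitsInHeader`) are hypothetical, nothing here is part of Spring or Tomcat, and the replacement policy is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical helper, not part of Spring Framework or Tomcat.
class AsciiFilenameFallback {

    /**
     * Returns an ASCII-only approximation of the filename: diacritics are
     * removed, and any other character outside printable ASCII (as well as
     * '"' and '\') is replaced with '_'.
     */
    static String toAscii(String filename) {
        // NFD splits e.g. "é" into "e" followed by U+0301 (combining acute accent).
        String decomposed = Normalizer.normalize(filename, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder();
        decomposed.codePoints().forEach(cp -> {
            if (Character.getType(cp) == Character.NON_SPACING_MARK) {
                return; // drop the combining mark, keep the base letter
            }
            boolean ok = cp >= 0x20 && cp < 0x7F && cp != '"' && cp != '\\';
            sb.append(ok ? (char) cp : '_');
        });
        return sb.toString();
    }

    /** True if every character is in the range 0 to 255 (encodable as ISO-8859-1). */
    static boolean fitsInHeader(String value) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(value);
    }
}
```

A caller could run `toAscii` on the name destined for `filename` and keep the original string for `filename*`; `fitsInHeader` can serve as a cheap check before handing a header value to the servlet container.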
Comment From: deg-hrisser
Hey, I have also stumbled upon this problem: not all user agents support the currently used encoding, which caused a great deal of confusion. In our case, it was a version of the Postman HTTP client.
Section 5 of RFC 2047 also explicitly states:
+ An 'encoded-word' MUST NOT be used in parameter of a MIME
Content-Type or Content-Disposition field, or in any structured
field body except within a 'comment' or 'phrase'.
RFC 6266, Appendix C.1, reiterates this:
RFC 2047 defines an encoding mechanism for header fields, but this
encoding is not supposed to be used for header field parameters --
see Section 5 of RFC2047.
...
In practice, some user agents implement the encoding, some do not
(exposing the encoded string to the user), and some get confused by
it.
It seems to me that the current strategy does not honor these standards, although I fully admit I'm not very knowledgeable about them in particular.
I see this is still generally planned? Is there any workaround short of copying / modifying the Content-Disposition code into our own project?
Thanks for your work :)