Story

A user writes a MessageConverter implementation that they wish to use via SpringEncoder. This message converter produces binary payloads.

Attempts to do so, however, result in the request being treated as though it were in the UTF-8 charset, regardless of the actual content. This causes the request body to be encoded and interpreted incorrectly. This holds true even if the custom converter specifies a charset parameter in the request content type(s) it supports!
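
For concreteness, a minimal sketch of the kind of converter involved follows; the class name and the ISO-8859-1 charset parameter are illustrative assumptions, not details from the original report:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.springframework.http.HttpInputMessage;
import org.springframework.http.HttpOutputMessage;
import org.springframework.http.MediaType;
import org.springframework.http.converter.AbstractHttpMessageConverter;
import org.springframework.util.StreamUtils;

// Hypothetical converter producing a binary payload; it is not derived from
// ByteArrayHttpMessageConverter or ProtobufHttpMessageConverter, and it declares
// a supported content type that carries an explicit charset parameter.
class BinaryPayloadConverter extends AbstractHttpMessageConverter<byte[]> {

    BinaryPayloadConverter() {
        // application/octet-stream;charset=ISO-8859-1 -- the charset is illustrative.
        super(new MediaType("application", "octet-stream", StandardCharsets.ISO_8859_1));
    }

    @Override
    protected boolean supports(Class<?> clazz) {
        return byte[].class == clazz;
    }

    @Override
    protected byte[] readInternal(Class<? extends byte[]> clazz, HttpInputMessage inputMessage)
            throws IOException {
        return StreamUtils.copyToByteArray(inputMessage.getBody());
    }

    @Override
    protected void writeInternal(byte[] bytes, HttpOutputMessage outputMessage) throws IOException {
        // Raw bytes: no text encoding is involved at this point.
        outputMessage.getBody().write(bytes);
    }
}
```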

Steps to Reproduce

  1. Create a new MessageConverter that is not derived from ByteArrayHttpMessageConverter or ProtobufHttpMessageConverter. The converter should produce a binary request body (non-text content) with a suitable request content type, such as application/octet-stream.
  2. Configure a SpringEncoder with this message converter (a minimal configuration sketch follows this list).
  3. Attempt to send a request with a body consisting of content converted by this message converter.
  4. Observe that the request is treated as being in UTF-8 encoding.
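
A minimal sketch of steps 2 and 3, reusing the hypothetical BinaryPayloadConverter from the story above (bean, client, and endpoint names are illustrative):

```java
import feign.codec.Encoder;

import org.springframework.beans.factory.ObjectFactory;
import org.springframework.boot.autoconfigure.http.HttpMessageConverters;
import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.cloud.openfeign.support.SpringEncoder;
import org.springframework.context.annotation.Bean;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.PostMapping;

// Step 2: wire the custom converter into a SpringEncoder.
class BinaryFeignConfiguration {

    @Bean
    Encoder feignEncoder() {
        ObjectFactory<HttpMessageConverters> converters =
                () -> new HttpMessageConverters(new BinaryPayloadConverter());
        return new SpringEncoder(converters);
    }
}

// Step 3: send a request whose body is produced by that converter. The bytes written by
// the converter end up wrapped in a Request.Body that assumes UTF-8 (step 4).
@FeignClient(name = "binary-upload", configuration = BinaryFeignConfiguration.class)
interface BinaryUploadClient {

    @PostMapping(value = "/upload", consumes = MediaType.APPLICATION_OCTET_STREAM_VALUE)
    void upload(byte[] payload);
}
```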

Cause

The logic in question begins on line 128 of SpringEncoder.

Proposed Solution

A few potential solutions (not mutually exclusive) would be:

  1. Honor any request content types set by the MessageConverter. If the converter sets a content type with a charset parameter, use that as the encoding for the Request.Body instance (a sketch follows this list).
  2. Extend SpringEncoder such that a message converter may be configured with a charset override (i.e., "whenever calling this converter, assume a specific charset for the body encoding").
  3. Failing the above, at least limit the default behavior to cases where it makes sense. Treating a request with a content type of application/octet-stream as being in UTF-8 is not a safe assumption. (The same goes for most, if not all, of the non-text types and subtypes.)
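
To make option 1 concrete, the check it implies could look roughly like the sketch below: prefer a charset that the converter explicitly wrote into the Content-Type header, and only then fall back to the current UTF-8 default. Class and method names are hypothetical; this is not the existing SpringEncoder code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;

// Hypothetical helper illustrating option 1.
final class CharsetResolution {

    private CharsetResolution() {
    }

    static Charset resolveBodyCharset(HttpHeaders headersWrittenByConverter) {
        MediaType contentType = headersWrittenByConverter.getContentType();
        if (contentType != null && contentType.getCharset() != null) {
            // The converter declared a charset explicitly; honor it when building Request.Body.
            return contentType.getCharset();
        }
        // Otherwise keep the current default behavior.
        return StandardCharsets.UTF_8;
    }
}
```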

Please let me know if anything is unclear or if you need more information.

Comment From: ghost

Can this be marked as a defect instead of an enhancement? It is very much broken in the scenario in which we first discovered it, and our use case is not at all exotic. We're able to work around it with some local hacks, but it would be nice if this worked out of the box... Thanks!

Comment From: OlgaMaciaszek

Unfortunately, deriving the charset from the request Content-Type charset is not going to be very useful, since, unless directly specified, the UTF-8 charset is often added as a default anyway. Will verify for non-text types based on types from org.springframework.http.MediaType to begin with. We might need to broaden the verification in the future as different scenarios get brought to us.
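
A rough illustration of that kind of verification, checking the request content type against a few well-known binary constants from org.springframework.http.MediaType; the exact list of types here is an assumption, not the actual change:

```java
import org.springframework.http.MediaType;

// Illustrative check only: skip the UTF-8 default for content types that are clearly binary.
final class BinaryContentTypes {

    private BinaryContentTypes() {
    }

    static boolean isBinary(MediaType contentType) {
        if (contentType == null) {
            return false;
        }
        return MediaType.APPLICATION_OCTET_STREAM.includes(contentType)
                || MediaType.APPLICATION_PDF.includes(contentType)
                || MediaType.IMAGE_PNG.includes(contentType)
                || MediaType.IMAGE_JPEG.includes(contentType)
                || MediaType.IMAGE_GIF.includes(contentType);
    }
}
```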

Comment From: ghost

Unfortunately, deriving the charset from the request Content-Type charset is not going to be very useful, since, unless directly specified, the UTF-8 charset is often added as a default anyway

True, it won't be there all the time. But I think it's a reasonable first condition to check, since if it is set then it could well be intentional (in our case it absolutely is).

I'm also not sure why we assume that the default is UTF-8... If there's not a clear "right" answer, shouldn't the default be to not put a charset on there at all?

Comment From: OlgaMaciaszek

There's no way of telling whether it's intentional or not. The problem is that some message converters will use it as a default, for example org.springframework.http.converter.protobuf.ProtobufHttpMessageConverter, and switching to getting the charset from the Content-Type was causing the protobuf scenarios to fail, but I could add it as an opt-in resolution strategy (with a flag).
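
For context, ProtobufHttpMessageConverter's default media type already carries a UTF-8 charset parameter, which is why a charset found on the Content-Type is not necessarily intentional. A small check along these lines shows it (requires spring-web and protobuf-java on the classpath):

```java
import org.springframework.http.MediaType;
import org.springframework.http.converter.protobuf.ProtobufHttpMessageConverter;

// Demonstrates that the protobuf converter's default content type carries charset=UTF-8,
// so the charset parameter alone cannot distinguish "intentional" from "defaulted".
class ProtobufCharsetCheck {

    public static void main(String[] args) {
        MediaType protobuf = ProtobufHttpMessageConverter.PROTOBUF;
        System.out.println(protobuf);              // application/x-protobuf;charset=UTF-8
        System.out.println(protobuf.getCharset()); // UTF-8
    }
}
```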

Comment From: ghost

I could add it as an opt-in resolution strategy (with a flag).

:+1: That would work perfectly for our use cases. Thank you!