Story
A user writes a MessageConverter implementation that they wish to use via SpringEncoder. This message converter produces binary payloads.
However, when they do so, the request is treated as though it were in the UTF-8 charset, regardless of the actual content. This causes the request to be encoded and interpreted incorrectly. This holds true even if the custom converter specifies a charset parameter in the request content type(s) that it supports!
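To illustrate why this matters, here is a minimal, stdlib-only sketch of what can happen to a binary payload once it is handled as UTF-8 text. It is not the actual code path in SpringEncoder; the class and method names (BinaryAsUtf8Demo, roundTripThroughUtf8) are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of Spring Cloud OpenFeign or Feign.
final class BinaryAsUtf8Demo {

    // Decodes the bytes as UTF-8 text and encodes the text back to bytes,
    // which is roughly what happens when a binary body is treated as a
    // UTF-8 string somewhere along the way.
    static byte[] roundTripThroughUtf8(byte[] payload) {
        String asText = new String(payload, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF and 0xFE are never valid in UTF-8, so decoding replaces them.
        byte[] binary = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x41};
        byte[] mangled = roundTripThroughUtf8(binary);

        System.out.println("original:   " + Arrays.toString(binary));
        System.out.println("round trip: " + Arrays.toString(mangled));
        System.out.println("intact:     " + Arrays.equals(binary, mangled));
    }
}
```

Bytes that are not valid UTF-8 are replaced during decoding, so the payload that comes back out is no longer the payload the converter produced.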
Steps to Reproduce
- Create a new MessageConverter that is not derived from ByteArrayMessageConverter or ProtobufHttpMessageConverter. The converter should produce a binary request (non-text contents) with a suitable request content type, such as application/octet-stream.
- Configure a SpringEncoder with this message converter.
- Attempt to send a request with a body consisting of content converted by this message converter.
- Observe that the request is treated as being in UTF8 encoding.
Cause
The logic in question begins on line 128 of SpringEncoder.
Proposed Solution
Two potential solutions (not mutually exclusive) would be:
- Honor any request content types set by the MessageConverter. If the converter sets a content type with a charset parameter, use that as the encoding for the Request.Body instance.
- Extend SpringEncoder such that a message converter may be configured with a charset override (i.e. "whenever calling this converter, assume a specific charset for the body encoding").
- Failing the above, at least limit the default behavior to cases where it would make sense. Treating a request with a content type of application/octet-stream as being in UTF8 is not a safe assumption. (Same goes for most if not all of the non-text types + subtypes.)
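As a rough sketch of how the first and third options could combine, the helper below honors an explicit charset parameter when one is present and otherwise falls back to UTF-8 only for text-like media types. Everything here is hypothetical: BodyCharsetResolver, resolve and isTextLike are names invented for this example, the list of "text-like" types is an assumption, and none of it reflects the current SpringEncoder implementation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper; not part of SpringEncoder or Feign.
final class BodyCharsetResolver {

    // Returns the charset to use for the request body, or empty if the body
    // should be sent as raw bytes with no charset attached.
    static Optional<Charset> resolve(String contentType) {
        if (contentType == null || contentType.trim().isEmpty()) {
            return Optional.empty();
        }
        String[] parts = contentType.split(";");
        String mediaType = parts[0].trim().toLowerCase(Locale.ROOT);

        // Option 1: an explicit charset parameter on the content type wins.
        for (int i = 1; i < parts.length; i++) {
            String[] keyValue = parts[i].split("=", 2);
            if (keyValue.length == 2 && keyValue[0].trim().equalsIgnoreCase("charset")) {
                String name = keyValue[1].trim().replace("\"", "");
                try {
                    return Optional.of(Charset.forName(name));
                } catch (IllegalArgumentException e) {
                    break; // malformed or unsupported name: fall back to the type check
                }
            }
        }

        // Option 3: default to UTF-8 only where text is a reasonable assumption.
        return isTextLike(mediaType) ? Optional.of(StandardCharsets.UTF_8) : Optional.empty();
    }

    // Assumed, deliberately small set of text-like media types.
    static boolean isTextLike(String mediaType) {
        return mediaType.startsWith("text/")
                || mediaType.equals("application/json")
                || mediaType.equals("application/xml")
                || mediaType.equals("application/x-www-form-urlencoded")
                || mediaType.endsWith("+json")
                || mediaType.endsWith("+xml");
    }

    public static void main(String[] args) {
        System.out.println(resolve("application/octet-stream"));
        System.out.println(resolve("application/octet-stream; charset=ISO-8859-1"));
        System.out.println(resolve("text/plain"));
    }
}
```

With something along these lines, application/octet-stream without a charset parameter would get no charset at all, while a converter that deliberately sets one would have it respected.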
Please let me know if anything is unclear or if you need more information.
Comment From: ghost
Can this be marked as a defect instead of an enhancement? It is very much broken in the scenario in which we first discovered it, and our use case is not at all exotic. We're able to work around it with some local hacks, but it would be nice if this worked out of the box... Thanks!
Comment From: OlgaMaciaszek
Unfortunately, deriving the charset from the request Content-Type charset is not going to be very useful, since, unless directly specified, the UTF-8 charset is often added as default anyway. Will verify for non-text types based on types from org.springframework.http.MediaType to begin with. We might need to broaden the verification in future as different scenarios get brought to us.
Comment From: ghost
> Unfortunately, deriving the charset from the request Content-Type charset is not going to be very useful, since, unless directly specified, the UTF-8 charset is often added as default anyway
True, it won't be there all the time. But I think it's a reasonable first condition to check, since if it is set then it could well be intentional (in our case it absolutely is.)
I'm also not sure why we assume that the default is UTF-8... If there's not a clear "right" answer, shouldn't the default be to not put a charset on there at all?
Comment From: OlgaMaciaszek
There's no way of telling if it's intentional or not. The problem is that some message converters will use it as a default, for example org.springframework.http.converter.protobuf.ProtobufHttpMessageConverter, and switching to getting the charset from the content type was causing the protobuf scenarios to fail. However, I could add it as an opt-in resolution strategy (with a flag).
Comment From: ghost
> I could add it as an opt-in resolution strategy (with a flag).
:+1: That would work perfectly for our use cases. Thank you!