Some web servers encode the filename parameter of the Content-Disposition header in Base64, for example filename="=?UTF-8?B?5pel5pys6KqeLmNzdg==?=", which decodes to 日本語.csv.
However, ContentDisposition.parse() does not recognize this format.
Comment From: rstoyanchev
RFC 6266, section 4.3 specifies "filename" or "filename*", where the latter is used for character sets other than ISO-8859-1. Is this Base64 filename format based on any spec? If you are able to share it, it would help to know which server(s) emit this format.
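For comparison, the standardized way to send a name like the one in the original report is the "filename*" parameter, whose value uses the RFC 5987 ext-value syntax referenced by RFC 6266 (charset, optional language, then percent-encoded bytes). Below is a minimal Java sketch, assuming UTF-8 and no language tag; ExtValueEncoder and filenameStar are hypothetical names, not part of Spring or any other library:

```java
import java.nio.charset.StandardCharsets;

class ExtValueEncoder {

    // attr-char from RFC 5987, section 3.2.1: bytes that may appear without percent-encoding.
    private static boolean isAttrChar(int b) {
        return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || (b >= '0' && b <= '9')
                || "!#$&+-.^_`|~".indexOf(b) >= 0;
    }

    /** Builds a parameter such as filename*=UTF-8''%E6%97%A5.csv (hypothetical helper). */
    static String filenameStar(String filename) {
        StringBuilder sb = new StringBuilder("filename*=UTF-8''");
        for (byte raw : filename.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF; // treat the byte as unsigned
            if (isAttrChar(b)) {
                sb.append((char) b);
            } else {
                sb.append('%').append(String.format("%02X", b));
            }
        }
        return sb.toString();
    }
}
```

With this sketch, 日本語.csv becomes filename*=UTF-8''%E6%97%A5%E6%9C%AC%E8%AA%9E.csv. RFC 6266 also lets senders include a plain "filename" alongside "filename*", with recipients that understand both preferring "filename*".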
Comment From: yusuke
The web server I'm dealing with is https://bizstn.bk.mufg.jp/, an online banking system. I get that header when I download the transaction details in CSV format. Below is the Content-Disposition header value from the actual response.
attachment; filename="=?UTF-8?B?TUVJU0FJMjAyMTAxMjkyMzAwMjYuY3N2?=";
Chrome decodes the filename using Base64 and recognizes it as MEISAI20210129230026.csv.
The web server appears to be Apache; I'm not sure what is running behind it.
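Before any decoding, the filename value has to be extracted from the raw header, which in this response is quoted and followed by a trailing ";". A minimal Java sketch of that step, assuming simple quoted-string handling with backslash escapes; DispositionParams and parameters are hypothetical names:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class DispositionParams {

    /** Splits "type; a=b; c=\"d\";" into lower-cased parameter names and unquoted values. */
    static Map<String, String> parameters(String header) {
        Map<String, String> params = new LinkedHashMap<>();
        StringBuilder part = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < header.length(); i++) {
            char c = header.charAt(i);
            if (quoted && c == '\\' && i + 1 < header.length()) {
                // Keep the escaped pair intact; it is unescaped in addParam
                part.append(c).append(header.charAt(++i));
                continue;
            }
            if (c == '"') {
                quoted = !quoted;
            }
            if (c == ';' && !quoted) {
                addParam(params, part.toString().trim());
                part.setLength(0);
            } else {
                part.append(c);
            }
        }
        addParam(params, part.toString().trim());
        return params;
    }

    private static void addParam(Map<String, String> params, String part) {
        int eq = part.indexOf('=');
        if (eq <= 0) {
            return; // disposition type ("attachment") or empty segment after a trailing ';'
        }
        String name = part.substring(0, eq).trim().toLowerCase(Locale.ROOT);
        String value = part.substring(eq + 1).trim();
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            // Strip the surrounding quotes and undo backslash escapes (quoted-pair)
            value = value.substring(1, value.length() - 1).replaceAll("\\\\(.)", "$1");
        }
        params.put(name, value);
    }
}
```

Applied to the header above, this sketch yields the still-encoded value =?UTF-8?B?TUVJU0FJMjAyMTAxMjkyMzAwMjYuY3N2?= for "filename"; decoding it is a separate step.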
Comment From: yusuke
The =?[charset]?B?[encoded-text]?= format appears to be defined by RFC 2047, section 4.1 (The "B" encoding): https://tools.ietf.org/html/rfc2047
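RFC 2047 (section 2) defines an encoded-word as =?charset?encoding?encoded-text?=, and section 4.1 specifies the "B" encoding as Base64. Below is a minimal Java sketch of a decoder for B-encoded words. It assumes the charset name is one the JVM supports (Charset.forName throws otherwise) and ignores the "Q" encoding, RFC 2231 language suffixes, and the rule that whitespace between adjacent encoded words is dropped; EncodedWords and decode are hypothetical names:

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncodedWords {

    // encoded-word = "=?" charset "?" encoding "?" encoded-text "?=" (RFC 2047, section 2).
    // Only the "B" encoding (section 4.1) is matched; the letter may be upper or lower case.
    private static final Pattern B_WORD =
            Pattern.compile("=\\?([^?]+)\\?[Bb]\\?([A-Za-z0-9+/]*={0,2})\\?=");

    /** Replaces every B-encoded word in value with its decoded text (hypothetical helper). */
    static String decode(String value) {
        Matcher m = B_WORD.matcher(value);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(value, last, m.start()); // copy text before the encoded word unchanged
            Charset charset = Charset.forName(m.group(1));
            byte[] bytes = Base64.getDecoder().decode(m.group(2));
            out.append(new String(bytes, charset));
            last = m.end();
        }
        out.append(value.substring(last));
        return out.toString();
    }
}
```

Text outside encoded words is copied unchanged, so a plain filename passes through as-is. With this sketch, the value from the banking site decodes to MEISAI20210129230026.csv.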
Comment From: rstoyanchev
Thanks for the extra details. I'll have a look at this for 5.3.4. At the very least, we can make sure that we can parse such headers correctly.