Description
When using WebFlux to upload large files, the upload process sometimes hangs before the request ever reaches the route's business logic. The hang occurs while org.springframework.http.codec.multipart.MultipartParser is processing the client's upload and writing it to a temporary file, leaving the request stuck. The problem is intermittent, but it occurs with noticeable probability when uploading files of roughly 200MB, 2-3GB, or 4-5GB.
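For context, a server-side handler along the following lines is enough to exercise the multipart codec. This is a minimal sketch, not code taken from the reproducer project; the controller name, route, and target directory are assumptions.

import java.nio.file.Path;

import org.springframework.http.MediaType;
import org.springframework.http.codec.multipart.FilePart;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestPart;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

// Minimal illustrative WebFlux upload endpoint (names and paths are assumptions).
@RestController
public class UploadController {

    @PostMapping(value = "/upload", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
    public Mono<String> upload(@RequestPart("file") Mono<FilePart> filePart) {
        // transferTo streams the part to disk once the multipart codec has produced it;
        // the hang reported here happens earlier, while the incoming data is still being
        // written to the temporary xxx.multipart file.
        return filePart.flatMap(part -> part
                .transferTo(Path.of("/tmp", part.filename()))
                .thenReturn("uploaded: " + part.filename()));
    }
}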
Steps to Reproduce
- Set up a Spring Boot application using WebFlux to handle file upload requests.
- Upload a large file (e.g., 200MB, 2-3GB, or 4-5GB); a client-side upload sketch is shown after this list.
- Observe the upload process. Occasionally, the upload hangs before the file is fully transmitted to the server. At that point the generated temporary file xxx.multipart is incomplete, and the request remains pending without ever completing the upload.
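One way to drive the upload repeatedly from Java is with WebClient, as in the sketch below. The base URL, route, part name, and file path are placeholders and not taken from the reproducer; the screen recording uses a browser upload instead.

import java.nio.file.Path;

import org.springframework.core.io.FileSystemResource;
import org.springframework.http.MediaType;
import org.springframework.http.client.MultipartBodyBuilder;
import org.springframework.web.reactive.function.BodyInserters;
import org.springframework.web.reactive.function.client.WebClient;

public class UploadClient {

    public static void main(String[] args) {
        // Build a multipart body around the large file to upload (path is a placeholder).
        MultipartBodyBuilder body = new MultipartBodyBuilder();
        body.part("file", new FileSystemResource(Path.of("/path/to/large-file.iso")));

        // POST it to the hypothetical /upload endpoint and block until the response
        // arrives; when the hang occurs, this call never completes.
        String response = WebClient.create("http://localhost:8080")
                .post()
                .uri("/upload")
                .contentType(MediaType.MULTIPART_FORM_DATA)
                .body(BodyInserters.fromMultipartData(body.build()))
                .retrieve()
                .bodyToMono(String.class)
                .block();

        System.out.println(response);
    }
}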
Additional Resources
- A minimal reproduction project that can replicate the issue: large-file-upload.zip. The hang is not guaranteed to happen on every attempt; it took me a few tries.
- Screen Recording Demonstration:
https://github.com/user-attachments/assets/4d92b3e3-df14-4217-9081-52a321516900
- Relevant logs: upload-file.log
In the screen recording, I used an ISO file that I downloaded from the Manjaro official website: https://download.manjaro.org/gnome/24.2.1/manjaro-gnome-24.2.1-241216-linux612.iso
In the video, the upload progress gets stuck at 68% and never completes, regardless of how long you wait, and no error message is displayed.
Expected Behavior
Large files should upload and be processed smoothly without hanging or blocking at any stage.
Actual Behavior
When uploading large files, the upload sometimes hangs while MultipartParser is writing to the temporary file, preventing the upload from completing.
Environment Information
- Spring Boot Version: 3.4.1
- Java Versions:
- OpenJDK 17
- Eclipse Temurin JDK 21.0.5
- Operating Systems:
- macOS
- Linux
- Other Relevant Dependencies:
- spring-boot-starter-webflux
Note: The issue has been tested and reproduced across all the above-mentioned systems and JDK versions.
Additional Information
This issue is not caused by business logic but by the framework's handling of large file uploads when writing to temporary files, which leads to blocking. We hope the development team can investigate and resolve it to improve the stability and reliability of large file uploads.
Comment From: guqing
Perhaps it would be more appropriate to raise this issue in the spring-framework project. I apologize for overlooking this. Could the developers please transfer this issue to the spring-framework repository?🤪
Comment From: JohnNiang
Several upload tries of the same file, 2.7G Nov 26 15:38 Fedora-Server-dvd-x86_64-41-1.4.iso:
1.7 GiB [####################] 6138071211851868018.multipart
1.5 GiB [################# ] 14413026015246421036.multipart
1.4 GiB [################ ] 15576621334079039706.multipart
1.1 GiB [############ ] 14246059143399172415.multipart
Comment From: sdeleuze
Thanks for the very high quality reproducer, that's much appreciated. So far I am unable to reproduce after 10 tries with manjaro-gnome-24.2.1-241216-linux612.iso. Could you share some indications on how frequent the issue is on your side?
Comment From: chenggangpro
Through my debugging I found that whenever upstream() != null && !this.sink.isCancelled() && this.sink.requestedFromDownstream() == 0 && !this.requestOutstanding.get() becomes true, the upload hangs 100% of the time; I ran dozens of local tests. Whenever the upload succeeds, that debug point is never reached.
Comment From: bclozel
@chenggangpro it would be interesting to know which condition doesn't match: is it "this.sink.requestedFromDownstream() == 0" or "this.requestOutstanding == false"? This is probably a concurrency issue in the parser, and we need to pinpoint it exactly in order to fix it.
I think the proposed fix in #34388 accidentally fixes things by over-requesting data, but queuing "onNext" might cause other issues, such as too much memory consumption or even parsing errors?
Comment From: chenggangpro
@bclozel I think it's the condition this.sink.requestedFromDownstream() == 0, but I don't know what the cause is.
My local debugging is as follows. I added some debug logging points to MultipartParser:
MultipartParser#parse method:
public static Flux<Token> parse(Flux<DataBuffer> buffers, byte[] boundary, int maxHeadersSize, Charset headersCharset) {
    return Flux.create(sink -> {
        MultipartParser parser = new MultipartParser(sink, boundary, maxHeadersSize, headersCharset);
        sink.onCancel(parser::onSinkCancel);
        sink.onRequest(l -> logger.warn("===== Sink On request : " + l)); // here is the debug logging point
        buffers.subscribe(parser);
    });
}
I didn't add parser.requestBuffer() into the sink.onRequest(...), just the logging point.
MultipartParser#requestBuffer method:
private void requestBuffer() {
    if (upstream() != null &&
            !this.sink.isCancelled() &&
            this.sink.requestedFromDownstream() > 0 &&
            this.requestOutstanding.compareAndSet(false, true)) {
        request(1);
    }
    else if (!this.requestOutstanding.get()) {
        // here is the debug logging point
        logger.warn("===== Request buffer called =================");
        logger.warn("===== Sink is cancelled :" + sink.isCancelled());
        logger.warn("===== Sink requested from down stream :" + sink.requestedFromDownstream());
        logger.warn("===== Request buffer called =================");
    }
}
Then I uploaded manjaro-gnome-24.2.1-241216-linux612.iso, and here is the log: large-file-upload-original-debug-logging.zip.
Between lines L#483532 and L#483539 of the log file you can see that sink.requestedFromDownstream() is ZERO. Furthermore, the requestBuffer() method is never called from within sink.onRequest(...). So in my opinion this is the bug point, but I don't know the cause of sink.requestedFromDownstream() == 0. When I first dove into the parser, I added a debug point at MultipartParser.java#L192 with the condition sink.requestedFromDownstream() == 0; that is why I added the logging debug point mentioned earlier.
I kept PR #34388 running for 6 hours yesterday, and there were no hangs or parsing errors. However, I am not sure whether my fix is actually correct or whether there are potential errors I haven't noticed. I hope my debugging process is useful for you all in solving this issue.
Comment From: guqing
Thanks for the very high quality reproducer, that's much appreciated. So far I am unable to reproduce after 10 tries with manjaro-gnome-24.2.1-241216-linux612.iso. Could you share some indications on how frequent the issue is on your side?
Thanks for looking into this! There isn't a fixed frequency for reproduction: it might take just 3 attempts sometimes, while other times it doesn't reproduce even after dozens of tries. It seems to require a bit of luck. However, in applications with a large user base, users do encounter this issue. Fortunately, they can usually resolve it by retrying, as seen in cases like https://github.com/halo-dev/halo/issues/7170
Comment From: sdeleuze
I just reproduced it.
Comment From: sdeleuze
@chemicL We suspect that here, when this.sink.requestedFromDownstream() == 0, request(1) is skipped (which is intended), but the related requestBuffer() method is then never invoked again, and the transfer of big files hangs forever.
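To make the suspected sequence concrete, here is a rough sketch of one idea that has come up in this thread: re-triggering requestBuffer() from sink.onRequest(...) so that demand arriving later can restart the upstream requests. This is only an illustration of the race described above, not the change made in PR #34388 and not a verified fix; whether such a hook is safe against the parser's state machine and the requestOutstanding flag is exactly what still needs to be pinned down.

public static Flux<Token> parse(Flux<DataBuffer> buffers, byte[] boundary, int maxHeadersSize, Charset headersCharset) {
    return Flux.create(sink -> {
        MultipartParser parser = new MultipartParser(sink, boundary, maxHeadersSize, headersCharset);
        sink.onCancel(parser::onSinkCancel);
        // Hypothetical hook: when the downstream signals new demand, give the parser
        // a chance to request another buffer. Without something like this, a moment
        // where requestedFromDownstream() == 0 means request(1) is skipped and, if no
        // request is outstanding, nothing ever asks the upstream for more data.
        sink.onRequest(n -> parser.requestBuffer());
        buffers.subscribe(parser);
    });
}

Unlike an unconditional over-request, a demand-driven hook of this shape would only re-request when the downstream actually asks for more, but it has not been validated against the rest of the parser.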