Our Spring Boot 3.2.0 app hangs for random builds during a very early start-up phase. We see this thread-dump:

"main" #1 prio=5 os_prio=31 cpu=17588,58ms elapsed=17,96s tid=0x0000000145009a00 nid=0x2203 runnable  [0x000000016d49a000]
   java.lang.Thread.State: RUNNABLE
    at org.springframework.boot.loader.zip.ZipString.readInBuffer(ZipString.java:281)
    at org.springframework.boot.loader.zip.ZipString.hash(ZipString.java:112)
    at org.springframework.boot.loader.zip.ZipContent$Loader.add(ZipContent.java:445)
    at org.springframework.boot.loader.zip.ZipContent$Loader.loadContent(ZipContent.java:581)
    at org.springframework.boot.loader.zip.ZipContent$Loader.openAndLoad(ZipContent.java:543)
    at org.springframework.boot.loader.zip.ZipContent$Loader.loadNestedZip(ZipContent.java:537)
    at org.springframework.boot.loader.zip.ZipContent$Loader.load(ZipContent.java:522)
    at org.springframework.boot.loader.zip.ZipContent.open(ZipContent.java:372)
    at org.springframework.boot.loader.zip.ZipContent.open(ZipContent.java:361)
    at org.springframework.boot.loader.jar.NestedJarFileResources.<init>(NestedJarFileResources.java:57)
    at org.springframework.boot.loader.jar.NestedJarFile.<init>(NestedJarFile.java:141)
    at org.springframework.boot.loader.jar.NestedJarFile.<init>(NestedJarFile.java:120)
    at org.springframework.boot.loader.net.protocol.jar.UrlNestedJarFile.<init>(UrlNestedJarFile.java:42)
    at org.springframework.boot.loader.net.protocol.jar.UrlJarFileFactory.createJarFileForNested(UrlJarFileFactory.java:86)
    at org.springframework.boot.loader.net.protocol.jar.UrlJarFileFactory.createJarFile(UrlJarFileFactory.java:55)
    at org.springframework.boot.loader.net.protocol.jar.UrlJarFiles.getOrCreate(UrlJarFiles.java:72)
    at org.springframework.boot.loader.net.protocol.jar.JarUrlConnection.connect(JarUrlConnection.java:289)
    at org.springframework.boot.loader.net.protocol.jar.JarUrlConnection.getJarFile(JarUrlConnection.java:99)
    at org.springframework.boot.loader.net.protocol.jar.JarUrlClassLoader.getJarFile(JarUrlClassLoader.java:185)
    at org.springframework.boot.loader.net.protocol.jar.JarUrlClassLoader.definePackage(JarUrlClassLoader.java:143)
    at org.springframework.boot.loader.net.protocol.jar.JarUrlClassLoader.definePackageIfNecessary(JarUrlClassLoader.java:126)
    at org.springframework.boot.loader.net.protocol.jar.JarUrlClassLoader.loadClass(JarUrlClassLoader.java:99)
    at org.springframework.boot.loader.launch.LaunchedClassLoader.loadClass(LaunchedClassLoader.java:91)
    at java.lang.ClassLoader.loadClass(java.base@17.0.5/ClassLoader.java:520)
    at java.lang.Class.getDeclaredMethods0(java.base@17.0.5/Native Method)
    at java.lang.Class.privateGetDeclaredMethods(java.base@17.0.5/Class.java:3402)
    at java.lang.Class.getDeclaredMethod(java.base@17.0.5/Class.java:2673)
    at org.springframework.boot.loader.launch.Launcher.launch(Launcher.java:89)
    at org.springframework.boot.loader.launch.Launcher.launch(Launcher.java:53)
    at org.springframework.boot.loader.launch.JarLauncher.main(JarLauncher.java:58)

I am not sure what this code exactly does, but it seems to calculate a hash code of a a file name from a JAR file embedded in the super-jar of our app. The code seems to operate on DataBlock instances of the central ZIP file directory.

The error occurs because the block it is processing happens to only contain the first half of the file name. Unfortunately, the last byte in the data block is the first byte of multi-byte character. It seems, this code tries to read the whole multi-byte character:

https://github.com/spring-projects/spring-boot/blob/a242bd81e29e70c3c7bc27b849d83a4c70bf7b39/spring-boot-project/spring-boot-tools/spring-boot-loader/src/main/java/org/springframework/boot/loader/zip/ZipString.java#L117-L121

But the data block seems to be unable to provide the missing byte, because it will not read any further data from the underlying file channel. It returns only the first byte of the multi-byte character again and again. And the hash function hangs.

I am not deep enough into it, do dig what is exactly going on. Furthermore, I am unable to provide the broken JAR as it comes from a closed source project.

Comment From: wilkinsona

Thanks for the report.

I am unable to provide the broken JAR as it comes from a closed source project

Any jar that reproduces the problem would be useful so that we can reproduce the problem, or even just a concrete example of the file names that cause the problem.

Comment From: jhackel-hypo

The ZIP record for the file as produced by zipdump is this:

000c8687: PK.0304: 0014 0800 0008 5788834c 04267862 00000fad 00002e4f 0054 0000 | 000c86a5 000c86f9 000c86f9 000c96a6 - de/genopace/bsh/pe/darlehen/validierung/FehlendeOderUngültigeBaufiSmartKosten.class

The path has the length 0x54 == 84. The data block happens to contain only 56 bytes of that path, which is the first half of the character ü.

Comment From: philwebb

I wonder if we've already fixed this with #38572. @jhackel-hypo Any chance you could try 3.2.1-SNAPSHOT before we try anything else?

Comment From: jhackel-hypo

I have no way to deliberately build a broken JAR. It just happens from time to time.

On the main branch, this test will also hang ZipStringTest:

@Test
void hashDataBlockWithInCompleteMultiByteCharacterAtTheEnd() throws IOException {
    ByteArrayDataBlock dataBlock = new ByteArrayDataBlock(
            (byte) 100, (byte) 101, (byte) 47, (byte) 103, (byte) 101, (byte) 110,
            (byte) 111, (byte) 112, (byte) 97, (byte) 99, (byte) 101,  (byte) 47,
            (byte) 98, (byte) 115, (byte) 104, (byte) 47, (byte) 112, (byte) 101,
            (byte) 47, (byte) 100, (byte) 97, (byte) 114, (byte) 108, (byte) 101,
            (byte) 104, (byte) 101, (byte) 110, (byte) 47, (byte) 118, (byte) 97,
            (byte) 108, (byte) 105, (byte) 100, (byte) 105, (byte) 101, (byte) 114,
            (byte) 117, (byte) 110, (byte) 103, (byte) 47, (byte) 70, (byte) 101,
            (byte) 104, (byte) 108, (byte) 101, (byte) 110, (byte) 100, (byte) 101,
            (byte) 79, (byte) 100, (byte) 101, (byte) 114, (byte) 85, (byte) 110,
            (byte) 103, (byte) -61
    );

    ZipString.hash(null, dataBlock, 0, 87, true);
}

This pretty much the same which ZipContent.Loader.add feeds to the function in my error case, except that there is more data at the beginning of the data block.

I do not really grasp how ZipContent.Loader.add ever deals with file names reaching beyond the end of the data block.

Comment From: jhackel-hypo

It seems to me that ZipContent.Loader.add must never feed a split string to ZipString.hash. In fact, that should be a precondition when calling ZipString.hash with a data block.

Comment From: jhackel-hypo

I think, I do understand the problem now. I am also pretty sure that 8c7e8778a64d254f591430ad9c2ecc9f29c1f2ab is not fixing the problem.

The problem is, that ZipString#hash is assuming, it can read the whole multibyte character by stepping a byte backwards (line 118) and reading from the FileChannelDataBlock again (line 112):

https://github.com/spring-projects/spring-boot/blob/a242bd81e29e70c3c7bc27b849d83a4c70bf7b39/spring-boot-project/spring-boot-tools/spring-boot-loader/src/main/java/org/springframework/boot/loader/zip/ZipString.java#L111-L121

But this is not the case. The data block has half of the multibyte character at the last position of its FileChannelDataBlock.ManagedFileChannel#buffer. Stepping one position backward will return this first half of the multibyte character but not the second byte because it is not in the buffer. So, the FileChannelDataBlock#read method will return 1, because a single byte could be read. It will only read more bytes into its buffer, if the caller is asking for a position beyond what is already buffered (line 181):

https://github.com/spring-projects/spring-boot/blob/c50172d5c7ef282f375a53f8fe5c59de9de58311/spring-boot-project/spring-boot-tools/spring-boot-loader/src/main/java/org/springframework/boot/loader/zip/FileChannelDataBlock.java#L179-L183

But because ZipString.hash is always stepping back one position, it will never go beyond the limits of the buffer. It will receive the same byte again and again.

Comment From: jhackel-hypo

The implementation of ZipString#hash is making the wrong assumption that it can always read two (actually up to four) consecutive bytes from the data block. This is not the case. The clients of DataBlock#read must be able to process the data byte by byte.

Comment From: jhackel-hypo

I guess a possible solution could be to use DataBlock#readyFully here:

https://github.com/spring-projects/spring-boot/blob/a242bd81e29e70c3c7bc27b849d83a4c70bf7b39/spring-boot-project/spring-boot-tools/spring-boot-loader/src/main/java/org/springframework/boot/loader/zip/ZipString.java#L276-L286

The returned count would be buffer.position() then.

The method readInBuffer is also called by compare. But that method might be affected by the same problem.

Comment From: philwebb

Thanks very much for the detailed analysis @jhackel-hypo!

I think I've pushed something that will fix this. I'm not keen to move DataBlock#readyFully because those methods are called a lot and I want to keep things as fast and lean as possible.