While once again investigating slow Spring Boot startup times, I profiled the application with Java Mission Control and found that StringUtils#cleanPath alone accounts for 7% of all CPU time spent setting up the application context. Looking at the implementation, I identified some potential hotspots (e.g. LinkedList). The application currently uses Spring Framework 5.1.10, and I was happy to find that some of these issues have already been fixed in more recent versions of Spring.

However, there are still a few more opportunities to avoid memory allocations and copying of data:

  • Set a sensible initial capacity for lists and string builders: optimize for the common cases. We will sometimes allocate a few bytes more than required, but at least we avoid re-allocating the backing array. For the StringBuilder we need to iterate the collection twice, because the capacity has to be calculated beforehand. But iterating is a lot cheaper than allocating memory two or more times.
  • Re-use strings instead of concatenating their parts again: an earlier optimization introduced short-circuiting for a common case. We have computed the return value already before splitting it, so return it directly.
  • Do not concatenate with the empty string: future JDKs might alleviate this, but at least with Java 11, return "" + somestring; is slower than return somestring; and it is also more expensive memory-wise, because a new string has to be constructed for the result.
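
To illustrate the first point, here is a minimal sketch of the two-pass approach for a collection of strings. The class and method names are hypothetical and this is not the code from the PR or from Spring's StringUtils; it assumes a non-null delimiter and non-null elements.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical illustration only; not Spring's actual implementation.
class PresizedJoin {

    // Joins the given strings with the delimiter, sizing the builder up front.
    // Assumes parts contains no null elements.
    static String join(Collection<String> parts, String delim) {
        if (parts.isEmpty()) {
            return "";
        }
        // First pass: compute the exact length of the result.
        int capacity = (parts.size() - 1) * delim.length();
        for (String part : parts) {
            capacity += part.length();
        }
        // Second pass: append into a builder whose backing array never has to grow.
        StringBuilder sb = new StringBuilder(capacity);
        boolean first = true;
        for (String part : parts) {
            if (!first) {
                sb.append(delim);
            }
            sb.append(part);
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(List.of("a", "bc", "def"), ","));
    }
}
```

The extra iteration only reads String lengths, so for string collections it costs no allocations.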

This PR relates to and builds on #24674, #25650, #25552, #25553, and others.

Comment From: knittl

Is there still anything required on my part? Should I provide benchmarks to prove the optimization?

Is something else blocking this change?

Comment From: bclozel

Closed with cc026fcb8ae5117

Comment From: bclozel

Thanks @knittl - this is now merged. I ran a few benchmarks and it seems that this optimization helps both StringUtils#delimitedListToStringArray (+40% throughput) and StringUtils#cleanPath (+10% for the worst case).

Comment From: knittl

@bclozel thanks for merging, polishing, and benchmarks! I wonder how big the overhead is of converting each collection item to a string when pre-computing the string builder size for non-string collections (in cc026fcb8ae51172f3d063e7ed07a32927e23d8e). For string collections it is obviously just a no-op, but for generic collections of a different type, stringifying each element twice might incur quite some memory pressure and runtime overhead (depending on the toString complexity).

One of the reasons the method was duplicated was to have a specialization for string collections. One idea I had was to pre-compute all string representations, copy them to a new collection, then use this collection to populate the string builder. With your JMH benchmarks (thanks!), copying the collection seems to incur quite some overhead itself.
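
For reference, the "stringify once, then copy" idea could look roughly like the sketch below. The names are hypothetical and this is not the merged code; it only shows the trade-off: each element's toString runs once, at the price of an extra String[] holding the intermediate results.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical illustration only; not the code merged into Spring.
class StringifyOnce {

    static String join(Collection<?> items, String delim) {
        if (items.isEmpty()) {
            return "";
        }
        // Single pass over the collection: stringify each element exactly once,
        // keep the result, and accumulate the total length.
        String[] strings = new String[items.size()];
        int capacity = (strings.length - 1) * delim.length();
        int i = 0;
        for (Object item : items) {
            String s = String.valueOf(item);
            strings[i++] = s;
            capacity += s.length();
        }
        // Populate the pre-sized builder from the cached strings.
        StringBuilder sb = new StringBuilder(capacity);
        for (int j = 0; j < strings.length; j++) {
            if (j > 0) {
                sb.append(delim);
            }
            sb.append(strings[j]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(List.of(1, 22, 333), ", "));
    }
}
```

Whether the saved toString calls outweigh the extra array depends on the element type, which matches the observation above that the copy itself is not free.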