Since JDK 1.8, split() has proven more efficient than StringTokenizer, with a difference of approximately 30.5% in my test results.
- Test JDK versions: 1.8, 11, 17
StringTokenizer is still used in other parts of the project; this change only touches one place in the AOP code of the spring-framework project that I was reviewing. I'm not sure how much it contributes to the project, but the change both improves performance and makes the code more readable than the StringTokenizer version.
If this modification proves beneficial to the project, I'll continue to look for and update similar usages throughout the project.
Comment From: snicoll
Thanks for the PR.
> the change not only improves performance but also makes the code more readable compared to StringTokenizer.
Can you share the benchmark you've used to assess the performance improvement?
Comment From: dukbong
@snicoll I've created a benchmark report.
Performance Benchmark: StringTokenizer vs. split()
Overview
This report compares the performance of StringTokenizer and the split() method in Java for splitting strings using simple delimiters.
Experimental Environment
- Language: Java
- Development Environment: Eclipse
- Number of Trials: 1,000,000 iterations x 10 trials
- JDK Versions: 1.8, 11, 17
Methods Utilized in Performance Testing
public static String[] TestPerformanceOfExistingCode(Method method) {
    StringTokenizer nameTokens = new StringTokenizer(ARGUMENT_NAMES, ",");
    int numTokens = nameTokens.countTokens();
    if (numTokens > 0) {
        String[] names = new String[numTokens];
        for (int i = 0; i < names.length; i++) {
            names[i] = nameTokens.nextToken();
        }
        return names;
    }
    else {
        return null;
    }
}

public static String[] TestPerformanceOfImprovedCode(Method method) {
    String[] names = ARGUMENT_NAMES.split(",");
    if (names.length > 0) {
        return names;
    }
    return null;
}
Results
JDK 1.8
- Average Time for StringTokenizer: 127.8ms
- Average Time for split(): 161.3ms
- Test Result: split() method is approximately 26.2% slower.
JDK 11
- Average Time for StringTokenizer: 167ms
- Average Time for split(): 149.8ms
- Test Result: split() method is approximately 10.3% faster.
JDK 17
- Average Time for StringTokenizer: 149.8ms
- Average Time for split(): 125.4ms
- Test Result: split() method is approximately 16.3% faster.
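The relative differences can be recomputed from the reported averages. The sketch below assumes the percentage is defined relative to the StringTokenizer time, i.e. (split - tokenizer) / tokenizer; the class and method names are hypothetical and only serve this illustration.

```java
// Hypothetical helper, for illustration only: recomputes the relative
// difference of split() against StringTokenizer from the reported averages.
final class RelativeDifference {

    // Assumed definition: (split - tokenizer) / tokenizer, in percent.
    // A positive value means split() took longer, a negative value less time.
    static double percentVsTokenizer(double tokenizerMs, double splitMs) {
        return (splitMs - tokenizerMs) / tokenizerMs * 100.0;
    }

    public static void main(String[] args) {
        // Average times (ms) taken from the Results section.
        System.out.printf("JDK 1.8: %+.1f%%%n", percentVsTokenizer(127.8, 161.3));
        System.out.printf("JDK 11:  %+.1f%%%n", percentVsTokenizer(167.0, 149.8));
        System.out.printf("JDK 17:  %+.1f%%%n", percentVsTokenizer(149.8, 125.4));
    }
}
```

Under this definition the reported averages work out to roughly +26.2% for JDK 1.8, -10.3% for JDK 11 and -16.3% for JDK 17.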
Analysis
On JDK 11 and 17, the split() method performed better than StringTokenizer for simple string splitting using basic delimiters; on JDK 1.8 it was slower.
Note that complex delimiters and special cases were not tested; only modifications pertaining to the current experiment were conducted.
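The untested special cases matter because the two APIs do not tokenize identically. A minimal sketch of the difference for empty tokens and empty input, using a hypothetical helper class that is not part of the PR:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class, for illustration only.
final class TokenizingDifferences {

    // Collects tokens the way the StringTokenizer-based code does.
    static List<String> withTokenizer(String input) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, ",");
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    // Collects tokens the way the split()-based code does.
    static List<String> withSplit(String input) {
        return Arrays.asList(input.split(","));
    }

    public static void main(String[] args) {
        // Inputs chosen to hit empty tokens and the empty string.
        for (String input : new String[] {"a,b", "a,,b", ",a", ""}) {
            System.out.println("input=\"" + input + "\""
                    + " tokenizer=" + withTokenizer(input)
                    + " split=" + withSplit(input));
        }
    }
}
```

For "a,,b" the tokenizer yields two tokens while split() keeps the empty string in between. For an empty input, split() returns a one-element array containing "", so a `names.length > 0` check does not behave like `countTokens() > 0`.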
While the performance difference may not be significant, using the split() method can lead to increased efficiency, code readability, and maintainability.
Conclusion
In the JDK 11 and 17 runs, using the split() method for string splitting operations was more efficient; the JDK 1.8 run did not show this.
However, depending on specific requirements and scenarios, the use of StringTokenizer may still be appropriate.
Since the split() method can improve performance, code readability, and maintainability, it should be considered the preferred option.
Future Considerations
Converting all StringTokenizer usages to split() still needs to be validated through further comparative experiments.
Observed Performance Improvement of split() Method
An explicit statement that split() performance has improved since JDK 1.8 is hard to find, but the JDK 11 and 17 test results are consistent with such an improvement.
Comment From: bclozel
Thanks for the proposal, but we're going to decline it.
I had a go with our StringUtils.tokenizeToStringArray method, which uses a StringTokenizer, and applied your recommendation.
public static String[] tokenizeToStringArray(
        @Nullable String str, String delimiters, boolean trimTokens, boolean ignoreEmptyTokens) {

    if (str == null) {
        return EMPTY_STRING_ARRAY;
    }

    List<String> tokens = new ArrayList<>();
    String[] split = str.split(delimiters);
    for (String token : split) {
        if (trimTokens) {
            token = token.trim();
        }
        if (!ignoreEmptyTokens || token.length() > 0) {
            tokens.add(token);
        }
    }
    return toStringArray(tokens);
}
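One caveat with a split-based variant like this: String#split interprets its argument as a regular expression, whereas StringTokenizer treats every character of the delimiter string as a separate delimiter. A small sketch of that difference, using a hypothetical class and plain JDK calls only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class, for illustration only.
final class DelimiterSemantics {

    // Each character of 'delimiters' is an individual delimiter.
    static String[] viaTokenizer(String str, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(str, delimiters);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens.toArray(new String[0]);
    }

    // 'delimiters' is compiled as one regular expression.
    static String[] viaSplit(String str, String delimiters) {
        return str.split(delimiters);
    }

    public static void main(String[] args) {
        // Multi-character delimiter string: split() looks for the literal sequence ",; ".
        System.out.println(viaTokenizer("a,b; c", ",; ").length);
        System.out.println(viaSplit("a,b; c", ",; ").length);
        // Regex metacharacter: "." matches any character when passed to split().
        System.out.println(viaTokenizer("a.b", ".").length);
        System.out.println(viaSplit("a.b", ".").length);
    }
}
```

With a single "," the two agree on simple inputs, but callers that pass several delimiter characters or a regex metacharacter would see different results.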
With the help of a quick JMH benchmark:
@Benchmark
public void tokenizeToStringArray(TokenizerState state, Blackhole blackhole) {
    for (String source : state.source) {
        blackhole.consume(StringUtils.tokenizeToStringArray(source, ","));
    }
}

@State(Scope.Benchmark)
public static class TokenizerState {

    @Param("10")
    int elementCount;

    @Param("20")
    int inputCount;

    Collection<String> source;

    @Setup(Level.Iteration)
    public void setup() {
        Random random = new Random();
        this.source = new ArrayList<>(this.inputCount);
        for (int i = 0; i < this.inputCount; i++) {
            ArrayList<String> tokens = new ArrayList<>(this.elementCount);
            for (int j = 0; j < this.elementCount; j++) {
                tokens.add(String.format("%0" + (random.nextInt(9) + 1) + "d", 1));
                this.source.add(String.join(",", tokens));
            }
        }
    }
}
I'm seeing the following results:
With StringTokenizer:
Benchmark                                                      (elementCount)  (inputCount)   Mode  Cnt      Score       Error  Units
StringUtilsBenchmark.tokenizeToStringArray                                 10            20  thrpt   10  41373.314  ± 4674.514  ops/s
StringUtilsBenchmark.tokenizeToStringArray:gc.alloc.rate                   10            20  thrpt   10   2840.427   ± 323.345  MB/sec
StringUtilsBenchmark.tokenizeToStringArray:gc.alloc.rate.norm              10            20  thrpt   10  71993.649   ± 294.345  B/op
StringUtilsBenchmark.tokenizeToStringArray:gc.count                        10            20  thrpt   10   1282.000              counts
StringUtilsBenchmark.tokenizeToStringArray:gc.time                         10            20  thrpt   10    732.000              ms
With String#split:
Benchmark                                                      (elementCount)  (inputCount)   Mode  Cnt      Score       Error  Units
StringUtilsBenchmark.tokenizeToStringArray                                 10            20  thrpt   10  33193.063  ± 2277.633  ops/s
StringUtilsBenchmark.tokenizeToStringArray:gc.alloc.rate                   10            20  thrpt   10   3002.365   ± 212.610  MB/sec
StringUtilsBenchmark.tokenizeToStringArray:gc.alloc.rate.norm              10            20  thrpt   10  94849.055   ± 352.119  B/op
StringUtilsBenchmark.tokenizeToStringArray:gc.count                        10            20  thrpt   10   1355.000              counts
StringUtilsBenchmark.tokenizeToStringArray:gc.time                         10            20  thrpt   10    741.000              ms
We're seeing a throughput decrease (20%) and a higher allocation rate. So I don't think that StringTokenizer vs String#split is that easy a call to make in all situations. Maybe the situation is a bit different in the case of AbstractAspectJAdvisorFactory, but I think the JMH benchmark I used is quite close. Microbenchmarking is a complex subject, and I think the code you've shared probably has a lot of biases that make the result irrelevant.
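Two common sources of bias in hand-written timing loops are a missing warm-up phase and results that are never consumed, which lets the JIT compiler drop the measured work. The plain-Java sketch below guards against only those two and is no substitute for JMH; the class, method names and input string are hypothetical.

```java
import java.util.StringTokenizer;
import java.util.function.ToIntFunction;

// Hypothetical timing sketch, for illustration only; JMH handles these
// concerns (and many others) far more rigorously.
final class NaiveTimingSketch {

    static final String INPUT = "a,b,c,d,e"; // assumed sample input

    // Returns the summed token length so the result depends on every token.
    static int countWithTokenizer(String s) {
        StringTokenizer tokenizer = new StringTokenizer(s, ",");
        int total = 0;
        while (tokenizer.hasMoreTokens()) {
            total += tokenizer.nextToken().length();
        }
        return total;
    }

    static int countWithSplit(String s) {
        int total = 0;
        for (String token : s.split(",")) {
            total += token.length();
        }
        return total;
    }

    // Runs the operation and accumulates its results into 'sink' so the
    // work cannot be discarded as dead code.
    static long timeNanos(ToIntFunction<String> op, int iterations, long[] sink) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink[0] += op.applyAsInt(INPUT);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long[] sink = new long[1];
        // Warm-up runs: timings are discarded, they only let the JIT compile the code.
        timeNanos(NaiveTimingSketch::countWithTokenizer, 200_000, sink);
        timeNanos(NaiveTimingSketch::countWithSplit, 200_000, sink);
        // Measured runs.
        long tokenizerNanos = timeNanos(NaiveTimingSketch::countWithTokenizer, 1_000_000, sink);
        long splitNanos = timeNanos(NaiveTimingSketch::countWithSplit, 1_000_000, sink);
        System.out.println("tokenizer ns=" + tokenizerNanos
                + " split ns=" + splitNanos + " sink=" + sink[0]);
    }
}
```

Even with these guards, factors such as run order, GC activity and input variety remain uncontrolled, which is why a JMH benchmark like the one above is the more reliable reference.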
Comment From: dukbong
Thank you. It's an honor to receive feedback. I'll continue to study so that I can provide insights that could be helpful to open source projects in the future.