This PR fixes an issue with the parallel stream used in JsonReader.
Problem
I wrote a simple program to test JsonReader. The JSON content is very simple:
[
  {
    "name": "Alex",
    "email": "alex@example.com",
    "jobTitle": "Software Engineer"
  },
  {
    "name": "Bob",
    "email": "bob@example.com",
    "jobTitle": "System Admin"
  }
]
The code below reads the keys name and jobTitle into the content of each Document and copies email into its metadata.
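import java.nio.file.Path;
import java.util.Map;

import org.springframework.ai.reader.JsonMetadataGenerator;
import org.springframework.ai.reader.JsonReader;
import org.springframework.core.io.FileSystemResource;

public class JsonReaderSample {

    void read() {
        // Copy the email field of each JSON object into the document metadata.
        var metadataGenerator = new JsonMetadataGenerator() {
            @Override
            public Map<String, Object> generate(Map<String, Object> jsonMap) {
                return Map.of("email", jsonMap.getOrDefault("email", ""));
            }
        };
        var resource = new FileSystemResource(
                Path.of(".", "data", "json-array.json"));
        // Use the name and jobTitle keys as the document content.
        var reader = new JsonReader(resource, metadataGenerator, "name",
                "jobTitle");
        var docs = reader.read();
        docs.forEach(System.out::println);
    }

    public static void main(String[] args) {
        new JsonReaderSample().read();
    }
}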
When running this program, the output may look like the following. In the second document, the text of different keys and their values is mingled (jobTitle: name: System AdminBob).
Document{id='a55153c1-09ab-4fc6-aa18-07a6f20e94d6', metadata={email=alex@example.com}, content='name: Alex
jobTitle: Software Engineer
', media=[]}
Document{id='46b27f32-1ea3-4ca9-a005-9c93195bb335', metadata={email=bob@example.com}, content='jobTitle: name: System AdminBob
', media=[]}
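The use of parallelStream caused the StringBuffer.append calls for different keys, running on different threads, to interleave on the shared StringBuffer. Each individual append call is synchronized, but the sequence of append calls that writes one key, its separator, and its value is not atomic, so another thread can append in between.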
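The interleaving can be reproduced outside of JsonReader with a minimal sketch. The class InterleavedAppendDemo and its render helper are hypothetical names used only for illustration. They mirror the pattern described above (a shared StringBuffer appended to from a parallel stream of keys) and are not the actual JsonReader source.

```java
import java.util.List;
import java.util.Map;

public class InterleavedAppendDemo {

    // Hypothetical helper, not the actual JsonReader code: one shared
    // StringBuffer, four append calls per key, keys processed in parallel.
    static String render(Map<String, Object> item, List<String> keys) {
        StringBuffer sb = new StringBuffer();
        keys.parallelStream()
            .filter(item::containsKey)
            .forEach(key -> sb.append(key)
                .append(": ")
                .append(item.get(key))
                .append(System.lineSeparator()));
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> item = Map.of("name", "Bob", "jobTitle", "System Admin");
        List<String> keys = List.of("name", "jobTitle");
        String nl = System.lineSeparator();
        String nameLine = "name: Bob" + nl;
        String jobLine = "jobTitle: System Admin" + nl;

        // Repeat the call because the interleaving depends on thread timing
        // and may not show up on any particular attempt.
        for (int i = 0; i < 100_000; i++) {
            String text = render(item, keys);
            // Either line order is possible with a parallel forEach; only a
            // result where the two lines are mixed counts as interleaved.
            boolean intact = text.equals(nameLine + jobLine)
                    || text.equals(jobLine + nameLine);
            if (!intact) {
                System.out.println("Interleaved on attempt " + i + ":");
                System.out.println(text);
                return;
            }
        }
        System.out.println("No interleaving observed in this run.");
    }
}
```

Because StringBuffer synchronizes each append, no characters are lost and every appended piece stays contiguous. Only the order of the pieces across keys is unpredictable, which matches the mingled output shown above.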
Fix
Processing the keys in parallel seems unnecessary, since the documents in the JSON array are already processed in parallel and we are dealing only with in-memory data. So I changed the code to use a sequential stream and replaced StringBuffer with StringBuilder.
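A minimal sketch of the resulting shape, under the assumption that parallelism stays at the item level while the keys of each item are processed sequentially. SequentialAppendSketch, render, and renderAll are hypothetical names and not the actual JsonReader source. The sketch uses forEachOrdered to make the key order explicit, which is a choice made here rather than something stated in the PR.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SequentialAppendSketch {

    // Hypothetical helper: keys are handled one at a time on the calling
    // thread, so the StringBuilder is never shared between threads.
    static String render(Map<String, Object> item, List<String> keys) {
        StringBuilder sb = new StringBuilder();
        keys.stream()
            .filter(item::containsKey)
            .forEachOrdered(key -> sb.append(key)
                .append(": ")
                .append(item.get(key))
                .append(System.lineSeparator()));
        return sb.toString();
    }

    // Parallelism stays at the item level: each call to render creates its
    // own StringBuilder, so concurrent items do not interfere.
    static List<String> renderAll(List<Map<String, Object>> items, List<String> keys) {
        return items.parallelStream()
            .map(item -> render(item, keys))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> items = List.of(
            Map.of("name", "Alex", "jobTitle", "Software Engineer"),
            Map.of("name", "Bob", "jobTitle", "System Admin"));
        renderAll(items, List.of("name", "jobTitle")).forEach(System.out::print);
    }
}
```

Since the builder is confined to a single thread, the synchronization that StringBuffer provides is no longer needed, which is why StringBuilder is sufficient here.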
Comment From: markpollack
Yikes! thanks a ton.
merged in c205c7d5cad5d6278be9c56d3babe6ed9af87ae9