Use JdbcTemplate batchUpdate to improve the performance of multiple upserts

Comment From: markpollack

Thanks! I fixed one unrelated issue while reviewing this. I do wonder how well batching works as the default strategy, for example when the number of documents gets very large. Having something like Spring Batch manage this in 'chunks' might be useful in the future.
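
As a rough illustration of the 'chunks' idea (not code from this PR), a large document list could be split into fixed-size chunks, with each chunk sent as its own batch. The sketch below uses only the JDK; `ChunkedBatch`, `partition`, and `forEachChunk` are hypothetical names, and the `writer` callback stands in for whatever performs the batched upsert. Spring's `JdbcTemplate` also has a `batchUpdate` overload that accepts a `batchSize` argument together with a `ParameterizedPreparedStatementSetter`, which may cover the same need without a helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of the PR: splits a large list into
// fixed-size chunks so each chunk can be sent as its own JDBC batch.
final class ChunkedBatch {

    private ChunkedBatch() {
    }

    // Returns consecutive chunks of at most chunkSize elements, in order.
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        int start = 0;
        while (start < items.size()) {
            // Widen to long so start + chunkSize cannot overflow.
            int end = (int) Math.min((long) start + chunkSize, items.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
            start = end;
        }
        return chunks;
    }

    // Applies writer to each chunk in order. The writer is an assumption:
    // it represents the code that issues one batched upsert per chunk.
    static <T> void forEachChunk(List<T> items, int chunkSize, Consumer<List<T>> writer) {
        for (List<T> chunk : partition(items, chunkSize)) {
            writer.accept(chunk);
        }
    }
}
```

With a helper like this, a call such as `ChunkedBatch.forEachChunk(documents, 1000, chunk -> upsertBatch(chunk))` would bound the size of each batch; the chunk size of 1000 and the `upsertBatch` method are placeholders, not values or names taken from this change.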

Comment From: markpollack

merged in 2a604262d824ebbcad5a3700d0df80d1da9e80e6

Comment From: gianielsevier

> merged in 2a60426

Thanks for accepting my PR, @markpollack. I tested the original version with a dataset of 126k records from the medium CSV file here, and it took roughly 17 minutes to ingest the whole file. With the batch update approach, that dropped to 11 minutes.

I also compared ingesting the same dataset with the Neo4j vector store, which took around 12 minutes to process the whole file, and with Redis, which took a similar time (12 minutes).

I can give more implementation details about the benchmark if you like.

Cheers.