• EmbeddingClient implementation that computes sentence embeddings locally using SBERT transformer models.
  • Uses pre-trained transformer models, serialized into Open Neural Network Exchange (ONNX) format.
  • Uses Deep Java Library (DJL) and the Microsoft ONNX Java Runtime to run the ONNX models and compute the embeddings efficiently.
  • Add default tokenizer.json and model.onnx for sentence-transformers/all-MiniLM-L6-v2.
  • Add a configurable resource caching service that caches remote (http/https) resources on the local file system.
  • README.md explains how to serialize models into the ONNX format.
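
The resource caching item can be pictured with a minimal sketch like the one below. It is not the service added by this change: the class name SimpleResourceCache, its method names, and the choice of a SHA-256 hash of the URI as the cache file name are all assumptions made for illustration. It only shows the general idea of downloading an http/https resource once and serving it from a local directory afterwards.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of a local cache for remote resources; all names are illustrative. */
final class SimpleResourceCache {

    private final Path cacheDir;

    public SimpleResourceCache(Path cacheDir) {
        this.cacheDir = cacheDir;
    }

    /** Only remote http/https resources are cached; other schemes are used in place. */
    public static boolean isCacheable(URI uri) {
        String scheme = uri.getScheme();
        return "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
    }

    /** Maps a URI to a stable file inside the cache directory (assumption: SHA-256 hex of the URI). */
    public Path cachedPathFor(URI uri) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(uri.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return cacheDir.resolve(hex.toString());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    /** Returns a local path for the resource, downloading it only if it is not cached yet. */
    public Path resolve(URI uri) throws IOException {
        if (!isCacheable(uri)) {
            // Assumes a file: URI; other schemes are out of scope for this sketch.
            return Paths.get(uri);
        }
        Path target = cachedPathFor(uri);
        if (Files.notExists(target)) {
            Files.createDirectories(cacheDir);
            // Download to a temporary file first so a failed transfer never leaves a partial cache entry.
            Path tmp = Files.createTempFile(cacheDir, "download-", ".tmp");
            try (InputStream in = uri.toURL().openStream()) {
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            }
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

With a cache like this, a large remote model.onnx or tokenizer.json would be fetched on first use only, and later start-ups would read the local copy. Cache invalidation and configuration (for example, excluding certain URIs) are left out of the sketch.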