• Adjust all affected classes including the Document.
  • Update docs.

Related to #405

Comment From: markpollack

@aseovic What do you think?

Comment From: aseovic

Well, it's (slightly) better than List<Double>... It uses 16 (instead of 24) bytes per vector dimension, which is still a far cry from the 4 bytes per dimension that a float[] would use. And it still requires boxing and unboxing to do anything useful with it, which adds up.

I understand the reluctance to use float[] directly, as it ties you to it and makes it difficult to use any other vector type. Models such as Cohere are already capable of creating int8 and bit embeddings, and any float32 vector can be quantized into similar embeddings to save space, regardless of the model used. What we've done to address that is introduce a Vector<T> interface, where T is one of float[], byte[], or BitSet, via the corresponding Float32Vector, Int8Vector and BitVector implementations.

It's really not that difficult to do. It allows you to store the embeddings using the optimal memory representation, and it provides a place to add common vector properties and operations, such as dimension, magnitude, dotProduct, etc.
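To make the idea concrete, here is a minimal sketch of what such an abstraction could look like. It is not the actual Coherence code: the type names come from the description above, but every method name and signature (`get`, `dimensions`, `dotProduct`, `magnitude`, `checkDimensions`) is an assumption made for illustration.

```java
import java.util.BitSet;

// Hypothetical sketch only: names follow the comment above, signatures are assumed.
interface Vector<T> {
    T get();                              // raw storage: float[], byte[], or BitSet
    int dimensions();
    double dotProduct(Vector<T> other);

    // Magnitude is the square root of the vector's dot product with itself.
    default double magnitude() {
        return Math.sqrt(dotProduct(this));
    }

    static void checkDimensions(Vector<?> a, Vector<?> b) {
        if (a.dimensions() != b.dimensions()) {
            throw new IllegalArgumentException("dimension mismatch");
        }
    }
}

// float32 embedding: 4 bytes per dimension, no boxing.
final class Float32Vector implements Vector<float[]> {
    private final float[] values;
    Float32Vector(float[] values) { this.values = values; }
    @Override public float[] get() { return values; }
    @Override public int dimensions() { return values.length; }
    @Override public double dotProduct(Vector<float[]> other) {
        Vector.checkDimensions(this, other);
        float[] o = other.get();
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += (double) values[i] * o[i];
        }
        return sum;
    }
}

// int8 (quantized) embedding: 1 byte per dimension.
final class Int8Vector implements Vector<byte[]> {
    private final byte[] values;
    Int8Vector(byte[] values) { this.values = values; }
    @Override public byte[] get() { return values; }
    @Override public int dimensions() { return values.length; }
    @Override public double dotProduct(Vector<byte[]> other) {
        Vector.checkDimensions(this, other);
        byte[] o = other.get();
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i] * o[i];      // bytes are promoted to int before multiplying
        }
        return sum;
    }
}

// Bit embedding: 1 bit per dimension. The dimension count is stored explicitly
// because BitSet.length() only reports the highest set bit plus one.
final class BitVector implements Vector<BitSet> {
    private final BitSet bits;
    private final int dimensions;
    BitVector(BitSet bits, int dimensions) { this.bits = bits; this.dimensions = dimensions; }
    @Override public BitSet get() { return bits; }
    @Override public int dimensions() { return dimensions; }
    @Override public double dotProduct(Vector<BitSet> other) {
        Vector.checkDimensions(this, other);
        BitSet common = (BitSet) bits.clone();
        common.and(other.get());          // bits set in both vectors
        return common.cardinality();
    }
}
```

Code that only needs common operations can work against `Vector<T>`, while a store that wants the raw array can call `get()` on the concrete type without any boxing.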

Comment From: tzolov

@aseovic, I guess you are assuming that all Vector Store APIs take float[]. My experience integrating with them shows that many use List and some even List to represent vectors. So for those, float[] won't make a difference, or will likely introduce additional boxing/unboxing steps.

Having said this, I've updated the PR to float[]. Let's see how this plays out.

Comment From: markpollack

@aseovic

> but what we've done to address that is introduce Vector interface

Where?

Comment From: markpollack

Fixed a few tests and the typesense impl. Merged as d538e00643b3dcab0a3e0595aa406775733aaf3e

Comment From: aseovic

> @aseovic, I guess your assumptions are based on the presumption that all Vector Store APIs take float[]. My experience integrating with those shows that many use List and some even List to represent vectors. So for those the float[] won't make difference or likely introduce additional boxing/unboxing steps.

@tzolov I guess my take on it is that you should optimize for the vector stores that use the optimal data structure to represent vectors in Java, not for the ones that don't ;-)

In other words, make the inefficient stores pay the price of that conversion if they need to, instead of punishing everyone. So yes, float[] is a much better option. Thank you!
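As a rough illustration of where that cost would land, a store whose client API expects a boxed list could convert at its own boundary. The `EmbeddingConversions` class and its method names below are hypothetical, not part of any existing API, and `List<Double>` is only one example of a boxed representation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: only stores that need boxed lists would call this.
final class EmbeddingConversions {

    private EmbeddingConversions() {
    }

    // Boxes each float into a Double; this allocation is the price the list-based store pays.
    static List<Double> toDoubleList(float[] embedding) {
        List<Double> boxed = new ArrayList<>(embedding.length);
        for (float value : embedding) {
            boxed.add((double) value);
        }
        return boxed;
    }

    // Converts a boxed result coming back from such a store into the compact form.
    static float[] toFloatArray(List<Double> values) {
        float[] embedding = new float[values.size()];
        for (int i = 0; i < embedding.length; i++) {
            embedding[i] = values.get(i).floatValue();
        }
        return embedding;
    }
}
```

Stores that accept float[] natively skip both methods and pass the array through unchanged.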

Comment From: aseovic

> @aseovic
>
> > but what we've done to address that is introduce Vector interface
>
> Where?

@markpollack In Coherence.