PGvector Store's restriction to a single hardcoded table name per database limits the development of multiple applications; this PR introduces configurable table names to improve database flexibility and management.
This change allows users to specify a custom table name within a single database instance.
The update ensures backward compatibility and adheres to discussions in issue #747.
Changes:
- Added vectorTableName and vectorIndexName properties.
- Updated PgVectorStore class to utilize these new properties.
Testing:
- Integration tests to ensure custom table names are handled correctly.
Comment From: tzolov
@muthuishere thank you for the contribution.
Reviewing your PR reminded me why we haven't implemented the parametrisation so far. Check this https://github.com/spring-projects/spring-ai/issues/747#issuecomment-2172456938
Parametrising the DDL with string concatenation (the approach you've taken) is prone to SQL injection, so we need to find a different approach. Let me know what you think of my suggestions in https://github.com/spring-projects/spring-ai/issues/747#issuecomment-2172456938
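To make the concern concrete, here is a minimal sketch of how a concatenated DDL string can be hijacked. The class name, the simplified column list, and the hostile input are all hypothetical and are not taken from the PR. JDBC placeholders bind values rather than identifiers, which is why a table name usually ends up concatenated in the first place.

```java
// Hypothetical illustration of the risk; this is not the PR's code.
final class DdlConcatenationDemo {

    // Builds DDL by concatenating an unchecked table name (the risky pattern).
    static String createTableDdl(String tableName) {
        return "CREATE TABLE IF NOT EXISTS " + tableName
                + " (id uuid PRIMARY KEY, content text, metadata json, embedding vector(1536))";
    }

    public static void main(String[] args) {
        // A benign value yields the intended single statement.
        System.out.println(createTableDdl("vector_store"));

        // A hostile value smuggles a second statement into the same string
        // and comments out the rest of the original DDL.
        String hostile = "t (id int); DROP TABLE users; --";
        System.out.println(createTableDdl(hostile));
    }
}
```

Nothing here touches a database; the point is only that the second string contains a complete `DROP TABLE` statement that the caller never intended.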
Comment From: muthuishere
PR updated with commits that introduce the ability to specify a custom table name for PgVectorStore and stop pre-creating tables when custom table names are provided. Also added table name and field validations for custom configurations.
Key changes include:
- Adding properties to configure custom table names.
- Modifying PgVectorStore to conditionally pre-create tables only when custom table names are not provided.
- Implementing table & field validation for custom configurations.
- Extensive tests to ensure compatibility with both new configurations and existing deployments.
With these changes, automatic table pre-creation is removed, and users are expected to manage table setup in custom scenarios, consistent with production environment practices.
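As a rough idea of what a table name check could look like, here is a sketch that accepts only plain, unquoted identifiers. The class, method names, and the allow-list rule are assumptions for illustration and are not the validation code in this PR; the 63-character cap mirrors PostgreSQL's default identifier length limit.

```java
import java.util.regex.Pattern;

// Hypothetical allow-list check; not the validator added in the PR.
final class IdentifierCheck {

    // Assumed rule: a letter or underscore, then letters, digits, or
    // underscores, for at most 63 characters in total.
    private static final Pattern SAFE = Pattern.compile("[A-Za-z_][A-Za-z0-9_]{0,62}");

    static boolean isSafeIdentifier(String name) {
        return name != null && SAFE.matcher(name).matches();
    }

    // Returns the name unchanged, or fails fast before any DDL is built.
    static String requireSafe(String name) {
        if (!isSafeIdentifier(name)) {
            throw new IllegalArgumentException("Unsafe identifier: " + name);
        }
        return name;
    }
}
```

A check like this is deliberately stricter than what PostgreSQL accepts (quoted identifiers are rejected), trading flexibility for a rule that is easy to reason about.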
Comment From: tzolov
Hi @muthuishere, I can see that you went above and beyond to try to validate the DDL parameters!
I had an internal discussion with our Spring Data team and the conclusion is that there is no reliable solution. So we should assume that this is the user's configuration responsibility, i.e. there is no need to try to do the validations ourselves.
Perhaps we can keep the validation you've added but make it disabled by default. We can also enable the auto-generation for custom table names.
Additionally, in https://github.com/spring-projects/spring-ai/issues/747 Rod mentions a custom schema definition as well. I guess we need to add yet another parameter (e.g. schema name)?
What do you think?
Comment From: muthuishere
I am okay with your suggestions. I will be working on the following tasks:
- A property to specify the database schema (schemaName).
- A property to enable database validations; the default setting will be false (vectorTableValidationsEnabled).
- The custom table name can support auto-generation. If the table does not exist, we will automatically generate it as we did previously. We will create the index ourselves by appending _index to it, instead of asking users to specify this, as it is no longer relevant. (I would appreciate your thoughts on this.)
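The index naming proposed in the last item could be sketched as follows. Only the `_index` suffix comes from the plan above; the class, the `embedding` column name, and the HNSW/cosine index options are assumptions chosen for illustration.

```java
// Hypothetical sketch of deriving the index name from the table name.
final class IndexNaming {

    // Derives the index name by appending "_index", as proposed above.
    static String indexNameFor(String tableName) {
        return tableName + "_index";
    }

    // Assumed DDL shape: an HNSW index with cosine distance on a column
    // named "embedding". Other index types or operators are equally possible.
    static String createIndexDdl(String tableName) {
        return "CREATE INDEX IF NOT EXISTS " + indexNameFor(tableName)
                + " ON " + tableName
                + " USING hnsw (embedding vector_cosine_ops)";
    }
}
```

Deriving the name keeps the configuration surface to a single property, at the cost of not letting users point at an index that already exists under a different name.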
Let me know if you need anything more.
Comment From: tzolov
Sounds like a complete plan. Thanks @muthuishere
Comment From: muthuishere
@tzolov
- A property to specify the table name (spring.ai.vectorstore.pgvector.vectorTableName); defaults to the existing table name (vector_store).
- A property to specify the database schema (spring.ai.vectorstore.pgvector.schemaName); defaults to public.
- A property to enable database validations (spring.ai.vectorstore.pgvector.vectorTableValidationsEnabled); defaults to false.
Added tests covering these properties.
Comment From: tzolov
Thank you, @muthuishere. Great contribution!
Rebased, squashed, and merged at f0ca61252b0de2a3e3ab4a7cb493844eafd82139
Additional changes and fixes applied:
- Rename properties to schemaName, tableName, and schemaValidation
- Update pgvector documentation with new properties
- Add schema/table name tests to PgVectorStoreAutoConfigurationIT and PgVectorStorePropertiesTests
- Create standalone PgVectorSchemaValidator class for schema/table validation
- Add missing 'CREATE SCHEMA IF NOT EXISTS' when initializeSchema=true
- Remove redundant code and classes
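For readers who want a picture of the schema-aware initialization, here is a minimal sketch of the statements one might issue when initializeSchema is true. The class, the method, and the simplified column list are hypothetical and do not reproduce the merged PgVectorStore code; the `public` and `vector_store` values used in the example call are the defaults mentioned earlier in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of schema-qualified initialization statements.
final class SchemaInitSketch {

    // Returns the DDL to run, in order; empty when initialization is disabled.
    static List<String> initStatements(String schemaName, String tableName, boolean initializeSchema) {
        List<String> statements = new ArrayList<>();
        if (!initializeSchema) {
            return statements; // the user manages schema and table themselves
        }
        // Create the schema first so the qualified table name resolves.
        statements.add("CREATE SCHEMA IF NOT EXISTS " + schemaName);
        statements.add("CREATE TABLE IF NOT EXISTS " + schemaName + "." + tableName
                + " (id uuid PRIMARY KEY, content text, metadata json, embedding vector(1536))");
        return statements;
    }

    public static void main(String[] args) {
        initStatements("public", "vector_store", true).forEach(System.out::println);
    }
}
```

Because the names are still concatenated into the DDL, a sketch like this relies on the configuration being trusted, which matches the conclusion reached in the discussion above.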