Your Question

I am building a multi-tenant app with Postgres. Tenants are split by Postgres schema. On every HTTP request, I determine the tenant's schema from domain or header information. I create a new *gorm.DB using Session() at the beginning of each request and pass it down for further querying. My question is: how do I set the schema on this object so that subsequent queries execute against that schema for the lifecycle of a specific HTTP request?

A few approaches I tried:

  1. I tried issuing SET search_path to the tenant's schema on each request, but it did not work, since Go grabs an arbitrary connection from the pool for each query.
  2. I tried setting NamingStrategy.TablePrefix on the *gorm.DB on each HTTP request, like below. But it looks like this configuration is only processed during initialization and not dynamically. Please correct me if I am wrong about this.

     db.Config.NamingStrategy = schema.NamingStrategy{
         TablePrefix: pgschema + ".",
     }

Expected answer

What are the possible solutions to this situation?

Comment From: jinzhu

Use TableName or Scope with Table

Comment From: NeverBehave

Use TableName or Scope with Table

I'd really appreciate some sample code, etc.

If using a table name, I am not sure how to determine the full table name when queries use a struct, e.g. db.Where(&Instance{Id: 123}). Under such circumstances, are we responsible for setting the correct name?
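
For reference, a rough sketch (not from this thread; the schema and table names are illustrative) of how the Table-based answer could be combined with struct conditions:

var instance Instance
// With Table we supply the fully qualified "<schema>.<table>" name ourselves;
// struct conditions still apply on top of it.
err := db.Table("tenant_a.instances").Where(&Instance{Id: 123}).First(&instance).Error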

Comment From: ghost

Hi guys, could you re-open this?

I have the same issue, and this explanation:

Use TableName or Scope with Table

didn't help me much...

So, in my case I have the following situation:

  • I have a struct called Category
  • There are also many tenants

And my tenant is set dynamically, so presumably I need to use Scopes to change the table name on the fly. Since the breaking change:

TableName will not allow dynamic table name anymore, the result of TableName will be cached for future

So how can I achieve this? I tried:

func Tenant(tenant string) func(*gorm.DB) *gorm.DB {
    return func(db *gorm.DB) *gorm.DB {
        // What am I supposed to do here?
        return db.Table(tenant + "_" + originalTableName???)
    }
}

But I wasn't able to get the original table name ...

Comment From: NeverBehave

@mirusky-enube That's basically what I have so far. Really confusing.

In my understanding, the author treats this as something like table prefixing (and you can't do that dynamically either, for now I guess)... Since schemas are a feature specific to Postgres, there isn't any particular support for them, and from my research in the issues here there isn't even a proposal to make schemas more useful.

Comment From: mirusky

@jinzhu could you reopen pls?

@NeverBehave I found a way, but it looks like a workaround, based on this issue.

We could use something like:

func Tenant(tenant string, model interface{}) func(*gorm.DB) *gorm.DB {
    return func(db *gorm.DB) *gorm.DB {
        // Parse the model to get the table name GORM would use by default.
        stmt := &gorm.Statement{DB: db}
        if err := stmt.Parse(model); err != nil {
            _ = db.AddError(err)
            return db
        }
        return db.Table(tenant + "." + stmt.Schema.Table)
    }
}

It works, but you need to pass the struct twice (in the scope and in the GORM method). I haven't tested it with relationships like has one, has many, many-to-many and belongs to...

But I believe it won't work with relationships.

Comment From: NeverBehave

@mirusky Oh, thanks for this partial workaround, but how would you use this Scope?

From what I tested, you have to specify the table first, but at that moment I don't think the table name is fixed yet. E.g.

db.Find(&users).Table("xxx") // not working, the table won't change
db.Table("xxx").Find(&users)

I am not sure if I have done something wrong but I may come back and update this section later.

Comment From: mirusky

@NeverBehave I use it this way:

db.Scopes(Tenant("tenant-1", User{})).Find(&User{ID: 1})

As I said, we need to pass the struct twice, so I'm not sure whether it will work with relationships.

Comment From: NeverBehave

@mirusky Thanks for the explanation! I missed the first parameter at first glance, ouch... Yeah, this is less flexible, and I don't think it works with Raw, etc. I also can't tell how the ORM could seamlessly adopt the schema in various other circumstances.

What about writing a DBResolver? I think that would be the best option so far: control the schema state of the connection, and the rest of the SQL doesn't need to change, just use the same connection within the same session/context.

Update: it seems GORM can extract the table name from raw SQL. I haven't tested it yet, but even if it works, the same limitation applies. I am also not sure whether a DBResolver policy makes it possible to create/execute SQL against a specific schema.

Comment From: mirusky

I don't know; I was reading the docs again and found Clauses and StatementModifier. Maybe it is possible to use them to change the SQL statement and add a schema before the table name...

But I need to learn how it works ...

Comment From: NeverBehave

The approach of changing clauses/statements would be problematic, as I think there would be lots of corner cases. I would prefer to, e.g., set search_path and keep the connection state for the same context/session.

It would be transparent for the application: none of the existing SQL needs to be changed as long as we keep the same state.
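
A minimal sketch of that idea, assuming a GORM version that provides DB.Connection (which runs the callback on a single pinned connection from the pool); withTenant and the reset handling here are illustrative, not tested code:

func withTenant(db *gorm.DB, schema string, fn func(tx *gorm.DB) error) error {
    return db.Connection(func(tx *gorm.DB) error {
        // set_config with is_local=false changes search_path for this connection only.
        if err := tx.Exec("SELECT set_config('search_path', ?, false)", schema).Error; err != nil {
            return err
        }
        // Reset before the connection goes back to the pool.
        defer tx.Exec("SET search_path TO DEFAULT")
        return fn(tx)
    })
}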

Comment From: NeverBehave

@mirusky I am actually working on an approach: a self-managed pool of *gorm.DB instances.

Basically, I wrote a simple DBManager that handles the different tenants: the manager is shared and creates connections on demand. It avoids context switching (the database has to flush state if the switch happens within a connection) and potential corner cases if the existing application has complex queries.

The demo below does not limit the number of tenants/connections, and I am not sure how to (correctly) control gorm's connection pool manually; related: #3145

import (
    "fmt"
    "os"
    "sync"
    "time"

    "gorm.io/driver/postgres"
    "gorm.io/gorm"
    gocache "zgo.at/zcache"
)

var cacheHandler *gocache.Cache

type StoreDB struct {
    CreationMutex sync.Mutex
    cache         *gocache.Cache
}

type Context struct {
    Schema string
}

// getEnv is a small helper (assumed here) that falls back to a default
// when the environment variable is unset.
func getEnv(key, fallback string) string {
    if v := os.Getenv(key); v != "" {
        return v
    }
    return fallback
}

func (s *StoreDB) CreateDBHandler(ctx *Context) *gorm.DB {
    // Maybe we could use a better mechanism to do this.
    s.CreationMutex.Lock()
    defer s.CreationMutex.Unlock()

    // Double check before moving on to the creation procedure.
    if db, found := s.cache.Touch(ctx.Schema, gocache.DefaultExpiration); found {
        return db.(*gorm.DB)
    }

    // Each tenant gets its own DSN with search_path pinned to its schema.
    dsn := fmt.Sprintf("host=%s user=%s password=%s dbname=%s sslmode=disable search_path=%s", getEnv("DB_HOST", "db"), getEnv("DB_USER", "postgres"), getEnv("DB_PASSWORD", "example"), getEnv("DB_NAME", "some-database"), ctx.Schema)
    db, err := gorm.Open(postgres.Open(dsn), &gorm.Config{})
    if err != nil {
        panic(err)
    }

    s.cache.Set(ctx.Schema, db, gocache.DefaultExpiration)

    return db
}

func (s *StoreDB) GetDBHandler(ctx *Context) *gorm.DB {
    db, found := s.cache.Touch(ctx.Schema, gocache.DefaultExpiration)
    if found {
        return db.(*gorm.DB)
    }

    return s.CreateDBHandler(ctx)
}

func initManager() *StoreDB {
    cacheHandler = gocache.New(10*time.Minute, 10*time.Minute)
    cacheHandler.OnEvicted(func(s string, i interface{}) {
        // Close the underlying sql.DB when a tenant handle is evicted.
        // https://github.com/go-gorm/gorm/issues/3145
        sql, err := i.(*gorm.DB).DB()
        if err != nil {
            panic(err)
        }
        sql.Close()
    })

    return &StoreDB{
        cache: cacheHandler,
    }
}

Example usage

dbMap := initManager()
gdb := dbMap.GetDBHandler(&Context{
    Schema: "some_schema",
})
// Now, gdb is the normal *gorm.DB with given schema 

The performance is basically the same as opening directly and querying, and the application only needs to swap out its db handler. I guess it may be an option.
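
For completeness, a hedged sketch of how such a manager could be wired into HTTP middleware (tenantFromRequest and dbKey are hypothetical helpers, assuming the usual net/http, context and strings imports):

type dbKey struct{}

// tenantFromRequest is an illustrative helper, e.g. deriving the schema from the subdomain.
func tenantFromRequest(r *http.Request) string {
    return strings.Split(r.Host, ".")[0]
}

func TenantMiddleware(store *StoreDB, next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        gdb := store.GetDBHandler(&Context{Schema: tenantFromRequest(r)})
        // Stash the tenant-scoped *gorm.DB in the request context for downstream handlers.
        ctx := context.WithValue(r.Context(), dbKey{}, gdb)
        next.ServeHTTP(w, r.WithContext(ctx))
    })
}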

@jinzhu Any suggestion? Thanks!

Comment From: cesc1802

I also need to handle this problem. For now, I am writing an interface that returns a db instance by schema name. If someone has a better solution, please help me.
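
For illustration only, such an interface could be as small as this (the names are hypothetical):

// TenantDB hands out a *gorm.DB bound to a given schema, e.g. backed by a
// per-schema connection manager like the one shown above.
type TenantDB interface {
    DB(schema string) (*gorm.DB, error)
}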

Comment From: mirusky

A different approach is to create a transaction:

db.Transaction(func(tx *gorm.DB) error {
  // do some database operations in the transaction (use 'tx' from this point, not 'db')
  tx.Exec("SET search_path TO tenant_a;")

  // At this point this whole transaction is inside the defined search_path aka tenant
  if err := tx.Create(&Animal{Name: "Lion"}).Error; err != nil {
    return err
  }

  // prevent this connection from being reused with the wrong tenant
  tx.Exec("SET search_path TO DEFAULT;")

  // return nil will commit the whole transaction
  return nil
})

It is important to note that the search_path is changed twice, at the beginning and at the end, because of the PostgreSQL documentation:

If SET (or equivalently SET SESSION) is issued within a transaction that is later aborted, the effects of the SET command disappear when the transaction is rolled back. Once the surrounding transaction is committed, the effects will persist until the end of the session, unless overridden by another SET.

In other words, we need to reverse the change of the first set or it will persist until this session is discarded.
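
A variant of the same idea, sketched here for comparison: SET LOCAL is scoped by PostgreSQL to the current transaction, so no manual reset is needed.

db.Transaction(func(tx *gorm.DB) error {
  // SET LOCAL lasts only until the end of this transaction,
  // so the default search_path does not need to be restored afterwards.
  if err := tx.Exec("SET LOCAL search_path TO tenant_a").Error; err != nil {
    return err
  }
  return tx.Create(&Animal{Name: "Lion"}).Error
})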

Comment From: cesc1802

Does this approach work with relations?

Comment From: ghost

@cesc1802 If all the models are in the same schema, it will probably work with relationships.

Comment From: NeverBehave

@mirusky's approach has a heavy performance penalty. As mentioned in the gorm docs, a transaction adds at least 30% overhead. And it is not only write actions: all reads/queries would also need to be wrapped and suffer the performance cost, or your data will be mixed up.

And tests showed that switching/flushing the context on the fly also causes an extra performance hit: operation time is basically 2x compared to a direct connection.

@cesc1802 That's basically the solution I have implemented so far: create instances on demand and reuse them. The performance loss is negligible in this mode. However, watch out for max connections.

Comment From: mirusky

@NeverBehave By default all the operations already run inside a transaction, so if you haven't disabled that explicitly, there is no additional performance cost. As shown in the docs:

GORM perform write (create/update/delete) operations run inside a transaction

So, since there is no built-in support for Postgres schemas, there are two workarounds:

  • Create a gorm instance per schema
  • Explicitly use transactions to change the schema on-the-fly

Comment From: NeverBehave

@mirusky I am saying the on-the-fly approach forces all queries (read actions) to be wrapped inside a transaction too, which is not necessary in most use cases. All write actions (create/delete/etc.) already pay this cost by default, which is okay, but the on-the-fly approach still adds extra overhead: the prepared-statement cache gets invalidated. See https://dba.stackexchange.com/a/166511

Changing the search_path causes Postgres to start from scratch with prepared statements. The manual on PREPARE:

Also, if the value of search_path changes from one use to the next, the statement will be re-parsed using the new search_path. (This latter behavior is new as of PostgreSQL 9.3.)

It is a serious downgrade (a double downgrade for queries!) in normal use cases, especially when reads/queries outnumber writes/updates, and it is hard to work around if not handled carefully.

Comment From: bartventer

TL;DR solution: I published a package which provides drivers to support multi-tenancy and also provides some handy HTTP middleware: https://github.com/bartventer/gorm-multitenancy