Add support for batch operations to the DefaultDatabaseClient:
databaseClient.sql("INSERT INTO legoset (id, name, manual) VALUES(:id, :name, :manual)")
.bind("id", 42055)
.bind("name", "SCHAUFELRADBAGGER")
.bindNull("manual", Integer.class)
.add()
.bind("id", 2021)
.bind("name", "TOM")
.bindNull("manual", Integer.class)
.fetch().rowsUpdated()
Closes #259
Demo app: springR2dbcBatchExample.tar.gz
Comment From: patrickhuy
@sbrannen @mp911de this looks very interesting to me. Could you share some insights into what the next steps for this PR should be?
Comment From: mp911de
Batching through .add() would only properly work if the SQL doesn't change. DatabaseClient uses named parameters and expands named parameter markers using collection arguments into native bind markers. The number of bind markers depends on the actual collection size. This is useful when using IN clauses. I'm not quite sure about the direction.
We could follow the idea of NamedParameter of expanding the SQL using the first bindings and raising an exception later on if the expanded parameter count doesn't match the number of parameters found in a collection argument.
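For illustration (not part of the PR), this is the kind of expansion that ties the generated SQL to a concrete binding; a minimal sketch assuming the spring-r2dbc DatabaseClient and a PostgreSQL driver:
// The collection bound to :ids is expanded into one native bind marker per element,
// e.g. "... WHERE id IN ($1, $2, $3)" for three ids. A second binding set with a
// different number of ids would require different SQL, which a single batched
// statement cannot express.
databaseClient.sql("SELECT id, name FROM legoset WHERE id IN (:ids)")
        .bind("ids", List.of(42055, 42083, 42009))
        .fetch()
        .all();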
Other than that, please check out the contribution guide to learn how we accept contributions.
Comment From: MarkMarkyMarkus
Thanks for the response.
We could follow the idea of NamedParameter of expanding the SQL using the first bindings and raising an exception later on if the expanded parameter count doesn't match the number of parameters found in a collection argument.
Do you mean that we should check the expanded parameters count in NamedParameterUtils.substituteNamedParameters() or NamedParameter.addPlaceholder()?
Comment From: Andro999b
Are there any plans to add batching?
Comment From: gabfssilva
Batching through .add() would only properly work if the SQL doesn't change.
@mp911de, correct me if I'm wrong, but the SPI has two ways of batching, right? One allows you to send a single command with multiple parameter sets (via parameter binding), and another allows you to pass multiple commands. Although both are sent in a single round trip, the latter is slower since the database has to parse every single command, so if your command won't change, you should stick with the first approach.
I'm asking because the implementation in this PR is probably the most optimized (since it uses the first approach), although less flexible. Isn't there a way to add both ways to the DatabaseClient, so the developer could choose which one fits better for them?
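A rough sketch of those two SPI-level options (raw io.r2dbc.spi types, connection obtained elsewhere, PostgreSQL-style bind markers assumed; table and values are only illustrative):
// 1) One parsed statement with several parameter binding sets (Statement.add());
//    per the spec, no trailing add() before execute().
Publisher<? extends Result> parametrized = connection
        .createStatement("INSERT INTO legoset (id, name) VALUES ($1, $2)")
        .bind(0, 42055).bind(1, "SCHAUFELRADBAGGER").add()
        .bind(0, 2021).bind(1, "TOM")
        .execute();

// 2) Several independent commands in one round trip (Batch); each command is parsed
//    by the database, so it is more flexible but slower.
Publisher<? extends Result> commands = connection
        .createBatch()
        .add("INSERT INTO legoset (id, name) VALUES (42055, 'SCHAUFELRADBAGGER')")
        .add("INSERT INTO legoset (id, name) VALUES (2021, 'TOM')")
        .execute();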
Comment From: mp911de
@gabfssilva your perception is correct. Parametrized batching parses the command once and sends an execute command to the database for each binding set. I think it would be possible to bridge the .add(…)-based mechanism into DatabaseClient. The other type of batching, where commands are concatenated with ;, can be used already because you're in control of the SQL statement.
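For example, something along these lines is already possible, with the caveats that support for ;-separated statements depends on the driver and its configuration, and that inlining the values bypasses parameter binding (shown only to illustrate the point):
databaseClient.sql(
        "INSERT INTO legoset (id, name) VALUES (42055, 'SCHAUFELRADBAGGER');"
                + " INSERT INTO legoset (id, name) VALUES (2021, 'TOM')")
        .then();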
Comment From: bwhyman
It works for me now. spring-data-r2dbc: 6.0.1; jasync-r2dbc-mysql: 2.1.12.
@Autowired
private io.r2dbc.spi.ConnectionFactory connectionFactory;

@Transactional
public Mono<Void> update(List<User> users) {
    String sql = "update user u set u.address=? where u.id=?";
    return DatabaseClient.create(connectionFactory).sql(sql).filter(statement -> {
        for (User u : users) {
            statement.bind(0, u.getAddress()).bind(1, u.getId()).add();
        }
        return statement;
    }).then();
}
Comment From: snicoll
@mp911de I am happy to help with the merge process, but I could use a review on your side first.
Comment From: mp911de
From the spec, we've learned that statement.….add().….add().….execute() (no trailing .add() before .execute()) is a pretty sharp edge that easily leads to quite some confusion when using that API.
Considering batching with DatabaseClient would allow us to come up with a smoother API and an approach that doesn't create DefaultGenericExecuteSpec instances for each bound parameter.
As a starting point, how about a bind (or params, as in JdbcClient) method taking a lambda, something along the lines of:
databaseClient.sql("INSERT INTO legoset (id, name, manual) VALUES(:id, :name, :manual)")
.bind(params -> params.bind("name", name1).bind("manual", manual1))
.bind(params -> params.bind("name", name2).bind("manual", manual2))
.fetch();
The lambda parameter would be a simple interface carrying bind methods only, to avoid confusion with Statement:
interface Params {
    Params bind(String name, Object value);
    Params bind(int index, Object value);
    Params bindNull(String name, Class<?> type);
    Params bindNull(int index, Class<?> type);
}
Comment From: snicoll
@MarkMarkyMarkus would you have some cycles to amend your PR with Mark's review? If not, that's cool, we can take over when time permits.
Comment From: MarkMarkyMarkus
Yes, I will take a look.
Comment From: jhoeller
Note that we intentionally designed our new JdbcClient without batch operations, so it is not a priority for R2DBC either. That said, if we can agree on an implementation style for multiple parameter bindings that is not at odds with the existing API design and that is reasonable to use with named parameters as well as indexed parameters, we may revisit this for both JDBC and R2DBC.
Comment From: hantsy
Hope there is a bind method that accepts a list/array of params directly, to avoid binding objects one by one:
.bind(
List.of(
Tuple.of(param1, param2, param3...),
Tuple.of(param1, param2, param3...),
...
)
)
Comment From: sdeleuze
I did some tests with https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/frameworks/Java/spring-webflux in order to use batch updates instead of individual requests. Under high latency, the batch update from this PR increases throughput by 50%, so I would be in favor of trying to move forward on this PR if we can find the right API.
Speaking about the API, and related to @hantsy's comment, I think we should evolve it to make it easier to bind a collection of data, because if I am not mistaken, the current one is not practical for this very popular (likely the most popular) use case. I had to do something like this:
// Wrapper class is needed because a variable used in a lambda must be final or effectively final
var wrapper = new Object() {
    GenericExecuteSpec spec = databaseClient.sql(
            "UPDATE world SET randomnumber=:randomnumber WHERE id = :id");
};
worlds.forEach(world -> wrapper.spec = wrapper.spec.bind(params ->
        params.bind("id", world.id).bind("randomnumber", world.randomnumber)));
return wrapper.spec.fetch().rowsUpdated();
Maybe we should just have a GenericExecuteSpec method that takes a list/collection of values to bind and a lambda to do the binding logic, as a modernized version of JdbcTemplate#batchUpdate. I am not sure we should keep the variant proposed in this PR (I am not fully convinced we need batch updates specified via the fluent API), but happy to get feedback on this.
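Purely as a hypothetical shape (batchUpdate below is not an existing DatabaseClient method, just an illustration of the suggestion):
databaseClient.sql("UPDATE world SET randomnumber = :randomnumber WHERE id = :id")
        // hypothetical method: one parameter binding set per element of the collection
        .batchUpdate(worlds, (world, params) -> params
                .bind("id", world.id)
                .bind("randomnumber", world.randomnumber))
        .fetch()
        .rowsUpdated();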
I was not able to make it work with UPDATE world SET randomnumber=$2 WHERE id = $1 (which works for individual updates) and had to use UPDATE world SET randomnumber=:randomnumber WHERE id = :id for the batch update variant. I am not sure why.
@bwhyman I was not able to make your filter()-based proposal work.
@jhoeller @mp911de Any thoughts?
Comment From: bwhyman
databaseClient.inConnection(conn -> {
    Statement statement = conn.createStatement(sql);
    for (int i = 0; i < timetables.size(); i++) {
        var tb = timetables.get(i);
        statement.bind(0, snowflake.nextId())
                .bind(1, collid)
                .bind(2, tb.getStartweek())
                .bind(3, tb.getEndweek());
        // add() must not be called after the last binding set
        if (i < timetables.size() - 1) {
            statement.add();
        }
    }
    return Flux.from(statement.execute()).collectList();
});
R2DBC:3.3.1
The Statement add() method cannot be called after the last binding set, which is very weird.
Comment From: sdeleuze
@bwhyman I was able to use batch updates with what you suggested but using inConnectionMany
instead of inConnection
. Except the if (i < timetables.size() - 1)
part, it is pretty straightforward.
@mp911de @jhoeller Do you think we could relax the requirement of skipping the last add()
to make it easier to use? If we do, I am not sure we need a dedicated API, we can maybe just create a related issue, close this PR, and document something like:
databaseClient.inConnectionMany(conn -> {
    Statement statement = conn.createStatement(sql);
    for (World world : worlds) {
        statement.bind(0, world.randomnumber).bind(1, world.id).add();
    }
    // consume the results so the statements complete (inConnectionMany expects a Flux)
    return Flux.from(statement.execute()).flatMap(Result::getRowsUpdated);
});
I am also wondering if dedicated batch update support would provide even more performance benefits (the improvements I get in TechEmpower are visible here).
Comment From: mp911de
Implicit batching with the current API opens pathways for a new class of bugs: DatabaseClient.sql(…).bind(0, foo).bind(0, bar) uses a single statement, while DatabaseClient.sql(…).bind(params -> …).bind(params -> …) runs the same statement with two parameter binding sets. Ideally, after the first invocation of the new method, the calling code should never get a chance to call bind(position|name, …). Also, we should have a method name that indicates that we're entering batch mode, to make it obvious (andBind(params -> …)?). Probably someone can come up with a better name, though.
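To spell out that contrast (illustrative only; bind(params -> …) is the proposed addition, not an existing method, and PostgreSQL-style markers are assumed):
// one statement, one binding set: the second call overrides index 0, only "TOM" is used
databaseClient.sql("INSERT INTO legoset (name) VALUES ($1)")
        .bind(0, "SCHAUFELRADBAGGER")
        .bind(0, "TOM")
        .then();

// proposed lambda variant: the same statement executed with two binding sets (a batch of two)
databaseClient.sql("INSERT INTO legoset (name) VALUES ($1)")
        .bind(params -> params.bind(0, "SCHAUFELRADBAGGER"))
        .bind(params -> params.bind(0, "TOM"))
        .then();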
I like bind(params -> …) as a general addition.
Comment From: sdeleuze
I like bind(params -> …) as a general addition.
Ok, so if nobody objects, I will close this PR unmerged, create a related issue for bind(params -> …), and work on that, tentatively targeting Spring Framework 6.2.0.
Comment From: sdeleuze
Closing this (old) PR unmerged in favor of #33812.