MyBatis version
3.x.x
Database vendor and version
Oracle 19c
Test case or example project
https://github.com/tbcrawford/spring-mybatis-oracle-performance-issue
Steps to reproduce
Please see the README provided in the repo above. I have outlined the issue there. It includes a project with runnable test cases. An Oracle connection just needs specified.
Expected result
I expected that when specifying jdbcTypeForNull to JdbcType.NULL in the SqlSessionFactory configuration that I would no longer have to specify the jdbcType in the SQL XML code and have near matching performance with or without specifying jdbcType in the SQL XML.
Actual result
The performance of inserting 10,000 records into an Oracle table with pseudo random NULL field values for a few fields, without specifying the jdbcType in the SQL XML code, is nearly 200x slower than when the jdbcType is specified. Again, this happens despite the SqlSessionFactory configuration of jdbcTypeForNull being set to JdbcType.NULL.
Comment From: harawata
Hello @tbcrawford ,
Thanks for the demo project (it really saves my time!).
You seem to think that MyBatis does not respect jdbcTypeForNull
setting, but I don't think that's true.
From what I see, the facts are :
- Performance is good if accurate JDBC type is passed as the second argument of
java.sql.PreparedStatement#setNull()
. By 'accurate', I mean it matches the actual column data type. - Performance is bad if
java.sql.Types.NULL
is passed as the second argument ofjava.sql.PreparedStatement#setNull()
.
And this seems reasonable because the driver has to figure out the actual JDBC type when a generic type NULL
is specified (it matches your analogy about OraclePreparedStatement#executeLargeBatch
as well).
Please let me know if I am missing your point.
Comment From: tbcrawford-old
Hi @harawata,
You're welcome! That was my potential assumption, so you are correct. I believe that is a good summary. So overall, do you think it seems this could be a limitation of Oracle? If I were to setup a Postgres database and exclude the jdbcType in the SQL XML and provide random NULL values for integer columns, I don't notice any performance degradation. The performance appears just as fast as if the jdbcType was provided. My assumption is that this is just a difference between how Oracle and Postgres handle a missing jdbcType in the jdbc library they provide? I'm not well versed in the different jdbc libraries.
I also want to note, that I have specifically only noticed significant performance issues when trying to insert NULL for an Integer java field into a NUMBER column in Oracle. If I were to switch the column from NUMBER to VARCHAR2, and insert the same number of null values for 10,000 rows, the performance appears to nearly match that of when I provide the jdbcType values in the SQL XML. I have updated my demo project that I provided above in case you would like to review what I have done.
Comment From: harawata
Hi again @tbcrawford ,
Yes, that is how Oracle driver works, it seems. Although it's quite interesting that only NUMBER type is affected, it is completely a driver matter.
FYI, most drivers accepts generic type NULL
or OTHER
as the second argument of setNull()
, but it is not part of JDBC spec and mssql-jdbc, for example, requires 'accurate' type for some columns [1][2].
On a separate note, specifying jdbcType
or typeHandler
has positive impact on MyBatis performance and is recommended when dealing with many rows.
[1] https://groups.google.com/g/mybatis-user/c/NowScQE4_zc/m/zV7xkyOlBAAJ [2] https://github.com/microsoft/mssql-jdbc/issues/1269
Comment From: tbcrawford-old
@harawata thank you for the information and the sources! While it seems that NULL is accepted as a second argument to setNull()
here, it looks like it can incur a heavy performance penalty for Oracle at least.
Knowing this, would it be possible to have a greater call-out in the documentation to the potential performance impact when leaving out jdbcType in the sqlmap-xml.html#Parameters section to warn users of the potential pitfalls that some jdbc drivers may have? A potential 200x (maybe exponential?) slow down for inserting null without providing the jdbc type for Oracle databases seems like something that should be called out as a big warning. Some developers may have already run into this and wonder why their performance may tank when inserting many rows via batch execution. To me, this can definitely change architectural design decisions when building an application.
Also knowing this, do you personally think this is worthy of bringing to the attention of the ojdbc developers? Something tells me they may be aware of this, but given Postgres is able to handle NULL insertions without specifying a jdbcType, it makes me question if they do. I'm not sure if there are public ways of submitting an issue to them, but I might be able to get in contact with them through the company I work for and discuss this.
Comment From: harawata
@tbcrawford ,
I understand the severity of the issue, but if we start adding driver specific issues to the doc, it gets too verbose quickly. As you can imagine, there are so many 'tips' which are relevant only to certain driver, version, usage, etc.. And I think this ticket is quite informative for those who experience the same issue and look for a solution, thanks to you. :)
do you personally think this is worthy of bringing to the attention of the ojdbc developers?
I do. It would be nicer if @oracle open-sourced the driver, but if you have a way to report it, you should give them a chance to address the issue.
Comment From: tbcrawford-old
@harawata,
Thanks for the feedback. I definitely agree with your thought about the docs. They would probably become overly verbose very fast if we did that. However, what I originally had in mind was just a general statement statement like "By not specifying the jdbcType as part of the parameter, some drivers may increase processing time when executing batched statements where a column may be provided a null value". Though, now that I re-read that, it sounds more like the reader may begin asking questions like "which driver?" and "why won't you specify which driver?". So, I will leave that up to you guys, as the contributors to MyBatis. I also agree with you that this ticket has proven to be informative for anyone else who may experience the same issue looking for a solution to their performance problems with Oracle specifically.
I will explore my options and see if I can get in contact with their devleopers. I agree, I would love to see the ojdbc driver open-sourced by @oracle. I think this is an area where many experienced developers would be able to provide their input for performance improvements. It would also allow a lot of developers, like myself, to more easily debug any potential issues we see and provide a better platform for conversation. It's a bit hard to understand intent when variables don't have meaningful names in the decompiled jar.
For now, I am going to close this issue. I don't have anything more to add. Thanks for the feedback!
Comment From: harawata
I had a chance to do some follow-up on this matter and came up with a minimum repro code (plain JDBC/no MyBatis). This comment is a note to myself / someone who can contact Oracle Tech Support.
@CsvSource({"4", "0"})
@ParameterizedTest
void testGh2198(int sqlType) throws Exception {
Random rand = new Random(System.nanoTime());
// It uses https://github.com/harawata/jdbc-connection , but it's just getting a connection
try (Connection cnxn = new JdbcConnection("oracle19").getConnection()) {
try (Statement st = cnxn.createStatement()) {
try {
st.execute("drop table test");
} catch (SQLException e) {
//
}
st.execute("create table test (col1 number, col2 number, col3 number)");
}
try (PreparedStatement ps = cnxn.prepareStatement("insert into test values (?,?,?)")) {
for (int i = 0; i < 10000; i++) {
ps.setInt(1, rand.nextInt());
int num2 = rand.nextInt();
if (num2 % 3 == 0) {
ps.setNull(2, sqlType);
} else {
ps.setInt(2, num2);
}
int num3 = rand.nextInt();
if (num3 % 3 == 0) {
ps.setNull(3, sqlType);
} else {
ps.setInt(3, num3);
}
ps.addBatch();
}
ps.executeBatch();
}
}
}
CPU profile generated by async-profile . framegraphs.zip
I'm just speculating, but the possibility is that some kind of optimization that works in batch operation does not work when Types.NULL
is used.
A lot of time is spent in oracle.jdbc.driver.T4CTTIfun.receive()
(to get type info from server?).