It looks like inserting a row which has a string with a backslash at the end causes Spring's splitting of SQL files into statements to behave erratically.

Minimal example.

Setup : Spring Boot test suite with schema.sql and data.sql given below :

schema.sql

    CREATE SCHEMA IF NOT EXISTS TEST;

    CREATE TABLE TEST.ONE (
        IDENTIFIER BIGINT AUTO_INCREMENT PRIMARY KEY,
        NAME VARCHAR(500)
    );

    CREATE TABLE TEST.TWO (
        IDENTIFIER BIGINT AUTO_INCREMENT PRIMARY KEY,
        NAME VARCHAR(500)
    );

data.sql

    INSERT INTO TEST.ONE(IDENTIFIER,NAME) VALUES
        (1,'\1\3\');

    INSERT INTO TEST.TWO(IDENTIFIER,NAME) VALUES
        (1,'test;test');

This fails at my end because Spring splits the second inserted row around the semicolon inside 'test;test'.

The cause is Spring's ScriptUtils class in its splitSQLScript method. It's a method that splits the data.sql script into queries with the semicolon as the default delimiter. Whilst scanning the script, it keeps track of whether or not it's within a statement surrounded by single or double quotes. When it encounters a backslash, it enters a sort of escape mode. The character after this backslash as well as the backslash itself are left as-is, but if the character after the backslash is a single or double quote it's effectively treated as if that quote were to be escaped.

Since I'm using the H2 driver I could fix it via unicode formatting (I wasn't able to make this work with just U&'\1\005c') :

    U&'\1|005c' UESCAPE '|'

Comment From: sdeleuze

Could you please let us know what database are you using and check if it accepts such SQL file when executed without going via Spring? What we could like to check here is if '\1\3\' is parsed as a valid string.

Comment From: sbrannen

Following up on what @sdeleuze said, based on my understanding of the H2 grammar for strings, the proper way to write that first string is '\\1\\3\\'.

Can you verify if that works for you?

Comment From: vdkhvb

Hey, thanks for the feedback.

Adding double backslashes doesn't work, they're inserted as-is (I can add '\1\\' to the DB and it will print out that exact string when fetching the row). H2 does not treat the backslash as an escape char, so I can add '\1\sometext' just fine.

I debugged what happens and added a description of the cause of the issue in my original post. It seems like the method in Spring (which is used to split the data.sql SQL statements) handles backslashes in a way I find weird : it doesn't escape anything preceded by a backslash, but if it finds a single/double quote right after a backslash it stays in the inSingleQuote or inDoubleQuote mode which causes a sort of virtual escaping of those quotes. This causes issues with the splitting when a delimiter (semicolon) is used inside a string in some other insert later down the line. I'm assuming it is not intended behaviour.

Comment From: sbrannen

Hi @vdkhvb,

Thanks for trying out the double backslashes and reporting back with your findings.

Regarding your solution, I assume you meant U&'\1\3|005c' UESCAPE '|' to end up with \1\3\ in the database.

I wasn't able to make this work with just U&'\1\005c'

That was almost correct. You can actually achieve your goal without changing the unicode escape character (UESCAPE '|') by using double backslashes everywhere you want a single backslash in the result.

INSERT INTO TEST.ONE(IDENTIFIER, NAME) VALUES (1, U&'\\1\\3\005c'); -- Unicode 005c is a backslash "\"

At this time, I would consider that the best workaround for this issue.

Please try that out and let us know if that works for you.


As a side note, I noticed that you originally asked about this on Stack Overflow.

For future reference, it's often good to provide a link to such Stack Overflow questions (or other external discussions) in order to provide a cross reference and additional context.

Comment From: sbrannen

it doesn't escape anything preceded by a backslash, but if it finds a single/double quote right after a backslash it stays in the inSingleQuote or inDoubleQuote mode which causes a sort of virtual escaping of those quotes. This causes issues with the splitting when a delimiter (semicolon) is used inside a string in some other insert later down the line. I'm assuming it is not intended behaviour.

The behavior is actually intentional and was originally added for \x style escaping support in MySQL.

A number of databases support backslash as an escape character; however, as you've mentioned, H2 database does not (unless you use the Unicode string literal feature).

Interestingly enough, you are the first person to report this issue. So I suppose this doesn't arise very often.

However, having said that, the current behavior can definitely be improved.

The crux of the issue is that the "MySQL style escapes" support is hard coded in both implementations of ScriptUtils (for JDBC as well as for R2DBC).

https://github.com/spring-projects/spring-framework/blob/3431b2330a7f210015baaa2dfdf5c7fdc9ea400e/spring-jdbc/src/main/java/org/springframework/jdbc/datasource/init/ScriptUtils.java#L469-L473

https://github.com/spring-projects/spring-framework/blob/3431b2330a7f210015baaa2dfdf5c7fdc9ea400e/spring-r2dbc/src/main/java/org/springframework/r2dbc/connection/init/ScriptUtils.java#L380-L384

Thus, there is currently no way to disable this feature, which is in stark contrast to other features such as statement separators, comment prefixes, etc.

As a proof of concept, I commented out that inEscape code, and your original SQL functions as-is (which was to be expected).

So I think the only real way to address this is to make the check for "MySQL style escapes" an optional feature -- controlled by a new boolean flag which defaults to true for backward compatibility.

The problem with adding "yet another configuration property" for SQL script parsing is that the method signature of methods like ScriptUtils.splitSqlScript(EncodedResource, String, String, String[], String, String, List<String>) grow out of control over time. To address that, we have considered introducing a Parameter Object (design pattern) such as SqlScriptConfig, ScriptParserConfig, or similar.

Since we have collectively determined that a viable workaround exists with the current code base, I am assigning this to Spring Framework 6.1.x for the time being in order to investigate how best to introduce the additional configuration flag for script parsing.

Comment From: sbrannen

sql INSERT INTO TEST.ONE(IDENTIFIER, NAME) VALUES (1, U&'\\1\\3\005c'); -- Unicode 005c is a backslash "\"

At this time, I would consider that the best workaround for this issue.

I retract that. 😇

There's actually no need to encode the last backslash with Unicode. The simplest workaround is to follow my original proposal ('\\1\\3\\') within a Unicode string literal (since H2 will convert \\ to \ for you).

In light of that, I would consider the following the best workaround for this issue.

INSERT INTO TEST.ONE(IDENTIFIER, NAME) VALUES (1, U&'\\1\\3\\');

Comment From: vdkhvb

Thanks a lot for taking the time to look at this @sbrannen, and thanks for the clarifications. Your fix works just fine for writing out '\1\3\ to my database so my problem has been resolved.

It is very much an edge case so I presume virtually nobody has had issues with it so far. The original data.sql files I had been using previously for tests of a variety of applications contained an even number of rows with strings ending with a backslash. They typically didn't contain any insert with a semicolon inside a string either. And this hid the problem, though behind the scenes the splitting function split the SQL statements incorrectly (it would treat several statements as one).

Other hacky fixes I used (before updating the H2 driver and using Unicode) include - changing the delimiter - inserting '\1\3\ ' (with a space at the end) into the database followed by an UPDATE statement with the TRIM function to remove said space