Hi, I posted before but my issue was closed. I have a (somewhat) simple Spring Boot/Spring Batch application that loads data from a .CSV file, transforms/maps that data, and persists it to a PostgreSQL database. It was running perfectly fine on spring-boot-starter 1.5.9.RELEASE. However, upon upgrading to any spring-boot-starter version > 2.2.0, I noticed that any repository calls to the database (Postgres) do not work. The application is in a "Running" state on port 8080, but it hangs on the JpaRepository method; in this case, I am using .saveAll() for my entity.

The interesting thing is both 2.2.0 and 2.2.1 use spring-batch-core 4.2.0.RELEASE.

Another objective of mine was to get onto spring-batch-core 4.3.2.RELEASE, which was a problem, since I could only get to spring-boot-starter-batch 2.2.0, as stated above. However, I was able to manually exclude spring-batch-core from spring-boot-starter-batch and import 4.3.2.RELEASE myself. It failed at first, but I found a Stack Overflow post stating there was a dependency on jackson-databind 2.11.0 (Spring Boot 2.2.0 uses 2.10.0): https://stackoverflow.com/questions/65607909/spring-batch-2-4-1-wildfly-20-final-java-lang-nosuchfielderror-block-unsafe/65795467#65795467

Luckily this worked. Without jackson-databind 2.11.0, my first JpaRepository method call, .saveAll(), just steps into Spring AOP/TX framework classes and appears to enter an infinite loop. The program appears to be running to the client (it never crashes) but never progresses in the code logic (and obviously never saves data to the database). I'm not sure if this is a clue that Jackson could be a potential problem here. I'm still perplexed why >= 2.2.1 fails to persist for my Spring Batch application. Is there any advice on how to debug this?
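On the debugging question, one general JVM technique (not something proposed in this thread) is to take a thread dump while the application is stuck and look at where the threads are parked. The sketch below uses only the JDK's `ThreadMXBean`; the class and method names are hypothetical. Running `jstack <pid>` against the hung process gives similar output without any code changes.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: call it from a debugger, a scheduled task, or a
// diagnostic endpoint while the application appears to hang.
final class ThreadDumpSketch {

    /** Returns a text dump of every live thread: name, state, and stack frames. */
    static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false/false: skip locked-monitor and synchronizer details
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    /** Returns how many threads the JVM reports as deadlocked (0 if none). */
    static int countDeadlockedThreads() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
        System.out.println("Deadlocked threads: " + countDeadlockedThreads());
    }
}
```

A thread that is merely waiting on another thread's result shows up as WAITING or BLOCKED rather than as a reported deadlock, so the stack frames of the thread calling .saveAll() and of any executor threads are usually the more informative part of the dump.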

Comment From: wilkinsona

Thanks for the report. Unfortunately, I'm not sure that I've followed what you're trying to do. You've mentioned JpaRepository which is part of Spring Data JPA, but the issue suggests that you believe there's a problem with Spring Batch. I don't see the connection between the two from what you've described thus far.

If you would like us to spend some more time investigating, please spend some time providing a complete yet minimal sample that reproduces the problem using Spring Boot 2.3.x or later. You can share it with us by pushing it to a separate repository on GitHub or by zipping it up and attaching it to this issue.

Comment From: alpizano

Hey wilkinsona, sorry for the ambiguity. I do indeed have a Spring Batch application that is reading .csv files and transforming the data then persisting it to a postgres database, so it indeed is using spring data jpa.

I did post this on the spring-batch GitHub and @benas suggested I post it for Spring Boot. There must be something in the upgrade to 2.2.1 that does not play well with spring-batch and persistence.

I was following the Spring upgrade/migration guide because I was stuck on this issue, and noticed that Spring Boot 2.2.0 was the highest version I could get to before persistence broke. I'm not sure if it's an issue with spring-batch or spring-jpa, because I defined the data source for the database and the JobRepository in spring-batch uses it, and I saw other people having similar persistence issues on Stack Overflow when using spring-batch + spring-jpa. Most of them suggested declaring a JpaTransactionManager, but that solution never worked for me.

I can definitely whip up a MVP of this issue and push it to github.

Comment From: fmbenhassine

To give a bit of context, this issue has been opened against batch here: https://github.com/spring-projects/spring-batch/issues/3887.

Comment From: spring-projects-issues

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

Comment From: spring-projects-issues

Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open the issue.

Comment From: alpizano

Hi @benas and @wilkinsona I have recreated the issue here: https://github.com/alpizano/spring-batch-data-jpa-persistence-issue-mvp

I made it as succinct as possible and stripped all the unnecessary code out. I have a profile for an H2 embedded database and one for Postgres. The H2 profile was giving some SQL grammar exceptions, but the Postgres profile should be working fine (my production db is also in Postgres, so ideally I am trying to replicate that environment).

So, as you will see, there is a @PostConstruct annotation in the DemoApplication.java main class that triggers the batch job, which simply reads from the sample-data.csv file in src/main/resources. Normally, I am reading from an S3 bucket and using all the Spring AWS jars in my pom.xml, but this is a simplified MVP example, and the problem seems to be with the JPA repository persistence methods and saving (I think), not with any of the aforementioned stuff.

So if you set the VM options to -Dspring.profiles.active=postgres in IntelliJ and run the main class DemoApplication, the pom.xml should by default have spring boot starter 2.2.0.RELEASE.

The job will trigger and will persist the data to your local Postgres db. My connection is:

dbname: postgres
user: 
password:

so please adjust if you have a password or username or different db you want to use.

Anyway, it should successfully get saved to the Database.

NOW, go ahead and increase the version to 2.2.1.RELEASE and you will see it Freeze or Hang on the "saveAll()" method call.

Comment From: alpizano

Hi everyone, I posted a link to the MVP above, but in case you missed it, here it is: https://github.com/alpizano/spring-batch-data-jpa-persistence-issue-mvp

I was doing more debugging and I found the problem. If I comment out the taskExecutor in the TaskletStep builder in stepToLoadDataFromCsv method in BatchConfiguration.java class.....

    .taskExecutor(taskExecutor) // seems to be problem

how the bean is defined:

    @Bean
    @StepScope
    public TaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor threadPoolExecutor = new ThreadPoolTaskExecutor();
        threadPoolExecutor.setCorePoolSize(1);
        threadPoolExecutor.setThreadGroupName("taskExecutor-batch");
        return threadPoolExecutor;
    }

..... the data gets persisted to the database (Postgres) and I don't see any of the issues I saw before, even after migrating up to spring boot starter 2.5.0. I see that the Spring docs recommend setting up a multi-threaded step using a taskExecutor, similar to how I set it up, so I'm curious why it's causing the issue I am seeing: https://docs.spring.io/spring-batch/docs/current/reference/html/scalability.html#multithreadedStep

Questions:

  1. I am using @PostConstruct in the DemoApplication class to trigger the job, but we don't normally do that. As I may have said before, I'm really reading millions of records from a .CSV file that gets put in an Amazon S3 bucket. I only provided this super simple MVP to help with debugging. I'm wondering if I would even see this issue in production using the AWS listeners and such, rather than using this @PostConstruct method to trigger the Spring Batch job. UPDATE (5/25/2021): despite the weird issues with triggering the Spring Batch job in the main class using the @PostConstruct annotation and calling the run() method on Spring Boot >= 2.2.1, the AWS listener triggering the job works fine, which is my main intent, so I am a happy man for now.
  2. If the taskExecutor really is the issue, how can I keep this multithreaded and replace the TaskExecutor with something else that works?
  3. Why did this TaskExecutor not cause any issues on Spring Boot <= 2.2.0, where it seems to work fine when using @PostConstruct and triggering the job in the main method?

any ideas?

Comment From: alpizano

Hi @wilkinsona @benas, I have confirmed today that the problem with the persistence was the way I was using the @PostConstruct annotation in the main class (DemoApplication.java) to trigger the batch job. How I found this out was:

  1. Setting up a @RestController and triggering the same method/Job through a GET request. Even on Spring Boot >= 2.2.1, all the way up to spring boot starter 2.4.5, this works flawlessly. It's only when I use @PostConstruct and call the run method on the Job that I see the weird persistence issue, where the .saveAll() method that saves entities to the db hangs or enters some cyclical loop in the Spring AOP library classes (JdkDynamicAopProxy, if I recall).
  2. Also, if I keep the @PostConstruct annotation and go to, say, Spring Boot 2.4.5, I CAN get it to work, but I have to remove the TaskExecutor bean from the TaskletStep builder. So I'm assuming there is some weird multithreading issue happening there.
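For readers who want to reproduce the REST-triggered variant from item 1, a minimal sketch might look like the following. The controller class name, the `/run-job` path, and the `requestedAt` parameter are hypothetical, and it assumes the application defines exactly one `Job` bean and has spring-boot-starter-web on the classpath; the actual controller used in the sample project may differ.

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical controller: launches the job on demand, after the
// application context has fully started.
@RestController
public class JobTriggerController {

    private final JobLauncher jobLauncher;
    private final Job job; // assumes a single Job bean in the context

    public JobTriggerController(JobLauncher jobLauncher, Job job) {
        this.jobLauncher = jobLauncher;
        this.job = job;
    }

    @GetMapping("/run-job") // hypothetical path
    public String runJob() throws Exception {
        // A changing parameter makes each request a new JobInstance.
        JobParameters params = new JobParametersBuilder()
                .addLong("requestedAt", System.currentTimeMillis())
                .toJobParameters();
        JobExecution execution = jobLauncher.run(job, params);
        return execution.getStatus().toString();
    }
}
```

The difference from the @PostConstruct approach is timing: here the job is launched only after all beans have been initialized, not in the middle of bean initialization.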

Nevertheless, as I stated, I only use @PostConstruct to trigger the batch job when testing locally. Normally, I read from an AWS S3 bucket. And today I found that this functionality, despite these local oddities, works fine. So I'm happy. I am still curious, however, why using the same @PostConstruct setup in the main method to trigger the batch job does not seem to work on Spring Boot >= 2.2.1.

Comment From: snicoll

Thank you for the sample.

Having a taskExecutor named like that with @StepScope could be potentially problematic. Spring Framework uses taskExecutor as a well-known name and Spring Boot auto-configures one as of Spring Boot 2.2.x with that name. If one is already configured, Spring Boot backs off, but the one you've created with step scope is registered with the application and could lead to problems.

Can you rename taskExecutor to something else? For instance, batchTaskExecutor would be more suitable considering the scope.

Comment From: alpizano

@snicoll Thanks Stephane. I will rename the taskExecutor. I normally have this "Job" in Spring Batch triggered by the AWS SDK I use in this project (as stated before, I omitted it to provide a minimal code sample), which listens to an SQS queue and then reads the .csv from an S3 bucket. Here, you can see in DemoApplication.java that I am triggering the job through the @PostConstruct method, where I call the run method on the job.

Triggering it this way seems to cause the problems with the TaskExecutor, but I just wanted to stress that if I call the same Job from a @RestController Spring endpoint, it works fine. (I can also use the batch enabled = true property in the YAML file, which auto-runs a command-line runner and runs the Job automatically upon booting the Spring application locally. I think this bypasses the taskExecutor; however, this works fine too.)
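As an illustration of the runner-style trigger mentioned above, a hand-written equivalent could be sketched with Spring Boot's `ApplicationRunner`, which is invoked once the application context has started. The class name and the `startedAt` parameter are hypothetical, a single `Job` bean is assumed, and whether this avoids the hang in the sample project was not tested in this thread.

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.stereotype.Component;

// Hypothetical local trigger: runs after the context is fully started,
// unlike a @PostConstruct method on the main class.
@Component
public class LocalJobTrigger implements ApplicationRunner {

    private final JobLauncher jobLauncher;
    private final Job job; // assumes a single Job bean in the context

    public LocalJobTrigger(JobLauncher jobLauncher, Job job) {
        this.jobLauncher = jobLauncher;
        this.job = job;
    }

    @Override
    public void run(ApplicationArguments args) throws Exception {
        // A changing parameter makes each startup a new JobInstance.
        jobLauncher.run(job, new JobParametersBuilder()
                .addLong("startedAt", System.currentTimeMillis())
                .toJobParameters());
    }
}
```

If this kind of runner is used, setting `spring.batch.job.enabled=false` keeps Spring Boot's own startup runner from launching the job a second time.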

I appreciate you guys taking the time to provide some input!

Comment From: snicoll

Also, confirmed by @benas, @StepScope should not be set on the TaskExecutor at all. Keep in mind that if you name it this way, it will be the general task executor of the app so if you want it to be specific to Spring Batch, it'll have to have another name. I am going to close this issue now as adding @StepScope is incorrect.
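Putting both pieces of advice together, the executor bean could be sketched as below: no @StepScope, and a name other than taskExecutor. The configuration class name is hypothetical, and the pool settings are copied from the original bean. This is an illustration of the suggestion, not a verified fix for the sample project.

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.TaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

// Hypothetical configuration class holding the Batch-specific executor.
@Configuration
public class BatchExecutorConfiguration {

    // Default singleton scope (no @StepScope). The bean name comes from the
    // method name, so it no longer uses the well-known name taskExecutor.
    @Bean
    public TaskExecutor batchTaskExecutor() {
        ThreadPoolTaskExecutor threadPoolExecutor = new ThreadPoolTaskExecutor();
        threadPoolExecutor.setCorePoolSize(1);
        threadPoolExecutor.setThreadGroupName("taskExecutor-batch");
        return threadPoolExecutor;
    }
}
```

The step definition would then receive this bean, for example through a parameter annotated with `@Qualifier("batchTaskExecutor")`, and pass it to `.taskExecutor(...)` as before.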