Currently the only way for a user to provide Spring Batch Job Parameters is via command line args. This requested enhancement is to create a Boot property that accepts JSON to be deserialized into `JobParameters`. This is needed for the Kubernetes functionality when you attempt to restart a job using the shell entry point on your image. In this use case, there is no way to pass command line arguments to the application and some elements of normal parameters are not friendly to being set as environment variable names (`run.id(long)=5` for example). I'd assume that the order of precedence for resolving duplicate `JobParameters` would follow the normal Boot conventions.
Comment From: philwebb
If I understand the request correctly, we can probably add a field to `BatchProperties` then update `JobLauncherApplicationRunner.launchJobFromProperties` to consider both command line args and any properties.
The only thing that feels a bit odd to me is using serialized JSON for the properties. We currently delegate to `JobParametersConverter`, which takes a `Properties` input. Does this mean we're going JSON -> `Properties` -> `JobParameters`?
I wonder if we can get away with a simpler format? Perhaps a list of ':' delimited items? e.g.:
`SPRING_BATCH_JOBPARAMETERS="run.id(long):5,foo.bar:test"`
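For reference, a minimal sketch of the existing conversion path mentioned above, assuming the default converter is in play: whatever input format is chosen (JSON, a delimited list, ...) would first have to be parsed into a `java.util.Properties` instance before it becomes `JobParameters`.

```java
import java.util.Properties;

import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.converter.DefaultJobParametersConverter;
import org.springframework.batch.core.converter.JobParametersConverter;

public class PropertiesToJobParameters {

	public static void main(String[] args) {
		// Keys use Batch's "name(type)=value" notation; untyped keys default to String
		Properties properties = new Properties();
		properties.setProperty("run.id(long)", "5");
		properties.setProperty("foo.bar", "test");
		JobParametersConverter converter = new DefaultJobParametersConverter();
		JobParameters jobParameters = converter.getJobParameters(properties);
		System.out.println(jobParameters);
	}

}
```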
Comment From: mminella
@philwebb
If I understand the request correctly, we can probably add a field to `BatchProperties` then update `JobLauncherApplicationRunner.launchJobFromProperties` to consider both command line args and any properties.
You understand the request correctly.
As for the use of JSON for the properties, that really was just following the model of `spring_application_json`, where we pass in all the vars in a single block. I'd prefer consistency over introducing a new format that users need to understand and Boot needs to be able to parse, but I'm open to ideas here.
One other note. We'd like to get this into Boot 2.4 since it is actually impacting a SCDF use case currently. We can do the PR if that helps.
Comment From: philwebb
One other note. We'd like to get this into Boot 2.4 since it is actually impacting a SCDF use case currently. We can do the PR if that helps.
I've reserved this one for a conference event that's happening early Oct. Hopefully someone will take this on for that, if not, I'm sure we can work together to get something in for 2.4.
Comment From: snicoll
If I understand the request correctly, we can probably add a field to `BatchProperties` then update `JobLauncherApplicationRunner.launchJobFromProperties` to consider both command line args and any properties.
I am a bit confused. @mminella approved this but, on the other hand, talks about environment variables. I don't think we should update `BatchProperties`. That would expose a property in the metadata for this, which would let the user paste raw JSON in `application.properties`.
Wouldn't we detect that `SPRING_BATCH_JOBPARAMETERS` is set and then decode the JSON if necessary? Being able to specify this via environment variables or system properties, but not any other property sources, feels a bit inconsistent though.
Comment From: snicoll
OK, so I've tested something that exposes a `spring.batch.job.parameters` property as a list of parameters with the usual "key=value" form. The property can be specified as a comma-separated value or in indexed fashion, as usual (the latter being mandatory if one of the parameters contains a `,` somewhere).
Something I didn't catch initially on this issue, but which a discussion with @mminella revealed: it is important that application arguments still override whatever has been specified using this property. I've pushed a proposal in https://github.com/snicoll/spring-boot/commit/cc9f02bfbc86b63969f23286d639504a7ae0efc4 and welcome any feedback. Thanks!
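For illustration only (this is not the content of the linked commit), a minimal sketch of the precedence rule described above, assuming both sources are ultimately copied into the `Properties` handed to the converter, where a later `put` overwrites an earlier one:

```java
import java.util.List;
import java.util.Properties;

public class JobParameterMerging {

	static Properties merge(List<String> fromProperty, List<String> fromArguments) {
		Properties merged = new Properties();
		// Property-sourced values first, argument-sourced values last: a later put
		// overwrites an earlier one, so application arguments win on duplicate keys
		fromProperty.forEach((parameter) -> put(merged, parameter));
		fromArguments.forEach((parameter) -> put(merged, parameter));
		return merged;
	}

	private static void put(Properties properties, String parameter) {
		int index = parameter.indexOf('=');
		properties.setProperty(parameter.substring(0, index), parameter.substring(index + 1));
	}

	public static void main(String[] args) {
		System.out.println(merge(List.of("run.id(long)=5", "name=fromProperty"),
				List.of("name=fromArgument")));
	}

}
```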
Comment From: cppwfs
@snicoll I took the JSON approach to this issue. https://github.com/cppwfs/spring-boot/tree/BOOT-23411 .
Comment From: snicoll
Yes, I've read the diff in the PR you've raised, @cppwfs. As I've mentioned there, we're not keen to use JSON for this. Please review the commit I've referenced above and let me know if there is any problem. Thank you.
Comment From: cppwfs
@snicoll Thank you for creating this branch. I had a chance to run some of our tests from Data Flow. The only scenario that I can see that this branch does not cover is when a user has embedded commas in the job param, e.g. `spring.batch.job.parameters=foo=bar,baz=boo,goo=ball,yikes`. This fails parsing.
Comment From: snicoll
Thanks for reviewing.
The only scenario that I can see that this branch does not cover is when a user has embedded commas in the job param
There is an explicit mention of that use case in the commit that I've shared. It is mapped to a list, so we need a separator to split the various elements. If you have a comma, you can specify the parameters with an index (one property per value) rather than as a single value with everything. We could also consider changing the separator if that helps.
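As a small illustration of why the indexed form sidesteps the splitting problem, here is a sketch using Boot's `Binder` with made-up values; the embedded comma survives because each parameter is its own property:

```java
import java.util.List;
import java.util.Map;

import org.springframework.boot.context.properties.bind.Bindable;
import org.springframework.boot.context.properties.bind.Binder;
import org.springframework.boot.context.properties.source.MapConfigurationPropertySource;

public class IndexedParametersExample {

	public static void main(String[] args) {
		// Indexed form: one property per parameter, so the comma in "baz=boo,goo" is preserved
		Map<String, String> source = Map.of(
				"spring.batch.job.parameters[0]", "foo=bar",
				"spring.batch.job.parameters[1]", "baz=boo,goo");
		Binder binder = new Binder(new MapConfigurationPropertySource(source));
		List<String> parameters = binder
				.bind("spring.batch.job.parameters", Bindable.listOf(String.class)).get();
		System.out.println(parameters);
	}

}
```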
Comment From: snicoll
Our integration with Spring Batch works so that all `Job` beans in the context are executed. It is also possible to narrow things down using `spring.batch.job.names`, which takes a list of job names to execute. This behaviour is really strange IMO as it would mean, in the current form, that a single set of application arguments is transmitted as job parameters to all jobs. Sharing the same job parameters for all jobs doesn't sound sane to me and can be quite surprising.
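To make the current arrangement concrete, here is a minimal sketch (job names assumed, usual Batch infrastructure such as a `DataSource` not shown) of a context with two jobs; by default both are launched on startup, and `spring.batch.job.names=importJob` would narrow execution to one of them:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Two Job beans in one context: both run on startup unless spring.batch.job.names narrows the selection
@Configuration
@EnableBatchProcessing
class JobsConfiguration {

	@Bean
	Job importJob(JobBuilderFactory jobs, StepBuilderFactory steps) {
		return jobs.get("importJob")
				.start(steps.get("importStep")
						.tasklet((contribution, chunkContext) -> RepeatStatus.FINISHED).build())
				.build();
	}

	@Bean
	Job exportJob(JobBuilderFactory jobs, StepBuilderFactory steps) {
		return jobs.get("exportJob")
				.start(steps.get("exportStep")
						.tasklet((contribution, chunkContext) -> RepeatStatus.FINISHED).build())
				.build();
	}

}
```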
Creating this property makes this inconsistency more visible so I am wondering if that's what we want to do. I don't know if changing the default behaviour (and renaming `spring.batch.job.names` to `spring.batch.job.name`) is something we want to do so late.
Flagging for team attention so that we get a chance to discuss this some more.
Comment From: snicoll
We've discussed this and we believe that the above requires us to revisit the current arrangement before we can proceed with adding this feature. Unfortunately, it is too late in the `2.4.x` cycle to do that. @mminella happy to chat about the requirement and see if there is something else we can do to help you.
Comment From: umutcann
Hi,
As @cppwfs mentioned above, Spring Cloud Data Flow fails to restart tasks with the shell entry point style.
If this issue won't be solved in the short term, I want to focus on the other entry point styles (exec and boot), although those have issues of their own.
Could you please notify me if this issue won't be solved in the short term?
Kind regards.
Comment From: wilkinsona
@umutcann Please read the comment immediately above yours. As @snicoll has indicated, addressing this issue requires changes that are larger than there was time for in 2.4.x. It's assigned to the 2.x milestone at the moment so we'll consider it for inclusion in 2.5, which will be released in roughly 6 months' time.
Comment From: snicoll
@mminella @cppwfs would it be possible to get a review of this issue (in particular this comment) and let us know if something is still needed?
Comment From: mminella
cc: @benas
Comment From: mminella
@snicoll I think this issue still makes sense. I'll defer to @cppwfs on if it is a blocker for an SCDF use case.
To reply to that specific comment, that issue already exists today. If you have a Boot uber jar with three `Job` beans defined in its `ApplicationContext` and you pass job parameters at the command line, those job parameters will be passed to all of the `Job`s in that context. We actively recommend packaging only one `Job` per Boot jar but we do know that users combine multiple for many reasons.
Given that `JobParameters` are accessed via a pull model (a `Job` will only grab the ones it needs during execution), I'd argue that we should separate these two issues:
- The ability to pass parameters via properties (this issue)
- The ability to specify which `Job` gets each job parameter (the issue referenced in that comment)
Comment From: cppwfs
Hello @snicoll,
I agree with Michael that your recommendation is actually two issues. The priority is the ability to pass job parameters via properties.
It is a blocker in that a user can't pass job parameters if they use a shell entry point and have corporate policies that prevent them from using anything but a shell entry point.
Comment From: wilkinsona
If we implement 1 (the ability to pass parameters via properties) while ignoring 2 (the need to specify which Job gets each job parameter), we're likely to need to make breaking changes to 1 when we subsequently tackle 2. We already know that passing parameters via the command line is broken in this regard. It would seem foolish to make the same mistake again when adding support for providing job parameters via the environment.
It seems to me that we have two ways forward here:
1. Simplify things by only running a single job
2. Continue to support running multiple jobs and allow parameters to be specified on a per-job basis
Option 1 would break things for the users that are combining multiple jobs for many reasons but would allow property-based parameter support to be implemented relatively easily. Option 2 would seem to require us to take a step back and agree on how to provide parameters on a per-job basis and then add the environment as input to that as a second step.
Comment From: mminella
While not our recommendation, option 1 would be a jarring change for many users. I know it is a common question about how to package multiple jobs into a single artifact and run only the one that matters at that time (they wish to avoid having an artifact per job).
I would vote for option two if we are going to address both issues at once.
Comment From: wilkinsona
Option 1 would be a jarring change for many users. I know it is a common question about how to package multiple jobs into a single artifact and run only the one that matters at that time (they wish to avoid having an artifact per job).
There's a difference between packaging multiple jobs into a single artifact and running multiple jobs at the same time. If the goal is to avoid having one artifact per job, we could support that while also requiring that we only execute one job. If the context only contains a single job, we execute it. If the context contains multiple jobs, we require that a property be used to identify the job to execute. This is what @snicoll described above. We'd rename `spring.batch.job.names` to `spring.batch.job.name` and it would identify the one and only job to execute.
Before we can make progress on this, I think the first thing we need to decide is whether or not we want to continue to support executing multiple jobs at the same time. That's the default behaviour at the moment and we also have a property that can be used to specify the jobs that are executed if you don't want to execute all of them. Users that package multiple jobs in the same artifact but only execute one of them would be unaffected by this, other than a possible change to the property name used to identify the job to execute. Users that package multiple jobs in the same artifact and execute all of them would be adversely affected as they'd either have to start the same artifact n times or split it into n different artifacts and start each of them.
Comment From: wilkinsona
Before we can make progress on this, I think the first thing we need to decide is whether or not we want to continue to support executing multiple jobs at the same time. That's the default behaviour at the moment and we also have a property that can be used to specify the jobs that are executed if you don't want to execute all of them. Users that package multiple jobs in the same artifact but only execute one of them would be unaffected by this, other than a possible change to the property name used to identify the job to execute. Users that package multiple jobs in the same artifact and execute all of them would be adversely affected as they'd either have to start the same artifact n times or split it into n different artifacts and start each of them.
@mminella @benas what's your opinion on this please?
Comment From: fmbenhassine
@wilkinsona Looking into this now. There are a lot of details discussed here, so I will check and get back to you asap.
Comment From: fmbenhassine
I think the first thing we need to decide is whether or not we want to continue to support executing multiple jobs at the same time
I will let the boot team decide on that, but I would not continue the support for running multiple jobs at the same time, because this is problematic in many ways: the same arguments are passed to all jobs (as mentioned by Stephane), mixed logs, ambiguous job execution ordering, confusing exit code, etc. I can elaborate on each of those, but I believe there are obvious reasons why the current behaviour is confusing. So, following the Unix philosophy of doing one thing and doing it well, I would change the current behaviour to run a single job at a time.
Now regarding the initial feature request of adding support to pass job parameters through environment variables, I am not sure I fully understand the problem that led to this request. Here is what I understand from the original issue (https://github.com/spring-cloud/spring-cloud-dataflow/issues/4119): SCDF tries to restart a failed job execution with a command similar to `java -jar myjob.jar run.id(long)=5`, where the entry point style of the task is of type shell. In this configuration, SCDF passes all application properties and command line arguments as environment variables (and transforms `.` to `_`). This means the `run.id(long)=5` argument is transformed to an environment variable `RUN_ID(LONG)=5`, and this is when the failure happens because `(` is an illegal character in an environment variable name. Now here is where I'm not fully clear with this issue:
- Even if we add support to pass job parameters through environment variables, how would this fix the initial issue? It is not even possible to set an environment variable like `RUN_ID(LONG)=5` for the same reason. I really need to see an example to understand the expected behaviour from Boot. Moreover, if we decide to add support for that in Boot, we need to add it in Batch as well for feature parity with non-Boot users.
- If this Spring Batch specific notation of `parameterName(ParameterType)=parameterValue` is not friendly to being set as an environment variable, then this should be fixed in Batch. Note that this notation comes from the `DefaultJobParametersConverter`, which could be customized/overridden if needed to support other characters. So unless I'm missing something, the initial issue could have been fixed with a custom `JobParametersConverter` (see the sketch after this list).
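A minimal sketch of that customization idea, assuming a hypothetical environment-friendly `name_type=value` notation (this is not an existing Batch feature):

```java
import java.util.Properties;

import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.converter.DefaultJobParametersConverter;

// Illustrative only: accepts "run.id_long=5" style keys and rewrites them to the
// "name(type)=value" form understood by the default converter. A real implementation
// would need a more robust encoding, since '_' can appear in legitimate parameter names.
public class EnvironmentFriendlyJobParametersConverter extends DefaultJobParametersConverter {

	@Override
	public JobParameters getJobParameters(Properties properties) {
		Properties rewritten = new Properties();
		properties.forEach((key, value) -> {
			String name = key.toString();
			int separator = name.lastIndexOf('_');
			if (separator > 0) {
				name = name.substring(0, separator) + "(" + name.substring(separator + 1) + ")";
			}
			rewritten.put(name, value);
		});
		return super.getJobParameters(rewritten);
	}

}
```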
That's why I would like to fully understand the problem before thinking of or proposing a solution. FTR, we have a meeting planned this week with @cppwfs to discuss the matter, and I would like someone from the boot team to join.
As a side note, this shell entry point form should not be recommended in SCDF (and even outside SCDF) because, in this form, the process executed in the container is run as a subcommand of `/bin/sh -c` and hence will not receive Unix signals. This is problematic if we want to ensure a graceful shutdown of Spring Boot apps, and more specifically Spring Batch apps, where a graceful shutdown is crucial to be able to restart jobs automatically (and avoid having to update the job repository manually).
Comment From: cppwfs
Hello All,
- I agree with @benas in that it is a better standard that we only run one job per application launch. A user may have more than one job per app; however, they have to select the job name they want to run, something like `spring.batch.job.name=foo`. If they choose to run an app that has multiple jobs and don't select a specific job, `JobLauncherApplicationRunner` will throw an exception.
- On the original issue, we discussed using a property that would promote certain environment variables to be job params, something like `spring.batch.job.params=foo,bar,baz` (see the sketch below).

Thoughts?
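A minimal sketch of that promotion idea; the `spring.batch.job.params` property is the proposal above (not an existing Boot property), and the string-only handling here is an assumption:

```java
import java.util.List;

import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.core.env.Environment;

// Hypothetical: "names" would come from the proposed spring.batch.job.params property
// and each named entry found in the Environment is promoted to a String job parameter.
// A real implementation would also need a way to convey the parameter type.
public class EnvironmentJobParametersPromoter {

	JobParameters promote(Environment environment, List<String> names) {
		JobParametersBuilder builder = new JobParametersBuilder();
		for (String name : names) {
			String value = environment.getProperty(name);
			if (value != null) {
				builder.addString(name, value);
			}
		}
		return builder.toJobParameters();
	}

}
```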
Comment From: wilkinsona
On the original issue, we discussed using a property that would promote certain environment variables to be job params, something like `spring.batch.job.params=foo,bar,baz`
Based on what @benas has said above, I don't understand how this will help as `RUN_ID(LONG)` isn't a legal name for an environment variable.
That's why I would like to fully understand the problem before thinking of or proposing a solution. FTR, we have a meeting planned this week with @cppwfs to discuss the matter, and I would like someone from the boot team to join.
When is this meeting scheduled?
Comment From: cppwfs
Hello @wilkinsona ,
A user who wants to launch Boot/Batch jobs on Kubernetes is currently using containers that contain a shell application that launches the Boot/Batch application. This shell app does not pass command line args to the application; it expects all properties to be set via environment variables. Via SCDF you can specify an entry point style of shell that will convert all command line args to environment variables so that this information is not lost (this is meant to be used for Docker containers with a shell entry point). However, in Kubernetes you can't use `(` or `)` in a key for environment variables.
But this initial issue raised a bigger point: how does a user set job parameters for a Boot/Batch job if they can't pass in command line args, for those cases where a Docker container is using the shell entry point? Or, in this user's case, where they are using a shell app to launch their Boot/Batch app and those args are not passed to the Boot/Batch application.
- We can schedule another meeting with you on this topic.
Comment From: wilkinsona
As just discussed with @cppwfs and @benas, we don't think that Boot is the right place to solve this as it's a more general problem.
SCDF is responsible for mapping command line arguments to environment variables in situations where the entry point cannot consume command line arguments. When the command line argument's name contains characters that are illegal in an environment variable's name, it performs a mapping to something that is legal. It then needs to perform the reverse of this mapping so that the transformed environment variables are turned back into their original command line arguments and made available to the application.
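Purely for illustration (this is not SCDF's actual encoding), a sketch of such a round-trip mapping:

```java
// Illustrative only: encode an argument name such as "run.id(long)" into a legal
// environment variable name before the container starts, and decode it back inside
// the entry point. A real mapping would need to preserve case and handle underscores
// in the original name unambiguously.
public class ArgumentNameMapping {

	static String toEnvironmentVariable(String argumentName) {
		return argumentName.toUpperCase()
				.replace(".", "_")
				.replace("(", "_LP_")
				.replace(")", "_RP_");
	}

	static String toArgumentName(String environmentVariable) {
		return environmentVariable.toLowerCase()
				.replace("_lp_", "(")
				.replace("_rp_", ")")
				.replace("_", ".");
	}

	public static void main(String[] args) {
		String encoded = toEnvironmentVariable("run.id(long)"); // RUN_ID_LP_LONG_RP_
		System.out.println(encoded);
		System.out.println(toArgumentName(encoded)); // run.id(long)
	}

}
```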
We also discussed the possibility of Batch being able to consume job parameters from sources other than the command line. This would be a new Batch feature that @benas is happy to consider should there be sufficient demand. My feeling was that if sources other than the command line are going to be considered, using Spring Framework's `Environment` could be a good option, providing there was a convention for identifying entries in the `Environment` that are to be treated as job parameters.
Comment From: wilkinsona
I've opened https://github.com/spring-projects/spring-boot/issues/25373 so that we can deprecate support for running multiple jobs.