Description
Bug description
In a remote partitioning configuration, if you launch two instances of the same Job at the same time, the first with a heavy task and the second with a very fast task, then when the second job's task finishes it is the first job that is marked as completed.
Environment
Spring Batch version : 5.2.1
Java version : 21
Steps to reproduce
Worker : For this example we will use two workers, each consuming events on a single Java thread. We will call them Worker1 and Worker2.
Job : The first job will be called Job1 and the second Job2.
Partitions : For this example each job is partitioned into a single StepExecution. The partitioned (StepExecution) task is called Task1_1 (the first 1 is the job number, the second 1 is the task number). Task1_1 takes 100s to execute while Task2_1 takes only one second.
Here goes:
T0 : Job1 is launched. Task1_1 is sent to the outputChannel (MessageChannelPartitionHandler.doHandle), and Job1 is the first to start waiting on the replyChannel (MessageChannelPartitionHandler.receiveReplies).
T0 + 1s : Task1_1 is consumed by Worker1, which will take 100s to finish it.
T0 + 2s : Job2 is launched. Task2_1 is sent to the outputChannel. Job2's execution is the second to start waiting on the replyChannel.
T0 + 3s : Task2_1 is consumed by Worker2, which will take 1s to finish it.
T0 + 4s : Task2_1 is finished and the reply Reply_Task2_1 is put in the inputChannel.
T0 + 5s : Reply_Task2_1 is read and goes through the AbstractMessageChannel, AbstractMessageHandler, AbstractCorrelatingMessageHandler and SimpleMessageStore classes. As it is the only task (partition) of Job2, the group is complete and a message is sent to the in-memory queue replyChannel (MessageChannelPartitionHandler.replyChannel).
T0 + 6s : Job1's execution, which was the first to start waiting on the replyChannel, receives a message. Only it is the wrong one: it is Job2's message. :((
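To make the race easier to see outside of Spring, here is a minimal sketch in plain Java that only models the shared in-memory queue. It does not use any Spring Batch or Spring Integration class; `SharedReplyChannelDemo`, `Reply` and `firstReplySeenByJob1` are hypothetical names for illustration, and the assumption is simply that both job executions take messages from one common queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical model of the timeline above: one reply queue shared by all job executions.
final class SharedReplyChannelDemo {

    // A reply only carries the name of the job it belongs to and the finished task.
    record Reply(String jobName, String taskName) {}

    // Returns the first reply that Job1 takes from the shared queue.
    static Reply firstReplySeenByJob1() {
        BlockingQueue<Reply> sharedReplyChannel = new LinkedBlockingQueue<>();

        // T0 + 5s: Task2_1 is done, so the aggregated reply of Job2 lands on the shared queue.
        sharedReplyChannel.offer(new Reply("Job2", "Task2_1"));

        // T0 + 6s: Job1 is first in line, and nothing on the queue checks who the reply is for.
        return sharedReplyChannel.poll();
    }

    public static void main(String[] args) {
        Reply reply = firstReplySeenByJob1();
        System.out.println("Job1 received the reply of " + reply.jobName());
    }
}
```

In this model Job1 ends up with the reply that belongs to Job2, which matches what I observe: the queue itself has no notion of which execution a message is meant for.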
I haven't found a way to configure the replyChannel to be split/sharded by the correlationId or to set a custom replyChannel for each job execution.
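To illustrate the kind of behavior I was looking for, here is a rough sketch in plain Java of replies being routed to one queue per correlationId. This is not an existing Spring Batch or Spring Integration API; `CorrelatedReplyRouter`, `queueFor` and `route` are hypothetical names, and the correlation ids "job1" and "job2" are made up for the example.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical router: one reply queue per correlationId instead of one shared queue.
final class CorrelatedReplyRouter {

    record Reply(String correlationId, String payload) {}

    private final Map<String, BlockingQueue<Reply>> queues = new ConcurrentHashMap<>();

    // Each job execution waits on the queue registered under its own correlationId.
    BlockingQueue<Reply> queueFor(String correlationId) {
        return queues.computeIfAbsent(correlationId, id -> new LinkedBlockingQueue<>());
    }

    // An incoming reply is delivered only to the queue matching its correlationId.
    void route(Reply reply) {
        queueFor(reply.correlationId()).offer(reply);
    }

    public static void main(String[] args) {
        CorrelatedReplyRouter router = new CorrelatedReplyRouter();
        BlockingQueue<Reply> job1Replies = router.queueFor("job1");
        BlockingQueue<Reply> job2Replies = router.queueFor("job2");

        router.route(new Reply("job2", "Reply_Task2_1"));

        System.out.println("Job1 sees: " + job1Replies.poll());
        System.out.println("Job2 sees: " + job2Replies.poll());
    }
}
```

With this kind of routing, Job1 would keep waiting for its own reply while Job2 receives Reply_Task2_1. A real fix would also need to remove the queue once the execution finishes, which this sketch leaves out.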
Expected behavior
Job2 receives the completion event of its own task, and Job1 keeps waiting until Task1_1 is finished.
Minimal Complete Reproducible example