Wrong job completed on multi job run with remote partitioning #4945

@martincosmobishop

Bug description
In a remote partitioning job configuration, if you launch two job instances (of the same Job) at the same time, the first with a heavy task and the second with a very fast task, then when the second job's task finishes, it is the first job that is marked as completed.

Environment
Spring Batch version : 5.2.1
Java version : 21

Steps to reproduce
Worker : For this example we will use two workers, each consuming events on a single Java thread. We will call them Worker1 and Worker2.
Job : The first job will be called Job1 and the second Job2.
Partitions : For this example each job is partitioned into a single StepExecution. The partitioned task (StepExecution) is called Task1_1 (the first 1 is the job number, the second 1 is the task number). Task1_1 takes 100s to execute, while Task2_1 takes only 1s.

Here goes:
T0 : Job1 is launched. Task1_1 is sent to the outputChannel (MessageChannelPartitionHandler.doHandle), and Job1 is the first to connect to the replyChannel (MessageChannelPartitionHandler.receiveReplies).

T0 + 1s : Task1_1 is consumed by Worker1, which will take 100s to finish it.

T0 + 2s : Job2 is launched. Task2_1 is sent to the outputChannel. Job2's execution thread is the second to connect to the replyChannel.

T0 + 3s : Task2_1 is consumed by Worker2, which will take 1s to finish it.

T0 + 4s : Task2_1 is finished and the reply Reply_Task2_1 is put in the inputChannel.

T0 + 5s : Reply_Task2_1 is read and goes through the AbstractMessageChannel, AbstractMessageHandler, AbstractCorrelatingMessageHandler and SimpleMessageStore classes. As it is the only task (partition) of Job2, the group is complete and a message is sent to the in-memory queue replyChannel (MessageChannelPartitionHandler.replyChannel).

T0 + 6s : Job1's execution, which is waiting on the replyChannel and was the first to connect to it, receives a message. Only it is the wrong one: it is Job2's message. :((
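
The core of the timeline above can be approximated in plain Java, outside Spring Batch. This is only a sketch under the assumption that both job executions wait on one shared in-memory queue; `SharedReplyQueueDemo`, `Reply` and `simulate` are hypothetical names for illustration, not Spring Batch or Spring Integration types, and only Job1's waiting side is modelled.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class SharedReplyQueueDemo {

    // Hypothetical reply type; not a Spring Batch or Spring Integration class.
    public record Reply(String jobName, String taskName) {}

    public static String simulate() throws Exception {
        // Assumption: one in-memory queue is shared by every job execution.
        BlockingQueue<Reply> replyChannel = new LinkedBlockingQueue<>();

        // Job1 starts waiting for a reply while Task1_1 is still running.
        CompletableFuture<Reply> job1Wait = CompletableFuture.supplyAsync(() -> {
            try {
                return replyChannel.poll(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        });

        // Task2_1 finishes first, so Job2's reply is the first message queued.
        replyChannel.put(new Reply("Job2", "Task2_1"));

        // Job1 takes whatever arrives, with no check of which job it belongs to.
        Reply received = job1Wait.get(10, TimeUnit.SECONDS);
        return "Job1 received the reply for " + received.jobName() + "/" + received.taskName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(simulate());
    }
}
```

Running `main` prints `Job1 received the reply for Job2/Task2_1`, which mirrors the T0 + 6s step: the consumer that is waiting gets the message regardless of which job produced it.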

I haven't found a way to configure the replyChannel to be split/sharded by the correlationId or to set a custom replyChannel for each job execution.
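
To make clear what I mean by splitting the replyChannel by correlationId, here is a minimal plain-Java sketch of the idea. It is not an existing Spring Batch configuration option as far as I know; `CorrelatedReplyRouter`, `Reply`, `register` and `route` are hypothetical names, and the correlation ids are made-up strings.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class CorrelatedReplyRouter {

    // Hypothetical reply type carrying the id used to group partitions of one job execution.
    public record Reply(String correlationId, String taskName) {}

    // One reply queue per correlation id instead of a single shared queue.
    private final Map<String, BlockingQueue<Reply>> queues = new ConcurrentHashMap<>();

    // Called by a job execution before it starts waiting for replies.
    public BlockingQueue<Reply> register(String correlationId) {
        return queues.computeIfAbsent(correlationId, id -> new LinkedBlockingQueue<>());
    }

    // Called when a reply arrives; delivers it only to the queue of the matching execution.
    public void route(Reply reply) {
        register(reply.correlationId()).add(reply);
    }

    public static void main(String[] args) {
        CorrelatedReplyRouter router = new CorrelatedReplyRouter();
        BlockingQueue<Reply> job1Replies = router.register("job1-execution");
        BlockingQueue<Reply> job2Replies = router.register("job2-execution");

        // Task2_1 finishes first; its reply must not reach Job1.
        router.route(new Reply("job2-execution", "Task2_1"));

        System.out.println("Job1 sees: " + job1Replies.poll()); // nothing queued for Job1
        System.out.println("Job2 sees: " + job2Replies.poll());
    }
}
```

With this kind of routing, Job1 keeps waiting until Task1_1's own reply arrives, and Job2 is the only execution that can observe the Task2_1 reply.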

Expected behavior
Job2 receives the task completion event.

Minimal Complete Reproducible example
