tlrx
Member

@tlrx tlrx commented Sep 4, 2025

When a merge throws an exception, the IndexWriter is closed, which in turn aborts running merges and closes the ThreadPoolMergeScheduler, all on the same merge thread.

Before this change, ThreadPoolMergeScheduler#close used a CountDownLatch to wait for the signal that all merges had been aborted or completed. But the scheduler is closed from a merge thread that has not itself completed while it waits on the latch, so the latch can never be released and the thread deadlocks.

The proposed fix uses a mechanism similar to what ConcurrentMergeScheduler#sync does, i.e. it waits for all merge threads to be aborted or completed except the current one.
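To make the idea concrete, here is a minimal sketch of the "wait for every merge thread except myself" pattern. It is not the code from this PR nor the Lucene implementation: `MergeSyncSketch`, `startMerge`, `otherRunningMergeThread` and `closeFromMergeThreadCompletes` are hypothetical names, and merges are simulated with plain threads.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a merge scheduler; not the Elasticsearch or Lucene class.
final class MergeSyncSketch {
    // Threads that are currently running a simulated merge.
    private final Set<Thread> mergeThreads = ConcurrentHashMap.newKeySet();

    // Registers and starts a simulated merge; the thread unregisters itself when done.
    synchronized Thread startMerge(Runnable merge) {
        Thread t = new Thread(() -> {
            try {
                merge.run();
            } finally {
                mergeThreads.remove(Thread.currentThread());
            }
        });
        mergeThreads.add(t);
        t.start();
        return t;
    }

    // Picks any registered merge thread other than the caller, or null if none remain.
    private synchronized Thread otherRunningMergeThread() {
        Thread self = Thread.currentThread();
        for (Thread t : mergeThreads) {
            if (t != self && t.isAlive()) {
                return t;
            }
        }
        return null;
    }

    // Waits for all merge threads except the calling one, so a merge thread can call it safely.
    void close() throws InterruptedException {
        Thread other;
        while ((other = otherRunningMergeThread()) != null) {
            other.join();
        }
    }

    // Simulates a failed merge closing the scheduler from its own thread while another merge runs.
    static boolean closeFromMergeThreadCompletes() {
        MergeSyncSketch scheduler = new MergeSyncSketch();
        CountDownLatch otherStarted = new CountDownLatch(1);
        AtomicBoolean otherFinished = new AtomicBoolean();
        AtomicBoolean sawOtherFinished = new AtomicBoolean();
        try {
            scheduler.startMerge(() -> {
                otherStarted.countDown();
                try {
                    Thread.sleep(100); // pretend to do merge work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                otherFinished.set(true);
            });
            otherStarted.await();

            Thread failing = scheduler.startMerge(() -> {
                try {
                    scheduler.close(); // called on a merge thread, as in the failure path
                    sawOtherFinished.set(otherFinished.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            failing.join(5_000); // bounded wait so a deadlock shows up as a false result
            return !failing.isAlive() && sawOtherFinished.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("close() returned on the merge thread: " + closeFromMergeThreadCompletes());
    }
}
```

Because `close()` skips the calling thread, it never waits on its own completion, which is the condition a latch counting all merges (including the caller's) cannot satisfy.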

The proposed test passes whether or not ThreadPoolMergeScheduler is enabled. I'd also like to add a similar test in serverless, just to be sure it works everywhere.

Relates ES-12664

@tlrx tlrx added the >bug, :Distributed Indexing/Engine (Anything around managing Lucene and the Translog in an open shard.), v9.2.0 and v9.1.4 labels Sep 4, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Indexing (Meta label for Distributed Indexing team) label Sep 4, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@elasticsearchmachine
Collaborator

Hi @tlrx, I've created a changelog YAML for you.

@tlrx tlrx marked this pull request as draft September 4, 2025 13:06
@tlrx
Member Author

tlrx commented Sep 22, 2025

Closed in favor of #134656

@tlrx tlrx closed this Sep 22, 2025
Labels
>bug :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. Team:Distributed Indexing Meta label for Distributed Indexing team v9.1.5 v9.2.0