Skip to content

Conversation

robTheBuildr
Copy link
Collaborator

@robTheBuildr robTheBuildr commented Sep 23, 2025

Before submitting
  • Was this discussed/agreed via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes a deadlock in deleting a chunk and forces a delete prior to redownloading.
This avoids timeouts in cases where there is a deadlock.

Successful runs show a dip in performance around the point deadlock would occur, but future deadlock points don't result in the same dip.
Screenshot 2025-09-23 at 7 03 37 PM

Before adding skip_lock to _apply_delete debug logs showed:

Skip delete /cache/chunks/efbefe0228a2074e53fb8a67da79bd8b/1755713448.3693707/chunk-44-17.bin by 0, current lock count: 3
Requested force download for /cache/chunks/efbefe0228a2074e53fb8a67da79bd8b/1755713448.3693707/chunk-44-17.bin by None

Followed by:

FileNotFoundError: The /cache/chunks/efbefe0228a2074e53fb8a67da79bd8b/1755713448.3693707/chunk-44-17.bin hasn't been found.

What this means is:

  1. The file is not found
  2. After 30s a force redownload is requested
  3. This redownload doesn't succeed
  4. The while loop looking for the file eventually times out and crashes the pipeline

When adding in the delete we see that it is skipped because of a deadlock situation with 3 workers wanting the file.
After forcing a delete, the download to the cache operates properly and a pipeline timeout is avoided.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

Copy link

codecov bot commented Sep 23, 2025

Codecov Report

❌ Patch coverage is 40.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 84%. Comparing base (df92bf8) to head (6d794ff).

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #721   +/-   ##
===================================
- Coverage    84%    84%   -0%     
===================================
  Files        52     52           
  Lines      7278   7282    +4     
===================================
- Hits       6126   6124    -2     
- Misses     1152   1158    +6     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@robTheBuildr robTheBuildr enabled auto-merge (squash) September 24, 2025 03:06
@robTheBuildr robTheBuildr changed the title Fix: Force delete prior to force download WIP: Fix: Force delete prior to force download Sep 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant