fix(dedupe): use a common blobs dir to simplify #3314
Conversation
Should pair with #2968
DefaultGCInterval = 1 * time.Hour
S3StorageDriverName = "s3"
LocalStorageDriverName = "local"
GlobalBlobsRepo = "_blobstore"
Make sure this blobs repo is per store and per substore, because substores may be on different partitions.
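A minimal sketch of that scoping, assuming each (sub)store exposes its own root directory; blobsRepoFor is a hypothetical helper, not existing zot API:

package storage

import "path/filepath"

// GlobalBlobsRepo mirrors the constant introduced in this PR.
const GlobalBlobsRepo = "_blobstore"

// blobsRepoFor anchors the hidden blobs repo under a specific
// (sub)store root, so each substore keeps its canonical blobs on its
// own partition and hard links never cross filesystems.
func blobsRepoFor(storeRootDir string) string {
	return filepath.Join(storeRootDir, GlobalBlobsRepo)
}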
Currently, zot uses one of the repos as the master copy for a blob to achieve dedupe. However, blobs can be deleted from repos, which complicates dedupe tracking logic. Now use a single hidden global repo as a blob store instead.

Signed-off-by: Ramkumar Chinchani <[email protected]>
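To make the new scheme concrete, a hedged sketch of the idea (illustrative only, not the PR's actual code; dedupeInto and the path layout are assumptions):

package storage

import (
	"fmt"
	"os"
	"path/filepath"
)

const GlobalBlobsRepo = "_blobstore"

// dedupeInto ensures the canonical copy of a blob lives under the
// hidden global repo, then hard-links the repo-local path to it.
// Real code would also take the proper locks and record the link in
// the dedupe cache.
func dedupeInto(rootDir, repoBlobPath, digest string) error {
	canonical := filepath.Join(rootDir, GlobalBlobsRepo, digest)

	if err := os.MkdirAll(filepath.Dir(canonical), 0o755); err != nil {
		return err
	}

	if _, err := os.Stat(canonical); os.IsNotExist(err) {
		// First occurrence: promote this copy to the canonical one.
		if err := os.Rename(repoBlobPath, canonical); err != nil {
			return fmt.Errorf("promote blob: %w", err)
		}
	} else {
		// Duplicate: drop the repo-local copy before linking.
		if err := os.Remove(repoBlobPath); err != nil {
			return fmt.Errorf("remove duplicate: %w", err)
		}
	}

	// The repo path becomes a hard link to the canonical blob.
	return os.Link(canonical, repoBlobPath)
}

With this layout every repo path is just a link, so deleting any repo copy never changes which directory owns the blob.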
andaaron left a comment:
I think the logging is too verbose and should be trimmed.
substore := c.StoreController.SubStore[route]
if substore != nil {
-	substore.RunDedupeBlobs(time.Duration(0), c.taskScheduler)
+	//substore.RunDedupeBlobs(time.Duration(0), c.taskScheduler)
Is this just temporary, or are we dropping dedupe altogether?
	return nil
}

// create nested deduped bucket where we store all the deduped blobs + original blob
Suggested change:

-// create nested deduped bucket where we store all the deduped blobs + original blob
+// create nested deduped bucket where we store all the deduped blobs while excluding the original blob

Correct?
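For readers following the thread, a hedged bbolt sketch of the bucket layout under discussion (bucket names follow the constants visible in the diff; the exact nesting in zot may differ):

package cache

import bolt "go.etcd.io/bbolt"

// Layout per digest bucket (illustrative):
//
//	<digest>
//	  "original" -> the canonical blob path
//	  "deduped"  -> one key per deduped (hard-linked) path
func putDeduped(tx *bolt.Tx, digest, path string) error {
	root, err := tx.CreateBucketIfNotExists([]byte(digest))
	if err != nil {
		return err
	}

	deduped, err := root.CreateBucketIfNotExists([]byte("deduped"))
	if err != nil {
		return err
	}

	// The value is a placeholder; the key set is the set of deduped paths.
	return deduped.Put([]byte(path), []byte{1})
}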
I think the logging is too verbose for merge. It would be too much information to include logs for the code paths that do not return errors.
dedupeBlob := d.getOne(deduped)
if dedupeBlob != nil {
	d.log.Debug().Str("digest", digest.String()).Str("path", path).Msg("more in dedupe bucket, leaving original alone")
The previous logic replaced the old original blob with one of the duplicates and then deleted the blob.
This new logic would keep the original blob around forever? How/when would it be GCed?
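For contrast, the previous behaviour the reviewer describes, sketched as pseudocode (illustrative, not the actual zot code):

// On deleting the original under the old scheme:
//
//	if path == originalPath {
//	    next := firstDedupedPath(digest) // pick any duplicate
//	    moveBlob(next, originalSlot)     // promote it to original
//	    removeFromDeduped(digest, next)
//	}
//
// i.e. ownership of the master copy migrated between repos, which is
// exactly the complexity the global blobs repo is meant to remove.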
}

-	return nil
+	return zerr.ErrCacheMiss
Under which scenario would this return the error?
The updated code is a bit confusing: it returns nil in several other places, and it is unclear in which scenario this line is reached.
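One way to make the miss path unambiguous, sketched (an illustrative restructuring, not the PR's code; errCacheMiss stands in for zerr.ErrCacheMiss from the diff):

package cache

import (
	"errors"

	bolt "go.etcd.io/bbolt"
)

var errCacheMiss = errors.New("cache: miss")

// lookupDigest sketches a restructuring in which the cache-miss error
// has exactly one source: the digest bucket being absent.
func lookupDigest(tx *bolt.Tx, digest string) (*bolt.Bucket, error) {
	bucket := tx.Bucket([]byte(digest))
	if bucket == nil {
		// The only miss: this digest was never recorded at all.
		return nil, errCacheMiss
	}

	return bucket, nil
}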
- Are there no other changes needed in this file?
- Also, the Redis implementation should be reviewed.
}

origin := bucket.Bucket([]byte(constants.OriginalBucket))
if origin != nil {
And if origin is nil, then we should return nil, because the blob does not exist.
It is not in the duplicates list and it is not the original, so it does not exist, and the deletion should be marked as successful. Correct?
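A hedged sketch of the deletion flow being suggested (bucket names from the diff; the helper shape is illustrative):

package cache

import bolt "go.etcd.io/bbolt"

const originalBucket = "original" // stands in for constants.OriginalBucket

// deleteBlobPath: a path that is neither a duplicate nor the original
// is already gone, so the delete succeeds as a no-op.
func deleteBlobPath(bucket *bolt.Bucket, path string) error {
	if deduped := bucket.Bucket([]byte("deduped")); deduped != nil {
		if deduped.Get([]byte(path)) != nil {
			return deduped.Delete([]byte(path))
		}
	}

	origin := bucket.Bucket([]byte(originalBucket))
	if origin == nil || origin.Get([]byte(path)) == nil {
		return nil // not tracked anywhere: nothing to delete
	}

	// Otherwise this path is the original; handle its GC/promotion here.
	return origin.Delete([]byte(path))
}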
}

// skip the global blobs repo
if repo == constants.GlobalBlobsRepo {
Something is fishy.
Why is this here? constants.GlobalBlobsRepo is not an actual repo because it has an invalid repo name. How is GC detecting it when "walking" the disk if it doesn't have a valid repo name? It shouldn't call this function at all for constants.GlobalBlobsRepo.
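One way to keep GC from ever reaching it, sketched as a walk filter (the callback shape is an assumption, not zot's actual walk code):

package gc

import (
	"path/filepath"
	"strings"
)

const GlobalBlobsRepo = "_blobstore"

// isCandidateRepo filters the hidden blob store out while walking the
// disk, so repo-level GC is never invoked on it in the first place.
// A leading underscore is not a valid OCI repository name component,
// which is also why "_blobstore" can never collide with a user repo.
func isCandidateRepo(path string) bool {
	name := filepath.Base(path)

	return name != GlobalBlobsRepo && !strings.HasPrefix(name, "_")
}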
	return err
}
goto retry
Since we're refactoring this, can you pick up #2969?
Or do you want me to rebase and merge that in advance?
Is this code working right now? Wouldn't it potentially execute this code multiple times?
if err := is.storeDriver.Move(src, gdst); err != nil {
is.log.Error().Err(err).Str("src", src).Str("dst", gdst).Str("component", "dedupe").
Msg("failed to rename blob")
return err
}
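On the retry question: a bounded loop would make the at-most-N semantics explicit; a sketch replacing the goto, reusing the names from the quoted snippet (is.storeDriver, is.log) and assuming Move is safe to reattempt on failure:

const maxDedupeRetries = 3

var err error
for attempt := 1; attempt <= maxDedupeRetries; attempt++ {
	if err = is.storeDriver.Move(src, gdst); err == nil {
		break // the move succeeded exactly once; no repeated side effects
	}

	is.log.Warn().Err(err).Int("attempt", attempt).
		Str("src", src).Str("dst", gdst).Str("component", "dedupe").
		Msg("retrying blob move")
}

if err != nil {
	return err
}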
Currently, our dedupe scheme is very complicated: the master copy is kept in one of the repo dirs, and if that image is deleted, the owning repo changes, requiring multiple repo locks and causing unnecessary contention.
What type of PR is this?
Which issue does this PR fix:
What does this PR do / Why do we need it:
If an issue # is not available please add repro steps and logs showing the issue:
Testing done on this change:
Automation added to e2e:
Will this break upgrades or downgrades?
Does this PR introduce any user-facing change?:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.