-
Notifications
You must be signed in to change notification settings - Fork 721
Some optimizations in ResolveAll #24834
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🟢 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR optimizes the ResolveAll function by improving the Fit function's performance through pre-compilation of selector data structures. The change reduces time complexity from O(L * (logI + logN + M)) to O((I + N) * M) where L is the total label count, I and N are "in" and "not in" selector counts respectively.
- Refactored the Fit function to use pre-compiled selector indices instead of map lookups
- Added hash template specializations for TLabel and TVector types
- Introduced TCompiledSelector structure to cache selector mappings with label indices
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
ydb/library/yaml_config/public/yaml_config.h | Added hash.h include for new hash map functionality |
ydb/library/yaml_config/public/yaml_config.cpp | Implemented optimization with pre-compiled selectors and updated Fit function signature |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
THashMap<TString, int> nameToIndex; | ||
nameToIndex.reserve(labelNames.size()); | ||
for (size_t i = 0; i < labelNames.size(); ++i) { | ||
nameToIndex[labelNames[i]] = static_cast<int>(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using static_cast for size_t to int conversion could cause overflow for very large indices. Consider using a bounds check or changing the data structure to use size_t consistently.
Copilot uses AI. Check for mistakes.
for (size_t i = 0; i < notIn.size(); ++i) { | ||
int idx = notIn[i].first; | ||
const auto& label = labels[idx]; | ||
if (label.Type != TLabel::EType::Negative && notIn[i].second->Values.contains(label.Value)) { | ||
return false; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bounds checking for idx against labels.size() is missing. If nameToIndex contains invalid indices, this could lead to out-of-bounds access.
Copilot uses AI. Check for mistakes.
for (size_t i = 0; i < in.size(); ++i) { | ||
int idx = in[i].first; | ||
const auto& label = labels[idx]; | ||
if (label.Type == TLabel::EType::Negative) { | ||
return false; | ||
} | ||
if (!in[i].second->Values.contains(label.Value)) { | ||
return false; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bounds checking for idx against labels.size() is missing. If nameToIndex contains invalid indices, this could lead to out-of-bounds access.
Copilot uses AI. Check for mistakes.
⚪ Test history | Ya make output | Test bloat
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
⚪ Test history | Ya make output | Test bloat
⚪ Test history | Ya make output | Test bloat | Test bloat
🟢
*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation |
Changelog entry
Some optimizations in ResolveAll
Let:
L - count of labels
I - count of labels with "in" in selectors
N - count of labels with "not in" in selectors
F - average time for Value.Contains() for
TSet<TString>
Old Fit gave us: O(L * (logI + logN + F), new Fit will give us O((I + N) * F). (I + N) always less than L (obviously)
Changelog category
Description for reviewers
...