-
Notifications
You must be signed in to change notification settings - Fork 1.3k
FirstPassGroupingCollector supports ignoring docs without group field #15167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Binlong Gao <[email protected]>
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
Signed-off-by: Binlong Gao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some great numbers! I have some minor comments.
lucene/grouping/src/java/org/apache/lucene/search/grouping/FirstPassGroupingCollector.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Binlong Gao <[email protected]>
I am not 100% familiar with this part of the code base, but it seems like a pretty safe change, and tests prove it out. I will leave this PR open for a little bit for others to comment if they find issue with it. I will likely merge & backport this next week. |
Should this use the return value of |
That doesn't work, ValueSourceGroupSelector.advanceTo() always return ACCEPT: |
That's the current behavior of |
By checking the usage, the return value is designed to be used in second pass grouping selector, not for the FirstPassGroupingCollector. |
Signed-off-by: Binlong Gao <[email protected]>
I agree with @alexmm-amzn, I think this needs to use the return value of This also reminds me that at some point I should get round to deprecating |
Description
Resolve a TODO in FirstPassGroupingCollector, introduce a new parameter
ignoreDocsWithoutGroupField
, if it's true, ignore documents that don't have the group field instead of putting them into a null group.This change can also improve the field collapsing performance, I did some benchmark test against OpenSearch, here are the result: