Add a new codec to implement OSQ for 4 and 8 bit quantized vectors #15169
Conversation
This is awesome! Thank you for going through the slog. Doing format changes is always a challenge and requires a ton of ceremony.
I will need to read through this a couple of times to grok all of it (but most of it seems pretty standard for a knn format)
I realize we support unsigned 8 bit now, but for BWC, I would still like to provide 7 bit if possible.
With this change, we should also move all existing quantized formats to BWC. But that would be yet another thousand+ LOC. So, maybe that can be done in a follow up.
It is interesting to me that int4 in the new format has such better latency as well! I wonder why that is? Is it simply because the scoring quality is higher, and thus we can exit searching the graph more quickly?
With this change, we should also move all existing quantized formats to BWC. But that would be yet another thousand+ LOC. So, maybe that can be done in a follow up.
Marked the old codecs as deprecated. I'd prefer not to do backwards codecs here, this change is already much larger than I'd like but I couldn't figure out how to factor it into smaller pieces.
It is interesting to me that int4 in the new format has such better latency as well!
I wonder why that is? Is it simply because the scoring quality is higher, and thus we can exit searching the graph more quickly?
I re-ran the luceneutil changes because I guess it had partially finished last time. Indexing performance is also much better. I think of this in terms of lossy score compression -- for example, trivial bit quantization with Hamming distance produces just (dimensions + 1) possible scores, so it can be difficult to distinguish between results in some cases. It could be that the old quantizer produces fewer distinct vector representations and output scores.
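To make the score-granularity point concrete, here is a back-of-the-envelope sketch (a hypothetical illustration, not Lucene code): with 1-bit quantization and Hamming distance over d dimensions there are only d + 1 possible scores, while an unsigned b-bit dot product can take on up to d * (2^b - 1)^2 + 1 distinct values.

```java
// Illustrative arithmetic only; class and method names are made up.
public final class ScoreGranularity {
  // Distinct Hamming scores for d-dimensional bit vectors: 0, 1, ..., d.
  static long hammingScoreCount(int d) {
    return d + 1L;
  }

  // Upper bound on distinct dot-product values for unsigned b-bit components:
  // each term lies in [0, (2^b - 1)^2], so the sum of d terms ranges over
  // 0 .. d * (2^b - 1)^2.
  static long maxDotScoreCount(int d, int bits) {
    long m = (1L << bits) - 1;
    return d * m * m + 1;
  }

  public static void main(String[] args) {
    int d = 768;
    System.out.println("1-bit Hamming scores: " + hammingScoreCount(d));    // 769
    System.out.println("4-bit dot bound:      " + maxDotScoreCount(d, 4));  // 172801
    System.out.println("8-bit dot bound:      " + maxDotScoreCount(d, 8));  // 49939201
  }
}
```

Even 4 bits gives orders of magnitude more score resolution than 1 bit, which fits the "lossy score compression" framing above.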
More generally, I think it would be useful to track and surface KnnCollector stats in the indexing and search paths. Being able to distinguish between "comparisons are faster" and "comparisons are fewer" would be helpful for analyzing this, as well as other algorithmic and data layout changes (I would be really curious to see this for binary partitioning reordering).
Yeah, I am cool with that. Format changes are always huge!
We don't really track this during indexing for sure. But you can expose the vector comparisons in luceneutil.
The average visited count in the query path actually is exposed in luceneutil today; it just appears in the iteration summary and not the overall summary. TIL. I've extracted it from my most recent run:
4 bit is doing ~10% more comparisons than 8 bit for the same fanout. That is more work in 4 bit, but it doesn't explain the size of the win in OSQ4. Workload CPU usage and latency are mostly driven by scoring costs, so let's start by looking at microbenchmark results for dot product on the same hardware:
At 702 dimensions (the next closest to our test data set), half byte is about twice as fast. The performance difference between -4 and -8 makes sense with this context. I don't know why 4 is so slow; these numbers suggest it shouldn't be worse than 8, yet somehow it is 🤷. Not sure this is worth figuring out.
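For intuition on why half-byte scoring can be fast, here is a minimal scalar sketch of a dot product over nibble-packed vectors. The low-nibble-first layout and the class name are assumptions for illustration; Lucene's actual scorers use vectorized (Panama/native) implementations, but the mask-and-shift unpacking pattern is the same, and two components per byte means half the memory traffic.

```java
// Hypothetical scalar sketch; not the actual Lucene scorer code.
public final class NibbleDot {
  // Dot product over two nibble-packed vectors: each byte holds two 4-bit
  // components, low nibble first (an assumed layout for this illustration).
  static int dotPacked(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      int lo = (a[i] & 0x0F) * (b[i] & 0x0F);          // low-nibble components
      int hi = ((a[i] >> 4) & 0x0F) * ((b[i] >> 4) & 0x0F); // high-nibble components
      sum += lo + hi;
    }
    return sum;
  }

  public static void main(String[] args) {
    byte[] a = {(byte) 0x21, (byte) 0xF0}; // components [1, 2, 0, 15]
    byte[] b = {(byte) 0x43, (byte) 0x10}; // components [3, 4, 0, 1]
    System.out.println(dotPacked(a, b)); // 1*3 + 2*4 + 0*0 + 15*1 = 26
  }
}
```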
Really great stuff.
The existing scalar formats can go to the backwards codecs in a separate PR.
Unlike the existing ScalarQuantizer, this codec selects a mode based on an enum to quantize to unsigned bytes or packed nibbles, using the same packing scheme as the existing scalar quantized codec. Seven bits is also supported for anyone interested in backward compatibility, but this setting is discouraged.
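As a rough sketch of what such a mode enum might look like (the names and layout here are hypothetical, not the actual Lucene104 API), with the assumption that 7-bit and 8-bit components each occupy a full byte while nibbles are packed two per byte:

```java
// Hypothetical mode enum for illustration; not the actual Lucene104 format API.
public enum ScalarMode {
  UNSIGNED_BYTE(8),
  SEVEN_BIT(7),       // BWC-friendly, but discouraged for new indexes
  PACKED_NIBBLE(4);   // two components per byte

  final int bits;

  ScalarMode(int bits) {
    this.bits = bits;
  }

  // On-disk bytes for one quantized vector of the given dimension
  // (nibbles are packed two per byte; 7- and 8-bit use one byte each).
  int bytesPerVector(int dims) {
    return this == PACKED_NIBBLE ? (dims + 1) / 2 : dims;
  }

  public static void main(String[] args) {
    for (ScalarMode m : values()) {
      System.out.println(m + ": " + m.bytesPerVector(768) + " bytes");
    }
  }
}
```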
This is separate from Lucene102BinaryQuantizedVectorsFormat as we need a larger value to store the component sum for each vector, owing to the larger quantized values.
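A quick arithmetic sketch of why the component sum needs a wider field here than in the 1-bit binary format (illustrative only; not the actual file-format code): the maximum possible sum of d quantized components is d * (2^bits - 1), which grows well past the 1-bit range once components are 4 or 8 bits wide.

```java
// Illustrative arithmetic; class and method names are made up.
public final class ComponentSumWidth {
  // Maximum possible sum of d unsigned quantized components of the given width.
  static long maxComponentSum(int dims, int bits) {
    return (long) dims * ((1L << bits) - 1);
  }

  public static void main(String[] args) {
    int d = 1024;
    System.out.println(maxComponentSum(d, 1)); // 1024   -- fits in a short
    System.out.println(maxComponentSum(d, 4)); // 15360  -- still fits in a short
    System.out.println(maxComponentSum(d, 8)); // 261120 -- needs a wider field
  }
}
```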
This closes #15064
luceneutil benchmark results. OSQ results are bits -4 and -8.