
Conversation

Contributor

@mccullocht mccullocht commented Sep 9, 2025

Unlike the existing ScalarQuantizer, this format selects a mode based on an enum to quantize to either unsigned bytes or
packed nibbles, using the same packing scheme as the existing scalar quantized codec. Seven bits is also
supported for anyone interested in backward compatibility, but that setting is discouraged.
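To illustrate the packed-nibble mode, here is a minimal sketch of packing two 4-bit quantized values per byte. This is only an illustration of the idea; the codec's actual on-disk packing order may differ, and the class name `NibblePack` is hypothetical.

```java
// Illustrative sketch: two 4-bit quantized values per byte.
// Not the codec's exact on-disk layout.
public final class NibblePack {
  // Pack quantized values (each must be in [0, 15]) into half the bytes.
  static byte[] pack(byte[] quantized) {
    byte[] packed = new byte[(quantized.length + 1) / 2];
    for (int i = 0; i < quantized.length; i++) {
      int shift = (i & 1) == 0 ? 0 : 4; // even index -> low nibble
      packed[i / 2] |= (byte) ((quantized[i] & 0x0F) << shift);
    }
    return packed;
  }

  // Recover the original 4-bit values.
  static byte[] unpack(byte[] packed, int dims) {
    byte[] out = new byte[dims];
    for (int i = 0; i < dims; i++) {
      int shift = (i & 1) == 0 ? 0 : 4;
      out[i] = (byte) ((packed[i / 2] >> shift) & 0x0F);
    }
    return out;
  }
}
```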

This is separate from Lucene102BinaryQuantizedVectorsFormat as we need a larger value to store
the component sum for each vector owing to larger quantized values.

This closes #15064

luceneutil benchmark results; the OSQ results are the rows labeled -4 and -8 bits.

Results:
recall  latency(ms)  netCPU  avgCpuCount     nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.875        0.858   0.855        0.996  1000000    10     100       32        250    -4 bits    203.69       4909.49          168.11             1         3349.44      3311.157      381.470       HNSW
 0.954        1.222   1.217        0.996  1000000    10     100       32        250    -8 bits    333.87       2995.20          193.90             1         3717.10      3677.368      747.681       HNSW
 0.450        1.881   1.816        0.965  1000000    10     100       32        250     4 bits    510.04       1960.65          285.58             1         3346.30      3299.713      370.026       HNSW
 0.928        1.245   1.241        0.997  1000000    10     100       32        250     8 bits    325.14       3075.58          207.41             1         3705.70      3665.924      736.237       HNSW

@benwtrent benwtrent self-requested a review September 9, 2025 16:50
@benwtrent benwtrent added this to the 10.4.0 milestone Sep 9, 2025
@mccullocht mccullocht marked this pull request as ready for review September 10, 2025 20:27
Member

@benwtrent benwtrent left a comment


This is awesome! Thank you for going through the slog. Doing format changes is always a challenge and requires a ton of ceremony.

I will need to read through this a couple of times to grok all of it (but most of it seems pretty standard for a knn format)

I realize we support unsigned 8 bit now, but for BWC, I would still like to provide 7 bit if possible.

With this change, we should also move all existing quantized formats to BWC. But that would be yet another thousand+ LOC. So, maybe that can be done in a follow up.

@benwtrent
Member

It is interesting to me that int4 in the new format has such better latency as well!

I wonder why that is? Is it simply because the scoring quality is higher, and thus we can exit searching the graph more quickly?

Contributor Author

@mccullocht mccullocht left a comment


With this change, we should also move all existing quantized formats to BWC. But that would be yet another thousand+ LOC. So, maybe that can be done in a follow up.

Marked the old codecs as deprecated. I'd prefer not to do backwards codecs here; this change is already much larger than I'd like, but I couldn't figure out how to factor it into smaller pieces.

It is interesting to me that int4 in the new format has such better latency as well!
I wonder why that is? Is it simply because the scoring quality is higher, and thus we can exit searching the graph more quickly?

I re-ran the luceneutil benchmarks because I guess the previous run had only partially finished. Indexing performance is also much better. I think of this in terms of lossy score compression -- for example, trivial bit quantization with Hamming distance produces just (dimensions + 1) possible scores, so it can be difficult to distinguish between results in some cases. It could be that the old quantizer produces fewer distinct vector representations and output scores.
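The score-compression point can be made concrete with a small sketch: over d binary dimensions, Hamming distance can only take d + 1 distinct values (0..d), so many candidates tie at the same score. The class name here is hypothetical, not part of Lucene.

```java
// Sketch of why 1-bit quantization is lossy score compression:
// Hamming distance over d binary dims has only d + 1 possible values.
public final class HammingScores {
  // Hamming distance between two bit vectors stored as long words.
  static int hamming(long[] a, long[] b) {
    int dist = 0;
    for (int i = 0; i < a.length; i++) {
      dist += Long.bitCount(a[i] ^ b[i]);
    }
    return dist;
  }

  // Number of distinct Hamming scores for d-dimensional binary vectors.
  static int distinctScores(int dims) {
    return dims + 1;
  }
}
```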

More generally I think it would be useful to track and surface KnnCollector stats in indexing and search paths. Being able to distinguish between "comparisons are faster" and "comparisons are fewer" would be helpful for analyzing this, and also other algorithmic and data layout changes (would be really curious to see this for binary partitioning reordering).
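One way such stats could be surfaced is a small counter updated by the search path; this is purely a hypothetical sketch (no such Lucene API exists), just to show how "fewer comparisons" could be separated from "cheaper comparisons".

```java
// Hypothetical sketch, not an existing Lucene API: a counter the
// search path could update so benchmarks can distinguish
// "comparisons are fewer" from "comparisons are faster".
public final class KnnSearchStats {
  private long comparisons;
  private long comparisonNanos;

  // Record one vector comparison and how long it took.
  public void record(long nanos) {
    comparisons++;
    comparisonNanos += nanos;
  }

  public long comparisons() {
    return comparisons;
  }

  public double avgNanosPerComparison() {
    return comparisons == 0 ? 0.0 : (double) comparisonNanos / comparisons;
  }
}
```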

@benwtrent
Member

Marked the old codecs as deprecated. I'd prefer not to do backwards codecs here; this change is already much larger than I'd like, but I couldn't figure out how to factor it into smaller pieces.

Yeah, I am cool with that. Format changes are always huge!

More generally I think it would be useful to track and surface KnnCollector stats in indexing and search paths. Being able to distinguish between "comparisons are faster" and "comparisons are fewer" would be helpful for analyzing this, and also other algorithmic and data layout changes (would be really curious to see this for binary partitioning reordering).

We don't really track this during indexing, for sure. But you can expose the vector comparisons in luceneutil.

@mccullocht
Contributor Author

Average visited count in the query path is actually exposed in luceneutil today; it just appears in the iteration summary and not the overall summary. TIL. I've extracted it for my most recent run here:

bits  avgVisited
  -4        3794
  -8        3857
   4        4294
   8        3849

4 bit is doing ~10% more comparisons than 8 bit for the same fanout. That is more work for 4 bit, but it doesn't explain the size of the win for OSQ4. Workload CPU usage and latency are mostly driven by scoring costs, so let's start by looking at microbenchmark results for dot product on the same hardware:

VectorUtilBenchmark.binaryDotProductVector         128  thrpt   15   47.741 ±  0.502  ops/us
VectorUtilBenchmark.binaryDotProductVector         256  thrpt   15   26.198 ±  0.608  ops/us
VectorUtilBenchmark.binaryDotProductVector         300  thrpt   15   22.857 ±  0.130  ops/us
VectorUtilBenchmark.binaryDotProductVector         512  thrpt   15   13.864 ±  0.443  ops/us
VectorUtilBenchmark.binaryDotProductVector         702  thrpt   15   10.003 ±  0.205  ops/us
VectorUtilBenchmark.binaryDotProductVector        1024  thrpt   15    6.795 ±  0.098  ops/us
VectorUtilBenchmark.binaryHalfByteVectorPacked     128  thrpt   15   72.590 ±  1.089  ops/us
VectorUtilBenchmark.binaryHalfByteVectorPacked     256  thrpt   15   50.962 ±  0.197  ops/us
VectorUtilBenchmark.binaryHalfByteVectorPacked     300  thrpt   15   39.587 ±  0.159  ops/us
VectorUtilBenchmark.binaryHalfByteVectorPacked     512  thrpt   15   31.660 ±  0.187  ops/us
VectorUtilBenchmark.binaryHalfByteVectorPacked     702  thrpt   15   22.251 ±  0.135  ops/us
VectorUtilBenchmark.binaryHalfByteVectorPacked    1024  thrpt   15   18.117 ±  0.129  ops/us

At 702 dimensions (the closest to our test data set) half byte is about twice as fast. The performance difference between -4 and -8 makes sense with this context. I don't know why 4 is so slow; these numbers suggest it shouldn't be worse than 8, yet somehow it is 🤷. Not sure this is worth figuring out.
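For intuition on the throughput gap, here are simplified scalar versions of the two kernels. These are illustrative only (the real VectorUtil paths are SIMD-vectorized, and the class name is hypothetical): the packed-nibble variant reads half the bytes per comparison, which is consistent with the roughly 2x difference in the benchmark numbers.

```java
// Illustrative scalar dot-product kernels; the real implementations
// are SIMD-vectorized. Values are treated as unsigned quantized codes.
public final class DotKernels {
  // One byte per dimension.
  static int dotBytes(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += (a[i] & 0xFF) * (b[i] & 0xFF);
    }
    return sum;
  }

  // Two 4-bit dimensions per byte: half the memory traffic per comparison.
  static int dotPackedNibbles(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += (a[i] & 0x0F) * (b[i] & 0x0F);          // low nibbles
      sum += ((a[i] >> 4) & 0x0F) * ((b[i] >> 4) & 0x0F); // high nibbles
    }
    return sum;
  }
}
```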

Member

@benwtrent benwtrent left a comment


Really great stuff.

The existing scalar formats can go to the backwards codecs in a separate PR.


Successfully merging this pull request may close these issues.

Switch over current scalar quantization formats to use OptimizedScalarQuantizer