Skip to content

Conversation

zhaih
Copy link
Contributor

@zhaih zhaih commented Sep 15, 2025

Description

As title, add more comments about concurrency, and also removes
addGraphNode(int node, UpdateableRandomVectorScorer scorer) and ensure all addGraphNode(int node) call will not recreate a new scorer.

Test

Did run benchmark on 100d vectors, didn't find any regression

candidate runs

single thread

recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.804        0.230   0.225        0.978  500000   100      50       64        250         no     14.07      35549.24           57.87             1          216.95       190.735      190.735       HNSW

multi thread

recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.835        0.244   0.239        0.980  500000   100      50       64        250         no     14.05      35584.66           18.36             1          220.28       190.735      190.735       HNSW

baseline runs

single thread

recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.805        0.230   0.226        0.983  500000   100      50       64        250         no     13.96      35808.92           57.34             1          216.93       190.735      190.735       HNSW

multi thread

recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.834        0.252   0.248        0.984  500000   100      50       64        250         no     14.32      34916.20           18.31             1          220.28       190.735      190.735       HNSW

Copy link
Contributor

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@zhaih zhaih added this to the 10.4.0 milestone Sep 15, 2025
Copy link
Contributor

@msokolov msokolov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thank you for adding all the comments explaining about the tricky concurrent algorithm and how it maintains a consistent graph that is searchable by other threads while being mutated! I guess this is a precursor to melding the concurrent builder and the merging builder with its ability to re-use via seeded search? Apache CoC FTW!

this.scorerSupplier =
Objects.requireNonNull(scorerSupplier, "scorer supplier must not be null");
this.scorer =
Objects.requireNonNull(scorerSupplier, "scorer supplier must not be null").scorer();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's change the error message to say "scorer must not be null"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually still asserting on scorerSupplier (which is passed in to ctor)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants