@jshook (Contributor) commented Oct 10, 2025

This extends the examples/bench capabilities in jvector:

  • The DataSet type is virtualized behind an interface.
  • The way DataSets are loaded is more modular.
  • A DataSet loader has been added to support the vectordata API.

The net effect is that bench can now access vectordata-hosted datasets.
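To illustrate what "virtualized behind an interface" with modular loading can look like, here is a minimal sketch. It assumes a read-only dataset view plus a chain of loaders, where each loader either recognizes a dataset name or passes. The names `DataSet`, `DataSetLoader`, `DataSetLoaders` and `InMemoryDataSet` and their methods are hypothetical illustrations, not the actual types added in this PR (records require Java 16+).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical read-only view of a benchmark dataset; the PR's real interface may differ.
interface DataSet {
    String name();
    int dimension();
    List<float[]> baseVectors();
    List<float[]> queryVectors();
    List<int[]> groundTruth();
}

// Hypothetical loader: resolves a name to a DataSet, or returns empty if it does not recognize it.
interface DataSetLoader {
    Optional<DataSet> load(String name);
}

// Tries each registered loader in order; the first loader that recognizes the name wins.
final class DataSetLoaders {
    private final List<DataSetLoader> loaders = new ArrayList<>();

    DataSetLoaders register(DataSetLoader loader) {
        loaders.add(loader);
        return this;
    }

    DataSet load(String name) {
        for (DataSetLoader loader : loaders) {
            Optional<DataSet> ds = loader.load(name);
            if (ds.isPresent()) {
                return ds.get();
            }
        }
        throw new IllegalArgumentException("No loader could resolve dataset: " + name);
    }
}

// Toy in-memory dataset, used only to exercise the sketch.
record InMemoryDataSet(String name, int dimension, List<float[]> baseVectors,
                       List<float[]> queryVectors, List<int[]> groundTruth) implements DataSet {}
```

With this shape, a local-file loader and a vectordata-backed loader could be registered side by side, and bench code would only ever see `DataSet`.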
This brings several benefits:

  • remote hosting of vector test data
  • a uniform API for finding and using vector datasets
  • Merkle-tree-based automatic download of chunks for on-demand access
  • efficient download and automatic local caching of data on test nodes
  • mapping and ranging subsets of data, such as "first 1M", "first 10M" and so on, under profile names
  • a consistent API for the various vector data views: base, query, ...
  • management of datasets via catalogs, orthogonal to access control

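The Merkle-based chunk download and local caching listed above can be sketched roughly as follows. This is an illustration of the general technique, not the nosqlbench library's implementation: it assumes fixed-size chunks, SHA-256 leaf hashes, a trusted root obtained out of band, and a hypothetical `fetch` function standing in for the remote download. The cache here is an in-memory map; a real test node would presumably cache to local disk.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

// Sketch: on-demand chunk fetching, verified against a Merkle tree of SHA-256 leaf hashes.
final class MerkleChunkCache {
    private final List<byte[]> leafHashes;        // expected SHA-256 of each chunk
    private final IntFunction<byte[]> fetch;      // hypothetical remote fetch: chunk index -> bytes
    private final Map<Integer, byte[]> cache = new HashMap<>();

    MerkleChunkCache(List<byte[]> leafHashes, byte[] trustedRoot, IntFunction<byte[]> fetch) {
        // Reject the leaf list up front if it does not hash to the trusted root.
        if (!Arrays.equals(merkleRoot(leafHashes), trustedRoot)) {
            throw new IllegalArgumentException("leaf hashes do not match trusted Merkle root");
        }
        this.leafHashes = leafHashes;
        this.fetch = fetch;
    }

    static byte[] sha256(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] p : parts) md.update(p);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hash pairs level by level; an odd node at the end of a level is promoted unchanged.
    static byte[] merkleRoot(List<byte[]> leaves) {
        if (leaves.isEmpty()) return sha256();
        List<byte[]> level = new ArrayList<>(leaves);
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                next.add(i + 1 < level.size() ? sha256(level.get(i), level.get(i + 1)) : level.get(i));
            }
            level = next;
        }
        return level.get(0);
    }

    // Returns a chunk, downloading and verifying it only on first access.
    byte[] chunk(int index) {
        byte[] cached = cache.get(index);
        if (cached != null) return cached;
        byte[] data = fetch.apply(index);
        if (!Arrays.equals(sha256(data), leafHashes.get(index))) {
            throw new IllegalStateException("chunk " + index + " failed hash verification");
        }
        cache.put(index, data);
        return data;
    }

    int cachedChunks() { return cache.size(); }
}
```

Only the chunks a benchmark actually touches are downloaded, and each one is checked before it enters the cache. Production Merkle schemes usually also domain-separate leaf and interior hashes, which this sketch omits for brevity.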
Most of the core wiring for these capabilities is provided by a separate library that is part of the nosqlbench project; the changes to jvector adapt it to use these APIs.
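One way to picture "adapting jvector to use these APIs" is a thin adapter that turns a library-provided vector source into the view bench consumes, and applies a named profile such as "first 1M" as a prefix window. Everything below is hypothetical: `VectorSource` stands in for whatever the external library actually exposes, `BaseVectors` for the bench-side view, and the profile table is an assumed plain name-to-size map.

```java
import java.util.Map;

// Hypothetical stand-in for a random-access vector source provided by the external library.
interface VectorSource {
    int count();
    int dimension();
    float[] get(int ordinal);
}

// Hypothetical bench-side view that jvector's bench code would consume.
interface BaseVectors {
    int size();
    int dimension();
    float[] vector(int ordinal);
}

// Adapter: exposes a prefix window of a VectorSource (e.g. a "1M" profile) as BaseVectors.
final class ProfileWindowAdapter implements BaseVectors {
    private final VectorSource source;
    private final int limit;

    ProfileWindowAdapter(VectorSource source, int limit) {
        if (limit < 0 || limit > source.count()) {
            throw new IllegalArgumentException(
                "profile limit " + limit + " exceeds source size " + source.count());
        }
        this.source = source;
        this.limit = limit;
    }

    // Looks up a profile name (hypothetical table, e.g. "1M" -> 1_000_000) and wraps the source.
    static ProfileWindowAdapter forProfile(VectorSource source, String profile, Map<String, Integer> profiles) {
        Integer limit = profiles.get(profile);
        if (limit == null) throw new IllegalArgumentException("unknown profile: " + profile);
        return new ProfileWindowAdapter(source, limit);
    }

    @Override public int size() { return limit; }

    @Override public int dimension() { return source.dimension(); }

    @Override public float[] vector(int ordinal) {
        // Ordinals past the profile's prefix are out of range, even if the source has them.
        if (ordinal < 0 || ordinal >= limit) {
            throw new IndexOutOfBoundsException("ordinal " + ordinal + " outside profile of size " + limit);
        }
        return source.get(ordinal);
    }
}
```

Keeping the windowing in the adapter means the same remote source can back several profiles without copying data; only the visible range changes.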

A couple of issues remain to be resolved with the Java version configs and GitHub Actions (GHA).

@github-actions (Contributor)

Before you submit for review:

  • Does your PR follow guidelines from CONTRIBUTIONS.md?
  • Did you summarize what this PR does clearly and concisely?
  • Did you include performance data for changes which may be performance impacting?
  • Did you include useful docs for any user-facing changes or features?
  • Did you include useful javadocs for developer oriented changes, explaining new concepts or key changes?
  • Did you trigger and review regression testing results against the base branch via Run Bench Main?
  • Did you adhere to the code formatting guidelines (TBD)?
  • Did you group your changes for easy review, providing meaningful descriptions for each commit?
  • Did you ensure that all files contain the correct copyright header?

If you did not complete any of these, then please explain below.
