michaelsembwever
Member

https://github.com/riptano/cndb/issues/15483

Port into main-5.0 commit 6383d21

CNDB-15483: CNDB-15300: Add `SSTableReader#getApproximatePositionsForRanges` (#1993)

This PR is in the context of https://github.com/riptano/cndb/pull/15380, and is used by its PR https://github.com/riptano/cndb/pull/15380.

It adds a variant of the `SSTableReader#getPositionsForRanges` method that never reads the data file to compute its results, but in exchange may return positions that slightly "overshoot" the requested range.

Put another way, the added method `SSTableReader#getApproximatePositionsForRanges` is such that if you call it on some range `R` and read the data within the returned positions, the data read may start with at most one key (a partition, really) that sorts strictly before `R`, and may end with at most one key that sorts strictly after `R`.
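To illustrate what this contract means for a caller, here is a minimal toy sketch. It is not the Cassandra implementation: the class `ApproximateRangeSketch`, its in-memory key and offset arrays, and the inclusive `[left, right]` bounds are all assumptions made for illustration. It models the worst case allowed by the description (one extra key on each side) and shows that filtering the keys read against `R` recovers the exact result.

```java
import java.util.ArrayList;
import java.util.List;

public class ApproximateRangeSketch {
    // Hypothetical toy data: sorted partition keys and the byte offset where each starts.
    static final long[] KEYS = {10, 20, 30, 40, 50};
    static final long[] STARTS = {0, 100, 250, 400, 520};
    static final long DATA_LENGTH = 700;

    /** Returns {firstIndex, endIndexExclusive} for [left, right], widened by one key on each side. */
    static int[] approximateSpan(long left, long right) {
        int first = 0;
        while (first < KEYS.length && KEYS[first] < left) first++;
        int end = first;
        while (end < KEYS.length && KEYS[end] <= right) end++;
        // Model the worst-case overshoot: one key before the range and one key after it.
        int approxFirst = Math.max(0, first - 1);
        int approxEnd = Math.min(KEYS.length, end + 1);
        return new int[]{approxFirst, approxEnd};
    }

    /** Returns {startOffset, endOffset} covering the widened span. */
    static long[] approximatePositions(long left, long right) {
        int[] span = approximateSpan(left, right);
        long start = STARTS[span[0]];
        long end = span[1] < KEYS.length ? STARTS[span[1]] : DATA_LENGTH;
        return new long[]{start, end};
    }

    /** Reads every key in the widened span and drops those outside [left, right]. */
    static List<Long> readAndFilter(long left, long right) {
        int[] span = approximateSpan(left, right);
        List<Long> result = new ArrayList<>();
        for (int i = span[0]; i < span[1]; i++) {
            long key = KEYS[i];
            if (key >= left && key <= right) result.add(key);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] positions = approximatePositions(20, 30);
        System.out.println(positions[0] + ".." + positions[1]);
        System.out.println(readAndFilter(20, 30));
    }
}
```

A caller that needs exact results therefore has to compare each key it reads against `R`; a caller that only needs an estimate (a size, say) can use the positions as they are.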

Additionally, the PR switches the reading of the `Statistics.db` component from `RandomAccessReader` to `FileInputStreamPlus`. The two are essentially equivalent in functionality (the component is deserialized sequentially anyway, so there are no random reads), but making it explicit that no random reads happen allows us to "direct download" this component like other related components on the CNDB side. See the last point of https://github.com/riptano/cndb/pull/15380 for more details.
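The general idea can be sketched with plain JDK classes. This is an assumption-laden illustration, not the code in this PR: `SequentialReadSketch` and its length-prefixed format are hypothetical, and `DataInputStream` merely stands in for `FileInputStreamPlus`. The point is that a deserializer written against a forward-only stream does not care whether the bytes come from a local file or from some other streaming source.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class SequentialReadSketch {
    /** Writes a hypothetical length-prefixed list of ints. */
    static void write(OutputStream sink, int[] values) throws IOException {
        DataOutputStream out = new DataOutputStream(sink);
        out.writeInt(values.length);
        for (int v : values) out.writeInt(v);
        out.flush();
    }

    /** Deserializes strictly front to back: no seek, no position lookups. */
    static int[] readSequentially(InputStream source) throws IOException {
        DataInputStream in = new DataInputStream(new BufferedInputStream(source));
        int count = in.readInt();
        int[] values = new int[count];
        for (int i = 0; i < count; i++) values[i] = in.readInt();
        return values;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("stats-sketch", ".db");
        try {
            try (OutputStream out = Files.newOutputStream(tmp)) {
                write(out, new int[]{7, 8, 9});
            }
            try (InputStream in = Files.newInputStream(tmp)) {
                System.out.println(readSequentially(in).length);
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because `readSequentially` only needs an `InputStream`, the same method would work unchanged on any other forward-only byte source.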

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@djatnieks djatnieks left a comment

Ported changes look good. I left a comment about a possibly extraneous change to the XML file.

@michaelsembwever
Member Author

The AggregationQueriesTest and SecondaryIndexManagerTest failures also occur on main-5.0 (not related), and I cannot reproduce the others (not related).

@michaelsembwever michaelsembwever merged commit 6c33d01 into main-5.0 Oct 14, 2025
179 of 515 checks passed
@michaelsembwever michaelsembwever deleted the mck-cndb-15483-main-5.0 branch October 14, 2025 10:01
@cassci-bot
❌ Build ds-cassandra-pr-gate/PR-2021 rejected by Butler


4 regressions found
See build details here


Found 4 new test failures

| Test | Explanation | Runs | Upstream |
|---|---|---|---|
| o.a.c.cql3.validation.entities.SecondaryIndexTest.testCreateAndDropIndexWithQuotedIdentifier (compression) | REGRESSION | 🔴🔵 | 0 / 11 |
| o.a.c.cql3.validation.operations.AggregationQueriesTest.testAggregationQueryShouldNotTimeoutWhenItExceedesReadTimeout (compression) | REGRESSION | 🔴🔴 | 2 / 11 |
| o.a.c.distributed.test.repair.ForceRepairTest.force () | NEW | 🔴 | 0 / 11 |
| o.a.c.distributed.test.sai.VectorDistributedTest.testBasicGeoDistance | REGRESSION | 🔴🔵 | 0 / 11 |

Found 7 known test failures

4 participants