Skip to content

Conversation

spilchen
Copy link
Contributor

@spilchen spilchen commented Oct 9, 2025

Add a new roachtest to measure the impact of admission control on INSPECT operations running concurrently with foreground workloads.

As part of this, introduce a cluster setting knob to control admission control behavior for INSPECT operations, allowing them to be run with either BulkLowQoS (enabled) or Normal QoS (disabled).

The test performs three INSPECT runs:

  1. A calibration run (with no workload) to dynamically determine test duration.
  2. An INSPECT run with admission control enabled (BulkLowQoS) during a foreground read workload.
  3. An INSPECT run with admission control disabled (Normal QoS) during the same workload.

Each run measures and reports throughput (rows/s/CPU), enabling us to quantify:

  • The performance impact of admission control on INSPECT operations.
  • The effectiveness of admission control in protecting foreground workload latency.

The workload duration is dynamically calculated to ensure it's long enough to encompass both INSPECT runs.

Informs: #154457
Epic: CRDB-30356

Release note: none

@spilchen spilchen self-assigned this Oct 9, 2025
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@spilchen spilchen force-pushed the gh-154457/251009/0819/inspect-overload-workload/pr-ready branch 2 times, most recently from 85566a6 to b4325f0 Compare October 9, 2025 17:24
@spilchen
Copy link
Contributor Author

spilchen commented Oct 9, 2025

@spilchen spilchen requested a review from rafiss October 9, 2025 17:37
@spilchen spilchen marked this pull request as ready for review October 9, 2025 17:37
@spilchen spilchen requested a review from a team as a code owner October 9, 2025 17:37
@spilchen spilchen force-pushed the gh-154457/251009/0819/inspect-overload-workload/pr-ready branch 2 times, most recently from 0f0344e to 9e57996 Compare October 10, 2025 21:17
Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really nice tests! i just had some small questions

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @spilchen)


pkg/cmd/roachtest/tests/admission_control_inspect.go line 45 at r1 (raw file):

	return registry.TestSpec{
		Name:             fmt.Sprintf("inspect/admission-control/rows=%d", numRows),

nit: it may be useful to include the cpu count and node count in the name. for example, see the naming convention of other test names at https://roachperf.crdb.dev/ (e.g. cdc/scan/catchup-cold/nodes=5/cpu=16/rows=1G/ranges=100/protocol=mux/format=json/sink=null). it's helpful since we also need code in roachperf to display these charts, and the charting logic is based on the test name.


pkg/cmd/roachtest/tests/admission_control_inspect.go line 232 at r2 (raw file):

			t.L().Printf("Setting workload duration to %v (based on calibration: %v)\n", workloadDuration, calibrationDuration)

			// Start read-only foreground workload using kv (since bulkingest doesn't support reads).

is there a specific reason that we want a read only workload here? what would the issue be with having a workload with writes? perhaps the answer could be included in this code comment.


pkg/cmd/roachtest/tests/admission_control_inspect.go line 248 at r2 (raw file):

			time.Sleep(baselineDuration)

			_ = runInspect(

nit: for these calls to runInspect, it could be useful to keep the return value and log the duration. seeing the durtaion in logs might be useful when manually looking at logs.

Add a new roachtest to measure the impact of admission control on
INSPECT operations running concurrently with foreground workloads.

As part of this, introduce a cluster setting knob to control admission
control behavior for INSPECT operations, allowing them to be run with
either BulkLowQoS (enabled) or Normal QoS (disabled).

The test performs three INSPECT runs:
1. A calibration run (with no workload) to dynamically determine test duration.
2. An INSPECT run with admission control enabled (BulkLowQoS) during a
   foreground read workload.
3. An INSPECT run with admission control disabled (Normal QoS) during the
   same workload.

Each run measures and reports throughput (rows/s/CPU), enabling us to
quantify:
- The performance impact of admission control on INSPECT operations.
- The effectiveness of admission control in protecting foreground workload
  latency.

The workload duration is dynamically calculated to ensure it's long enough
to encompass both INSPECT runs.

Informs: cockroachdb#154457
Epic: CRDB-30356

Release note: none
@spilchen spilchen force-pushed the gh-154457/251009/0819/inspect-overload-workload/pr-ready branch from 9e57996 to 6aa1028 Compare October 14, 2025 12:15
Copy link
Contributor Author

@spilchen spilchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)


pkg/cmd/roachtest/tests/admission_control_inspect.go line 45 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: it may be useful to include the cpu count and node count in the name. for example, see the naming convention of other test names at https://roachperf.crdb.dev/ (e.g. cdc/scan/catchup-cold/nodes=5/cpu=16/rows=1G/ranges=100/protocol=mux/format=json/sink=null). it's helpful since we also need code in roachperf to display these charts, and the charting logic is based on the test name.

I'll use this naming convention for the other inspect roachtest too.


pkg/cmd/roachtest/tests/admission_control_inspect.go line 232 at r2 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

is there a specific reason that we want a read only workload here? what would the issue be with having a workload with writes? perhaps the answer could be included in this code comment.

INSPECT is read-only, so I would expect contention with another workload to be only on reads too. Also, I wanted a static dataset since we run INSPECT multiple times. I wanted it doing the same amount of work each time, just under different load characteristics.


pkg/cmd/roachtest/tests/admission_control_inspect.go line 248 at r2 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

nit: for these calls to runInspect, it could be useful to keep the return value and log the duration. seeing the durtaion in logs might be useful when manually looking at logs.

The runInspect captures that with this log message:
https://github.com/cockroachdb/cockroach/pull/155144/files#diff-094b401c27f4a7015bb40ca9c4c9f24b3d9708948251fc5a5b796e5c10d061b6R175-R179

@spilchen
Copy link
Contributor Author

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Oct 14, 2025

@craig craig bot merged commit c81963c into cockroachdb:master Oct 14, 2025
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants