Skip to content

Conversation

@teskje
Copy link
Contributor

@teskje teskje commented Oct 24, 2023

This PR extends the YieldSpec type and the syntax allowed for the linear_join_yielding system var to enable specifying yield strategies that consider both the performed work and the elapsed time.

This will allow us to make sure that (a) join operators don't keep around huge amounts of output records and (b) join operators don't regress interactivity.

Motivation

  • This PR fixes a recognized bug.

Part of MaterializeInc/database-issues#6761

Follow-up to #22391, in which we discussed that we still want a way to ensure join outputs won't produce an OOM when time-based yielding is used.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:
    • N/A

@teskje teskje force-pushed the linear-join-yielding-2 branch 2 times, most recently from 8b1d2e7 to 0174bf3 Compare October 24, 2023 11:48
@teskje teskje marked this pull request as ready for review October 24, 2023 13:51
@teskje teskje requested a review from a team October 24, 2023 13:51
Copy link
Member

@antiguru antiguru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, left some comments inline.

Comment on lines +103 to +113
(Materialize, Some(work_limit), Some(time_limit)) => {
let yield_fn =
move |start: Instant, work| work >= work_limit || start.elapsed() >= time_limit;
mz_join_core(arranged1, arranged2, shutdown_token, result, yield_fn)
}
(Materialize, Some(work_limit), None) => {
let yield_fn = move |_start, work| work >= work_limit;
mz_join_core(arranged1, arranged2, shutdown_token, result, yield_fn)
}
(Materialize, None, Some(time_limit)) => {
let yield_fn = move |start: Instant, _work| start.elapsed() >= time_limit;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what the overhead is to checking the option inside the closure. Calling mz_join_core several times might be bad for compile times, but checking the limits in the closure might consume more CPU? Unclear!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there are several things we could do:

  1. Have a single yield_fn that always checks both work and time and handles the Options. Good for readability, good for compile times, bad for runtime performance.
  2. Have multiple yield_fns for each Some/None combination, instantiate a different mz_join_core variant for each of them. Bad for readability, bad for compile times, good for runtime performance.
  3. Have multiple yield_fns for each Some/None combination, instantiate a single mz_join_core that takes a Box<dyn YFn>. Okay for readability, good for compile times, okay for runtime performance.

(The "X for runtime performance" are just guesses, I don't really know either.)

I went with (2) here because I really want to avoid regressing join performance, but I'm not too happy about readability. I'm not too worried about compile times, though maybe I should be :)

This commit extends the `YieldSpec` type and the syntax allowed for the
`linear_join_yielding` system var to enable specifying yield strategies
that consider both the performed work and the elapsed time.

This will allow us to make sure that (a) join operators don't keep
around huge amounts of output records and (b) join operators don't
regress interactivity.
@teskje teskje force-pushed the linear-join-yielding-2 branch from 0174bf3 to 67c9dcb Compare October 24, 2023 15:27
@teskje
Copy link
Contributor Author

teskje commented Oct 25, 2023

TFTR!

@teskje teskje merged commit 33f3cd2 into MaterializeInc:main Oct 25, 2023
@teskje teskje deleted the linear-join-yielding-2 branch October 25, 2023 07:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants