-
Notifications
You must be signed in to change notification settings - Fork 482
Allow join yielding by work and time simultaneously #22610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8b1d2e7 to
0174bf3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, left some comments inline.
| (Materialize, Some(work_limit), Some(time_limit)) => { | ||
| let yield_fn = | ||
| move |start: Instant, work| work >= work_limit || start.elapsed() >= time_limit; | ||
| mz_join_core(arranged1, arranged2, shutdown_token, result, yield_fn) | ||
| } | ||
| (Materialize, Some(work_limit), None) => { | ||
| let yield_fn = move |_start, work| work >= work_limit; | ||
| mz_join_core(arranged1, arranged2, shutdown_token, result, yield_fn) | ||
| } | ||
| (Materialize, None, Some(time_limit)) => { | ||
| let yield_fn = move |start: Instant, _work| start.elapsed() >= time_limit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what the overhead is to checking the option inside the closure. Calling mz_join_core several times might be bad for compile times, but checking the limits in the closure might consume more CPU? Unclear!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there are several things we could do:
- Have a single
yield_fnthat always checks both work and time and handles theOptions. Good for readability, good for compile times, bad for runtime performance. - Have multiple
yield_fns for eachSome/Nonecombination, instantiate a differentmz_join_corevariant for each of them. Bad for readability, bad for compile times, good for runtime performance. - Have multiple
yield_fns for eachSome/Nonecombination, instantiate a singlemz_join_corethat takes aBox<dyn YFn>. Okay for readability, good for compile times, okay for runtime performance.
(The "X for runtime performance" are just guesses, I don't really know either.)
I went with (2) here because I really want to avoid regressing join performance, but I'm not too happy about readability. I'm not too worried about compile times, though maybe I should be :)
This commit extends the `YieldSpec` type and the syntax allowed for the `linear_join_yielding` system var to enable specifying yield strategies that consider both the performed work and the elapsed time. This will allow us to make sure that (a) join operators don't keep around huge amounts of output records and (b) join operators don't regress interactivity.
0174bf3 to
67c9dcb
Compare
|
TFTR! |
This PR extends the
YieldSpectype and the syntax allowed for thelinear_join_yieldingsystem var to enable specifying yield strategies that consider both the performed work and the elapsed time.This will allow us to make sure that (a) join operators don't keep around huge amounts of output records and (b) join operators don't regress interactivity.
Motivation
Part of MaterializeInc/database-issues#6761
Follow-up to #22391, in which we discussed that we still want a way to ensure join outputs won't produce an OOM when time-based yielding is used.
Checklist
$T ⇔ Proto$Tmapping (possibly in a backwards-incompatible way), then it is tagged with aT-protolabel.