ESQL: Implement percentiles agg for exponential_histogram #137720
Conversation
Pinging @elastic/es-analytical-engine (Team:Analytics)
| instance:keyword | p0:double | p50:double | p99:double | p100:double |
|---|---|---|---|---|
| instance-0 | 2.4E-4 | 0.0211404 | 1.0432946 | 6.786232 |
| instance-1 | 2.17E-4 | 6.469E-4 | 0.1422151 | 3.190723 |
| instance-2 | 2.2E-4 | 6.469E-4 | 0.0857672 | 2.7059714542564097 |
nit: round p100 too?
That's not necessary, because p0 is the min and p100 is the max. Those are exact across machines and across merging.
```diff
  case DOC -> DataType.DOC_DATA_TYPE;
- case FLOAT, NULL, COMPOSITE, AGGREGATE_METRIC_DOUBLE, EXPONENTIAL_HISTOGRAM, UNKNOWN -> throw new EsqlIllegalArgumentException(
+ case EXPONENTIAL_HISTOGRAM -> DataType.EXPONENTIAL_HISTOGRAM;
+ case FLOAT, NULL, COMPOSITE, AGGREGATE_METRIC_DOUBLE, UNKNOWN -> throw new EsqlIllegalArgumentException(
```
Fix also AGGREGATE_METRIC_DOUBLE? We have a type for it, no?
We do, but we don't have any non-surrogate aggregations for AGGREGATE_METRIC_DOUBLE yet. So this would be effectively dead and untested code, which I would want to avoid adding right now. I think we should only add it when we actually add a non-surrogate aggregation for aggregate metric double.
kkrik-es left a comment
LGTM, leaving it to Nhat to stamp.
```esql
FROM exp_histo_sample
| WHERE NOT STARTS_WITH(instance, "dummy")
| STATS p0 = PERCENTILE(responseTime, 0), p50 = PERCENTILE(responseTime, 50), p99 = PERCENTILE(responseTime, 99), p100 = PERCENTILE(responseTime, 100)
| EVAL p50 = ROUND(p50, 7), p99 = ROUND(p99, 7) // rounding to avoid floating point precision issues, min and max are exact so no rounding needed
```
I had an interesting test failure where the percentiles were a tiny, tiny bit off the exact values on CI.
This also wouldn't reproduce locally with the same seed.
My guess is that this is due to the different implementations of Math.log / Math.exp on my machine (Mac M1) vs. the CI runners. See also this SO thread.
For that reason I've added the rounding to make the tests more robust.
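A minimal standalone sketch (not code from this PR) of why the rounding makes the expectations platform-stable: the JLS allows `java.lang.Math.log`/`exp` to vary by a last-ulp or so across platforms (unlike `StrictMath`, which is pinned to fdlibm), so two runs can produce estimates that differ only in the final bits, and rounding to 7 decimal places collapses both onto the same expected value. The `round7` helper is hypothetical, just mirroring ESQL's `ROUND(value, 7)`:

```java
// Sketch: rounding percentile estimates so last-ulp platform differences
// in Math.log / Math.exp don't flip a test expectation.
public final class RoundingSketch {

    // Hypothetical helper mirroring ESQL's ROUND(value, 7).
    static double round7(double v) {
        return Math.round(v * 1e7) / 1e7;
    }

    public static void main(String[] args) {
        double a = 0.0211404217;
        // A second estimate one ulp away, as might come from a CI runner
        // whose Math.log implementation differs from an M1 Mac's:
        double b = a + Math.ulp(a);
        // After rounding, both compare equal to the expected table value.
        System.out.println(round7(a) == round7(b)); // prints true
    }
}
```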
```java
// We should add a dedicated encoding when building a block from computed histograms which do not originate from doc values
// That encoding should be optimized for speed and support storing the zero threshold as (scale, index) pair
ZeroBucket zeroBucket = histogram.zeroBucket();
assert zeroBucket.compareZeroThreshold(ZeroBucket.minimalEmpty()) == 0 || zeroBucket.isIndexBased() == false
```
I have removed this assert, because this can now actually occur: we merge histograms, which means the zero threshold of the result can be defined by a (scale, index) tuple instead of a double value. So when storing that merged histogram into the block, we have to "round" it to the nearest double, introducing a tiny bit of error.
This won't matter in practice, because the error is tiny and we are only estimating percentiles anyway, but the errors could add up. I'd say this is okay for a tech preview, but the TODO above still stands: we should fix this at the latest before moving out of tech preview.
Ideally we'd add some benchmarks first, so that we can also see whether the disk-storage-oriented encoding performed below is too costly (i.e. it shows up in flame graphs) and whether we need to do something special here.
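To illustrate the rounding being discussed, here is a toy sketch (hypothetical names, not the actual `ZeroBucket` API): an index-based zero threshold denotes an exact exponential-histogram bucket boundary 2^(index · 2^-scale), which is generally irrational and has no exact double representation, so persisting it as a double rounds it to within about one ulp:

```java
// Sketch: converting an index-based zero threshold (scale, index) to a
// double. The exact value 2^(index * 2^-scale) usually cannot be
// represented exactly, so the stored double carries a tiny rounding error.
public final class ZeroThresholdSketch {

    static double thresholdAsDouble(long index, int scale) {
        // 2^(index * 2^-scale), evaluated in double arithmetic.
        // index * 2^-scale is exact here; the outer pow rounds once.
        return Math.pow(2.0, index * Math.pow(2.0, -scale));
    }

    public static void main(String[] args) {
        // 2^(3/16): the exact value is irrational; the double we store is
        // merely the nearest representable value, fine for estimation.
        System.out.println(thresholdAsDouble(3, 4));
    }
}
```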
Implements `PERCENTILE` agg support for `exponential_histogram` (part of #137549):

- A `MERGE` aggregation to merge exponential histograms. This aggregation is intentionally not registered and can only be used in surrogates.
- A `HISTOGRAM_PERCENTILE(MERGE(field), percentile)` surrogate to implement `PERCENTILE`.

This time I remembered to also run the tests locally in non-snapshot mode, so something like #137693 shouldn't happen again.
Looking at the logical plans for the CSV tests also shows the performance benefit of splitting the percentile extraction and the merge aggregation:
Even though we query four percentiles, we only do the merge aggregation once. This would not be the case if merge + percentile extraction were fused in a single aggregation.
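The shared-merge shape can be sketched with a toy model (not the actual ESQL classes; a `double[]` of samples stands in for a histogram, and a nearest-rank pick stands in for the histogram percentile estimator). The point is purely structural: one merge per group, then several cheap read-only extractions:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: why splitting MERGE from percentile extraction pays off.
// The expensive merge runs once; each percentile only reads the result.
public final class MergeOnceSketch {
    static int mergeCalls = 0;

    // Toy stand-in for exponential-histogram merging.
    static double[] merge(List<double[]> parts) {
        mergeCalls++;
        return parts.stream().flatMapToDouble(Arrays::stream).sorted().toArray();
    }

    // Toy stand-in for HISTOGRAM_PERCENTILE: nearest-rank on sorted samples.
    static double percentile(double[] sorted, double p) {
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, idx)];
    }

    public static void main(String[] args) {
        mergeCalls = 0; // reset so the count below is per-run
        List<double[]> perSegment = List.of(new double[] {1, 3}, new double[] {2, 4});
        double[] merged = merge(perSegment);  // one MERGE per group...
        double p50 = percentile(merged, 50);  // ...shared by every
        double p99 = percentile(merged, 99);  // percentile extraction
        System.out.println(mergeCalls + " " + p50 + " " + p99); // prints 1 2.0 4.0
    }
}
```

If merge and extraction were fused instead, each `PERCENTILE(...)` call in the `STATS` would redo the merge, so `mergeCalls` would grow with the number of percentiles queried.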
@kkrik-es @not-napoleon : I intended to add you as optional reviewers. So feel free to ignore the request.