Commit 86b9c08

Add p-value heuristics to significant terms aggregation (#5353)
(cherry picked from commit 2cddbef)
1 parent de8c5c7 commit 86b9c08

5 files changed: +137 -17 lines changed

output/openapi/elasticsearch-openapi.json

Lines changed: 21 additions & 0 deletions

output/openapi/elasticsearch-serverless-openapi.json

Lines changed: 21 additions & 0 deletions

output/schema/schema.json

Lines changed: 63 additions & 17 deletions

output/typescript/types.ts

Lines changed: 6 additions & 0 deletions

specification/_types/aggregations/bucket.ts

Lines changed: 26 additions & 0 deletions
@@ -814,6 +814,22 @@ export class ScriptedHeuristic {
   script: Script
 }
 
+export class PValueHeuristic {
+  /*
+   * Set to false to indicate that the background set does
+   * not contain the counts of the foreground set as they are filtered out.
+   * @server_default true
+   */
+  background_is_superset?: boolean
+  /**
+   * Should the results be normalized when above the given value.
+   * Allows for consistent significance results at various scales.
+   * Note: `0` is a special value which means no normalization
+   * @server_default 0
+   */
+  normalize_above?: long
+}
+
 /**
  * @ext_doc_id search-aggregations-bucket-significanttext-aggregation
  */
@@ -867,6 +883,16 @@ export class SignificantTermsAggregation extends BucketAggregationBase {
    * Customized score, implemented via a script.
    */
   script_heuristic?: ScriptedHeuristic
+  /**
+   * Significant terms heuristic that calculates the p-value between the term existing in foreground and background sets.
+   *
+   * The p-value is the probability of obtaining test results at least as extreme as
+   * the results actually observed, under the assumption that the null hypothesis is
+   * correct. The p-value is calculated assuming that the foreground set and the
+   * background set are independent https://en.wikipedia.org/wiki/Bernoulli_trial, with the null
+   * hypothesis that the probabilities are the same.
+   */
+  p_value?: PValueHeuristic
   /**
    * Regulates the certainty a shard has if the term should actually be added to the candidate list or not with respect to the `min_doc_count`.
    * Terms will only be considered if their local shard frequency within the set is higher than the `shard_min_doc_count`.
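
For reviewers who want to see how the new `p_value` option would appear in a request body, here is a minimal sketch. The two interfaces are hypothetical local mirrors of the spec classes touched by this commit (trimmed to the fields used here), not the generated client types. The helper `buildPValueAggregation`, the field name `service.keyword`, and the `status` query are illustrative assumptions only.

```typescript
// Hypothetical local mirror of the PValueHeuristic class added in this commit.
interface PValueHeuristic {
  // Server default is true; set to false when the background set
  // excludes the foreground documents.
  background_is_superset?: boolean
  // Spec type is `long`; server default is 0, which means no normalization.
  normalize_above?: number
}

// Hypothetical, trimmed mirror of SignificantTermsAggregation.
interface SignificantTermsAggregation {
  field: string
  p_value?: PValueHeuristic
}

// Illustrative helper (not part of the specification): wraps a field and
// heuristic settings in a `significant_terms` aggregation body.
function buildPValueAggregation(
  field: string,
  heuristic: PValueHeuristic = {}
): { significant_terms: SignificantTermsAggregation } {
  return { significant_terms: { field, p_value: heuristic } }
}

// Example request body; the query and field names are made up.
const requestBody = {
  query: { term: { status: 'failed' } },
  aggs: {
    significant_services: buildPValueAggregation('service.keyword', {
      background_is_superset: false,
      normalize_above: 1000
    })
  }
}

console.log(JSON.stringify(requestBody, null, 2))
```

Passing an empty heuristic object (`p_value: {}`) leaves both settings at the server defaults documented in the diff above.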
