Add relevant attributes to search took time APM metrics #134232

javanna · 2025-09-05T15:45:33Z

We already record the took time of a search request via took metric. We'd like to be able to slice such latencies based on some recurring categories of the request:

does it have agg or hit only?
is it sorted by field or by score?
does it have a time range filter?
does it target user data or internal indices?

This commit introduces introspection for a search request and stores the extracted attributes together with the search took time metric.

We already record the took time of a search request via took metric. We'd like to be able to slice such latencies based on some recurring categories of the request: - does it have agg or hit only? - is it sorted by field or by score? - does it have a time range filter? - does it target user data or internal indices? This commit introduces introspection for a search request and stores the extracted attributes together with the search took time metric.

elasticsearchmachine · 2025-09-05T15:45:58Z

Pinging @elastic/es-search-foundations (Team:Search Foundations)

elasticsearchmachine · 2025-09-05T15:45:58Z

Hi @javanna, I've created a changelog YAML for you.

server/src/test/java/org/elasticsearch/action/search/SearchRequestIntrospectorTests.java

server/src/main/java/org/elasticsearch/action/search/SearchRequestIntrospector.java

tteofili

nice improvement, thx Luca!

andreidan

Thanks for working on this Luca.

Left a few minor comments

server/src/main/java/org/elasticsearch/action/search/SearchRequestIntrospector.java

andreidan · 2025-09-09T10:31:01Z

server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java

+        executeRequest(
+            (SearchTask) task,
+            searchRequest,
+            new SearchResponseActionListener(listener, searchRequest),


I'm a bit concerned about extending the lifecycle of the entire search request (these could be quite large given that they encapsulate the source builder).
The reason we're doing it here is so that the search response action listener can introspect the request when the search operation completes.

Would it be a good idea to introspect the request before we execute it and only pass along the metadata that results from the introspection?
We are also rewriting the search request as part of the search execution. The rewrite process might create a new instance of the search request. Is the introspection more suitable for the rewriten instance of the one we're passing on from here?

Thanks, great thoughts!

I planned on moving the search request introspection to an earlier place, and stop carrying the search request around in the listener. I did not do it straight-away out of laziness mostly.

We are interested in the search request on the coordinating node. There though, the indices get resolved and replaced, and we are interested in the resolved indices as opposed to the potential wildcard expressions or alias names. This is the reason why I initially carried the search request to the listener. This way, we can just delay its resolution and we are sure that it will hold resolved indices. Not doing so makes it messier because the listener is where we record the metric and we need the attributes, yet resolving attributes at listener creation is too early (we need to have indices resolution happen first). I need to find a way to make this work, but I do think it's a good investment.

Instead, when it comes to query rewrite, we want to introspect the search request before its rewrite, otherwise for instance the date range filter may get rewritten to match all or match none. Looking at the design of the rewrite framework, a new instance is returned whenever anything gets rewritten to something different compared to the object rewrite was called against. That means that no state changes in the original object, which is what we are looking for. It would get complicated otherwise to see in what cases we may see changes, depending on where shards are (is can match running on the coord node or on remote nodes?).

All in all, we need to worry about getting the replaced indices, the rest is rather simple to get from the original search request.

Thanks for making me think harder about this. I also added tests around this stuff.

#134625 should help with getting resolved indices. I will add more tests though around indices resolution when pointing to a data stream, as well as with security enabled. Let me know if there's more scenarios that come to mind for testing.

Turns out that testing with security enabled is very tricky (due to the need to place the test under security as well as having to introspect the recorded metrics via a test plugin. I think though that with the latest update, that's no longer required. I had to grab the indices from outside of the search request, from the resolved indices abstraction, which feels much better and feels safer with or without security enabled.

I did add a few more tests cause I had some coverage gaps around dotted indices, and wildcard as well as alias resolution.

andreidan · 2025-09-09T10:34:46Z

server/src/test/java/org/elasticsearch/action/search/SearchRequestIntrospectorTests.java

+import org.elasticsearch.search.sort.ScriptSortBuilder;
+import org.elasticsearch.test.ESTestCase;
+
+public class SearchRequestIntrospectorTests extends ESTestCase {


Very nice test suite ❤️

) TransportSearchAction collects telemetry around CCS features used as well as took time of a search request. It does that via a action listener wrapping. The open PIT API uses TransportSearchAction for its execution, yet it does not wrap the listener in the same way, as search telemetry should not be affected by it. As a result, we have a TelemetryListener interface, with some instanceof checks in the code and a single implementation of such interface that is the action listener wrapper itself, which allows to record the telemetry on response. This can be simplified by moving the listener wrapping to the TransportSearchAction code, and specialize a little the open PIT path so that telemetry can still be disabled for it. Along with the simplifications, this change allows us to delay the wrapping of the listener, which will allow us to grab the resolved indices in #134232 .

javanna · 2025-09-15T19:14:18Z

@andreidan this should be ready for another look.

andreidan

Thanks for iterating on this Luca

This looks very good ! 🚀

We already record latency of the query and fetch phase as APM metrics. We'd like to be able to slice such latencies based on some recurring categories of the request: - does it have agg or hit only? - is it sorted by field or by score? - does it have a time range filter? - does it target user data or internal indices? This commit introduces introspection for a shard search request and stores the extracted attributes together with the shard phase latency metrics. This builds on top of elastic#134232 to use the same infra and store the same attributes for shard level latency metrics.

…tic#134625) TransportSearchAction collects telemetry around CCS features used as well as took time of a search request. It does that via a action listener wrapping. The open PIT API uses TransportSearchAction for its execution, yet it does not wrap the listener in the same way, as search telemetry should not be affected by it. As a result, we have a TelemetryListener interface, with some instanceof checks in the code and a single implementation of such interface that is the action listener wrapper itself, which allows to record the telemetry on response. This can be simplified by moving the listener wrapping to the TransportSearchAction code, and specialize a little the open PIT path so that telemetry can still be disabled for it. Along with the simplifications, this change allows us to delay the wrapping of the listener, which will allow us to grab the resolved indices in elastic#134232 .

We already record the took time of a search request via took metric. We'd like to be able to slice such latencies based on some recurring categories of the request: - does it have agg or hit only? - is it sorted by field or by score? - does it have a time range filter? - does it target user data or internal indices? This commit introduces introspection for a search request and stores the extracted attributes together with the search took time metric.

…tic#134625) TransportSearchAction collects telemetry around CCS features used as well as took time of a search request. It does that via a action listener wrapping. The open PIT API uses TransportSearchAction for its execution, yet it does not wrap the listener in the same way, as search telemetry should not be affected by it. As a result, we have a TelemetryListener interface, with some instanceof checks in the code and a single implementation of such interface that is the action listener wrapper itself, which allows to record the telemetry on response. This can be simplified by moving the listener wrapping to the TransportSearchAction code, and specialize a little the open PIT path so that telemetry can still be disabled for it. Along with the simplifications, this change allows us to delay the wrapping of the listener, which will allow us to grab the resolved indices in elastic#134232 .

We already record the took time of a search request via took metric. We'd like to be able to slice such latencies based on some recurring categories of the request: - does it have agg or hit only? - is it sorted by field or by score? - does it have a time range filter? - does it target user data or internal indices? This commit introduces introspection for a search request and stores the extracted attributes together with the search took time metric.

javanna requested review from tteofili and andreidan September 5, 2025 15:45

javanna added >enhancement :Search Foundations/Search Catch all for Search Foundations v9.2.0 labels Sep 5, 2025

elasticsearchmachine added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Sep 5, 2025

Update docs/changelog/134232.yaml

91c763f

javanna commented Sep 5, 2025

View reviewed changes

server/src/test/java/org/elasticsearch/action/search/SearchRequestIntrospectorTests.java Outdated Show resolved Hide resolved

javanna commented Sep 5, 2025

View reviewed changes

server/src/main/java/org/elasticsearch/action/search/SearchRequestIntrospector.java Outdated Show resolved Hide resolved

tteofili approved these changes Sep 9, 2025

View reviewed changes

andreidan reviewed Sep 9, 2025

View reviewed changes

javanna added 4 commits September 11, 2025 14:25

Merge branch 'main' into enhancement/introspect_search_took_time

5c9999a

iter

3c54484

iter

07838b2

Merge branch 'main' into enhancement/introspect_search_took_time

bc009b3

javanna mentioned this pull request Sep 12, 2025

Centralize telemetry listener wrapping in TransportSearchAction #134625

Merged

iter

98d5a5a

javanna force-pushed the enhancement/introspect_search_took_time branch from 31808ac to 98d5a5a Compare September 12, 2025 15:56

javanna and others added 5 commits September 12, 2025 17:57

Merge branch 'main' into enhancement/introspect_search_took_time

6f1824b

iter

c0271a3

[CI] Auto commit changes from spotless

814b210

wip

53dcc9f

Merge branch 'main' into HEAD

1b3bf35

javanna added 3 commits September 15, 2025 19:46

Merge branch 'main' into enhancement/introspect_search_took_time

c94a491

iter

83378b9

iter

44ffe87

Merge branch 'main' into enhancement/introspect_search_took_time

b61e0ad

andreidan self-requested a review September 16, 2025 09:53

andreidan approved these changes Sep 16, 2025

View reviewed changes

Merge branch 'main' into enhancement/introspect_search_took_time

a7f04f2

javanna merged commit 136321f into elastic:main Sep 16, 2025
34 checks passed

javanna deleted the enhancement/introspect_search_took_time branch September 16, 2025 11:04

javanna mentioned this pull request Sep 16, 2025

Add relevant attributes to shard search latency APM metrics #134798

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add relevant attributes to search took time APM metrics #134232

Add relevant attributes to search took time APM metrics #134232

Uh oh!

javanna commented Sep 5, 2025

Uh oh!

elasticsearchmachine commented Sep 5, 2025

Uh oh!

elasticsearchmachine commented Sep 5, 2025

Uh oh!

Uh oh!

Uh oh!

tteofili left a comment

Uh oh!

andreidan left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

andreidan Sep 9, 2025

Uh oh!

javanna Sep 12, 2025

Uh oh!

javanna Sep 12, 2025

Uh oh!

javanna Sep 15, 2025

Uh oh!

andreidan Sep 9, 2025

Uh oh!

javanna commented Sep 15, 2025

Uh oh!

andreidan left a comment

Uh oh!

Uh oh!

Uh oh!

Add relevant attributes to search took time APM metrics #134232

Add relevant attributes to search took time APM metrics #134232

Uh oh!

Conversation

javanna commented Sep 5, 2025

Uh oh!

elasticsearchmachine commented Sep 5, 2025

Uh oh!

elasticsearchmachine commented Sep 5, 2025

Uh oh!

Uh oh!

Uh oh!

tteofili left a comment

Choose a reason for hiding this comment

Uh oh!

andreidan left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

andreidan Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

javanna Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

javanna Sep 12, 2025

Choose a reason for hiding this comment

Uh oh!

javanna Sep 15, 2025

Choose a reason for hiding this comment

Uh oh!

andreidan Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

javanna commented Sep 15, 2025

Uh oh!

andreidan left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!