Skip to content

Conversation

javanna
Copy link
Member

@javanna javanna commented Sep 5, 2025

We already record the took time of a search request via took metric. We'd like to be able to slice such latencies based on some recurring categories of the request:

  • does it have agg or hit only?
  • is it sorted by field or by score?
  • does it have a time range filter?
  • does it target user data or internal indices?

This commit introduces introspection for a search request and stores the extracted attributes together with the search took time metric.

We already record the took time of a search request via took metric.
We'd like to be able to slice such latencies based on some recurring categories
of the request:
- does it have agg or hit only?
- is it sorted by field or by score?
- does it have a time range filter?
- does it target user data or internal indices?

This commit introduces introspection for a search request and stores the
extracted attributes together with the search took time metric.
@javanna javanna added >enhancement :Search Foundations/Search Catch all for Search Foundations v9.2.0 labels Sep 5, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Sep 5, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@elasticsearchmachine
Copy link
Collaborator

Hi @javanna, I've created a changelog YAML for you.

Copy link
Contributor

@tteofili tteofili left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice improvement, thx Luca!

Copy link
Contributor

@andreidan andreidan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this Luca.

Left a few minor comments

executeRequest(
(SearchTask) task,
searchRequest,
new SearchResponseActionListener(listener, searchRequest),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned about extending the lifecycle of the entire search request (these could be quite large given that they encapsulate the source builder).
The reason we're doing it here is so that the search response action listener can introspect the request when the search operation completes.

Would it be a good idea to introspect the request before we execute it and only pass along the metadata that results from the introspection?
We are also rewriting the search request as part of the search execution. The rewrite process might create a new instance of the search request. Is the introspection more suitable for the rewriten instance of the one we're passing on from here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, great thoughts!

I planned on moving the search request introspection to an earlier place, and stop carrying the search request around in the listener. I did not do it straight-away out of laziness mostly.

We are interested in the search request on the coordinating node. There though, the indices get resolved and replaced, and we are interested in the resolved indices as opposed to the potential wildcard expressions or alias names. This is the reason why I initially carried the search request to the listener. This way, we can just delay its resolution and we are sure that it will hold resolved indices. Not doing so makes it messier because the listener is where we record the metric and we need the attributes, yet resolving attributes at listener creation is too early (we need to have indices resolution happen first). I need to find a way to make this work, but I do think it's a good investment.

Instead, when it comes to query rewrite, we want to introspect the search request before its rewrite, otherwise for instance the date range filter may get rewritten to match all or match none. Looking at the design of the rewrite framework, a new instance is returned whenever anything gets rewritten to something different compared to the object rewrite was called against. That means that no state changes in the original object, which is what we are looking for. It would get complicated otherwise to see in what cases we may see changes, depending on where shards are (is can match running on the coord node or on remote nodes?).

All in all, we need to worry about getting the replaced indices, the rest is rather simple to get from the original search request.

Thanks for making me think harder about this. I also added tests around this stuff.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#134625 should help with getting resolved indices. I will add more tests though around indices resolution when pointing to a data stream, as well as with security enabled. Let me know if there's more scenarios that come to mind for testing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out that testing with security enabled is very tricky (due to the need to place the test under security as well as having to introspect the recorded metrics via a test plugin. I think though that with the latest update, that's no longer required. I had to grab the indices from outside of the search request, from the resolved indices abstraction, which feels much better and feels safer with or without security enabled.

I did add a few more tests cause I had some coverage gaps around dotted indices, and wildcard as well as alias resolution.

import org.elasticsearch.search.sort.ScriptSortBuilder;
import org.elasticsearch.test.ESTestCase;

public class SearchRequestIntrospectorTests extends ESTestCase {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice test suite ❤️

@javanna javanna force-pushed the enhancement/introspect_search_took_time branch from 31808ac to 98d5a5a Compare September 12, 2025 15:56
javanna added a commit that referenced this pull request Sep 15, 2025
)

TransportSearchAction collects telemetry around CCS features used as well as took time of a search request. It does that via a action listener wrapping. The open PIT API uses TransportSearchAction for its execution, yet it does not wrap the listener in the same way, as search telemetry should not be affected by it. As a result, we have a TelemetryListener interface, with some instanceof checks in the code and a single implementation of such interface that is the action listener wrapper itself, which allows to record the telemetry on response.

This can be simplified by moving the listener wrapping to the TransportSearchAction code, and specialize a little the open PIT path so that telemetry can still be disabled for it. Along with the simplifications, this change allows us to delay the wrapping of the listener, which will allow us to grab the resolved indices in #134232 .
@javanna
Copy link
Member Author

javanna commented Sep 15, 2025

@andreidan this should be ready for another look.

@andreidan andreidan self-requested a review September 16, 2025 09:53
Copy link
Contributor

@andreidan andreidan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating on this Luca

This looks very good ! 🚀

@javanna javanna merged commit 136321f into elastic:main Sep 16, 2025
34 checks passed
@javanna javanna deleted the enhancement/introspect_search_took_time branch September 16, 2025 11:04
javanna added a commit to javanna/elasticsearch that referenced this pull request Sep 16, 2025
We already record latency of the query and fetch phase as APM metrics.
We'd like to be able to slice such latencies based on some recurring
categories of the request:

- does it have agg or hit only?
- is it sorted by field or by score?
- does it have a time range filter?
- does it target user data or internal indices?

This commit introduces introspection for a shard search request and stores the
extracted attributes together with the shard phase latency metrics.

This builds on top of elastic#134232 to use the same infra and store the same attributes
for shard level latency metrics.
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Sep 17, 2025
…tic#134625)

TransportSearchAction collects telemetry around CCS features used as well as took time of a search request. It does that via a action listener wrapping. The open PIT API uses TransportSearchAction for its execution, yet it does not wrap the listener in the same way, as search telemetry should not be affected by it. As a result, we have a TelemetryListener interface, with some instanceof checks in the code and a single implementation of such interface that is the action listener wrapper itself, which allows to record the telemetry on response.

This can be simplified by moving the listener wrapping to the TransportSearchAction code, and specialize a little the open PIT path so that telemetry can still be disabled for it. Along with the simplifications, this change allows us to delay the wrapping of the listener, which will allow us to grab the resolved indices in elastic#134232 .
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Sep 17, 2025
We already record the took time of a search request via took metric.
We'd like to be able to slice such latencies based on some recurring categories
of the request:
- does it have agg or hit only?
- is it sorted by field or by score?
- does it have a time range filter?
- does it target user data or internal indices?

This commit introduces introspection for a search request and stores the
extracted attributes together with the search took time metric.
gmjehovich pushed a commit to gmjehovich/elasticsearch that referenced this pull request Sep 18, 2025
…tic#134625)

TransportSearchAction collects telemetry around CCS features used as well as took time of a search request. It does that via a action listener wrapping. The open PIT API uses TransportSearchAction for its execution, yet it does not wrap the listener in the same way, as search telemetry should not be affected by it. As a result, we have a TelemetryListener interface, with some instanceof checks in the code and a single implementation of such interface that is the action listener wrapper itself, which allows to record the telemetry on response.

This can be simplified by moving the listener wrapping to the TransportSearchAction code, and specialize a little the open PIT path so that telemetry can still be disabled for it. Along with the simplifications, this change allows us to delay the wrapping of the listener, which will allow us to grab the resolved indices in elastic#134232 .
gmjehovich pushed a commit to gmjehovich/elasticsearch that referenced this pull request Sep 18, 2025
We already record the took time of a search request via took metric.
We'd like to be able to slice such latencies based on some recurring categories
of the request:
- does it have agg or hit only?
- is it sorted by field or by score?
- does it have a time range filter?
- does it target user data or internal indices?

This commit introduces introspection for a search request and stores the
extracted attributes together with the search took time metric.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants