-
Notifications
You must be signed in to change notification settings - Fork 25.5k
Add relevant attributes to search took time APM metrics #134232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add relevant attributes to search took time APM metrics #134232
Conversation
We already record the took time of a search request via took metric. We'd like to be able to slice such latencies based on some recurring categories of the request: - does it have agg or hit only? - is it sorted by field or by score? - does it have a time range filter? - does it target user data or internal indices? This commit introduces introspection for a search request and stores the extracted attributes together with the search took time metric.
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
Hi @javanna, I've created a changelog YAML for you. |
server/src/test/java/org/elasticsearch/action/search/SearchRequestIntrospectorTests.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/search/SearchRequestIntrospector.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice improvement, thx Luca!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this Luca.
Left a few minor comments
server/src/main/java/org/elasticsearch/action/search/SearchRequestIntrospector.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/search/SearchRequestIntrospector.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/search/SearchRequestIntrospector.java
Outdated
Show resolved
Hide resolved
executeRequest( | ||
(SearchTask) task, | ||
searchRequest, | ||
new SearchResponseActionListener(listener, searchRequest), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit concerned about extending the lifecycle of the entire search request (these could be quite large given that they encapsulate the source builder).
The reason we're doing it here is so that the search response action listener can introspect the request when the search operation completes.
Would it be a good idea to introspect the request before we execute it and only pass along the metadata that results from the introspection?
We are also rewriting the search request as part of the search execution. The rewrite process might create a new instance of the search request. Is the introspection more suitable for the rewriten instance of the one we're passing on from here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, great thoughts!
I planned on moving the search request introspection to an earlier place, and stop carrying the search request around in the listener. I did not do it straight-away out of laziness mostly.
We are interested in the search request on the coordinating node. There though, the indices get resolved and replaced, and we are interested in the resolved indices as opposed to the potential wildcard expressions or alias names. This is the reason why I initially carried the search request to the listener. This way, we can just delay its resolution and we are sure that it will hold resolved indices. Not doing so makes it messier because the listener is where we record the metric and we need the attributes, yet resolving attributes at listener creation is too early (we need to have indices resolution happen first). I need to find a way to make this work, but I do think it's a good investment.
Instead, when it comes to query rewrite, we want to introspect the search request before its rewrite, otherwise for instance the date range filter may get rewritten to match all or match none. Looking at the design of the rewrite framework, a new instance is returned whenever anything gets rewritten to something different compared to the object rewrite was called against. That means that no state changes in the original object, which is what we are looking for. It would get complicated otherwise to see in what cases we may see changes, depending on where shards are (is can match running on the coord node or on remote nodes?).
All in all, we need to worry about getting the replaced indices, the rest is rather simple to get from the original search request.
Thanks for making me think harder about this. I also added tests around this stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#134625 should help with getting resolved indices. I will add more tests though around indices resolution when pointing to a data stream, as well as with security enabled. Let me know if there's more scenarios that come to mind for testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out that testing with security enabled is very tricky (due to the need to place the test under security as well as having to introspect the recorded metrics via a test plugin. I think though that with the latest update, that's no longer required. I had to grab the indices from outside of the search request, from the resolved indices abstraction, which feels much better and feels safer with or without security enabled.
I did add a few more tests cause I had some coverage gaps around dotted indices, and wildcard as well as alias resolution.
import org.elasticsearch.search.sort.ScriptSortBuilder; | ||
import org.elasticsearch.test.ESTestCase; | ||
|
||
public class SearchRequestIntrospectorTests extends ESTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice test suite ❤️
31808ac
to
98d5a5a
Compare
) TransportSearchAction collects telemetry around CCS features used as well as took time of a search request. It does that via a action listener wrapping. The open PIT API uses TransportSearchAction for its execution, yet it does not wrap the listener in the same way, as search telemetry should not be affected by it. As a result, we have a TelemetryListener interface, with some instanceof checks in the code and a single implementation of such interface that is the action listener wrapper itself, which allows to record the telemetry on response. This can be simplified by moving the listener wrapping to the TransportSearchAction code, and specialize a little the open PIT path so that telemetry can still be disabled for it. Along with the simplifications, this change allows us to delay the wrapping of the listener, which will allow us to grab the resolved indices in #134232 .
@andreidan this should be ready for another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating on this Luca
This looks very good ! 🚀
We already record latency of the query and fetch phase as APM metrics. We'd like to be able to slice such latencies based on some recurring categories of the request: - does it have agg or hit only? - is it sorted by field or by score? - does it have a time range filter? - does it target user data or internal indices? This commit introduces introspection for a shard search request and stores the extracted attributes together with the shard phase latency metrics. This builds on top of elastic#134232 to use the same infra and store the same attributes for shard level latency metrics.
…tic#134625) TransportSearchAction collects telemetry around CCS features used as well as took time of a search request. It does that via a action listener wrapping. The open PIT API uses TransportSearchAction for its execution, yet it does not wrap the listener in the same way, as search telemetry should not be affected by it. As a result, we have a TelemetryListener interface, with some instanceof checks in the code and a single implementation of such interface that is the action listener wrapper itself, which allows to record the telemetry on response. This can be simplified by moving the listener wrapping to the TransportSearchAction code, and specialize a little the open PIT path so that telemetry can still be disabled for it. Along with the simplifications, this change allows us to delay the wrapping of the listener, which will allow us to grab the resolved indices in elastic#134232 .
We already record the took time of a search request via took metric. We'd like to be able to slice such latencies based on some recurring categories of the request: - does it have agg or hit only? - is it sorted by field or by score? - does it have a time range filter? - does it target user data or internal indices? This commit introduces introspection for a search request and stores the extracted attributes together with the search took time metric.
…tic#134625) TransportSearchAction collects telemetry around CCS features used as well as took time of a search request. It does that via a action listener wrapping. The open PIT API uses TransportSearchAction for its execution, yet it does not wrap the listener in the same way, as search telemetry should not be affected by it. As a result, we have a TelemetryListener interface, with some instanceof checks in the code and a single implementation of such interface that is the action listener wrapper itself, which allows to record the telemetry on response. This can be simplified by moving the listener wrapping to the TransportSearchAction code, and specialize a little the open PIT path so that telemetry can still be disabled for it. Along with the simplifications, this change allows us to delay the wrapping of the listener, which will allow us to grab the resolved indices in elastic#134232 .
We already record the took time of a search request via took metric. We'd like to be able to slice such latencies based on some recurring categories of the request: - does it have agg or hit only? - is it sorted by field or by score? - does it have a time range filter? - does it target user data or internal indices? This commit introduces introspection for a search request and stores the extracted attributes together with the search took time metric.
We already record the took time of a search request via took metric. We'd like to be able to slice such latencies based on some recurring categories of the request:
This commit introduces introspection for a search request and stores the extracted attributes together with the search took time metric.