-
Notifications
You must be signed in to change notification settings - Fork 413
Reduce cardinality of metrics on event persister #19133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
5607852 to
861cb00
Compare
This reduces the size of metrics by ~80%. Responding with the metrics takes significant amounts of time.
861cb00 to
00dd2a4
Compare
| origin_entity = "*client*" | ||
| else: | ||
| origin_type = "remote" | ||
| origin_entity = get_domain_from_id(event.sender) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 We don't need to track the origin_entity server name of remote events (high cardinality).
| if context.app_service: | ||
| origin_type = "local" | ||
| origin_entity = context.app_service.id | ||
| elif self.hs.is_mine_id(event.sender): | ||
| origin_type = "local" | ||
| origin_entity = "*client*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
origin_entity still used in contrib/
origin_entity is still used in a few metrics (contrib/prometheus/synapse-v2.rules and contrib/grafana/synapse.json):
synapse_storage_events_persisted_by_source_typeused in theEvents/s Local vs Remotegraph.synapse_storage_events_persisted_by_event_typeused in theEvents/s by Typegraph
I understand those are contrib/ things that we don't necessarily maintain but do we at-least have a plan for how we're going to proceed with our own graphs (for example matrix.org)? Seems like adjusting our matrix.org graphs would overlap with fixing contrib/ anyway.
| if context.app_service: | ||
| origin_type = "local" | ||
| origin_entity = context.app_service.id | ||
| elif self.hs.is_mine_id(event.sender): | ||
| origin_type = "local" | ||
| origin_entity = "*client*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Losing information on application service (bridges)
From contrib/prometheus/synapse-v2.rules, to replace the functionality, it looks like we just need a label to distinguish local client vs bridge (appservice).
From the metrics in Grafana, it looks like we aren't running application services on matrix.org. But for some homeserver using application services, it could be nice to have the context.app_service.id available to distinguish where load is coming from. It's not necessarily high cardinality and not user controlled.
We could add a origin_app_service label which would be context.app_service.id or None.
| @@ -0,0 +1 @@ | |||
| Reduce cardinality of `synapse_storage_events_persisted_events_sep_total` metric by removing `origin_type` label. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
origin_type or origin_entity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops
This reduces the size of metrics by ~80%. Responding with the metrics takes significant amounts of time.