PG: Add metrics for pg_stat_replication's sent/write/flush/replay #21844
base: master
💡 Codex Review
integrations-core/postgres/datadog_checks/postgres/postgres.py, lines 393 to 394 in d26b527:
    queries.append(QUERY_PG_REPLICATION_SLOTS)
    queries.append(QUERY_PG_REPLICATION_STATS_METRICS)
The new QUERY_PG_REPLICATION_STATS_METRICS query now calls pg_current_wal_lsn() for every row, but it is still appended unconditionally for all Postgres ≥10 environments. On Aurora instances where wal_level is not set to logical, calling pg_current_wal_lsn() raises an error (this is why the control checkpoint metrics are skipped under the same condition a few lines above). Without a similar guard here, the check will start failing on default Aurora setups. Consider either skipping this query when self.is_aurora and self.wal_level != 'logical', or using a function that is allowed on Aurora.
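The guard suggested in this comment could look roughly like the sketch below. It is not the code in postgres.py: the function name select_replication_queries and its parameters are hypothetical, the query constants are replaced by plain strings so the snippet runs on its own, and it assumes both queries sit under the same Postgres ≥10 branch.

```python
def select_replication_queries(version_major, is_aurora, wal_level):
    """Return the names of the replication queries that are safe to run.

    Hypothetical helper for illustration only; the real check appends
    query objects inside postgres.py.
    """
    queries = []
    if version_major < 10:
        # Assumption: both queries are only appended for Postgres >= 10.
        return queries

    queries.append("QUERY_PG_REPLICATION_SLOTS")

    # Guard from the review comment: per that comment, pg_current_wal_lsn()
    # errors on Aurora unless wal_level is 'logical', so skip the query.
    if is_aurora and wal_level != "logical":
        return queries

    queries.append("QUERY_PG_REPLICATION_STATS_METRICS")
    return queries
```

With this shape, a default Aurora setup keeps collecting the replication slot metrics and only drops the query that depends on pg_current_wal_lsn().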
Codecov Report: ✅ All modified and coverable lines are covered by tests.
pg_stat_replication reports the last WAL location sent to, written by, flushed by, and replayed by each standby server. The sent delay doesn't depend on a feedback message from the standby, since the sender tracks the WAL it has sent over the connection. These metrics can be used to gauge how quickly a standby is catching up and how far behind it is.
Only run the checkpoint and check metrics on the primary. This removes the uncertainty about whether the checkpoint record has been propagated to the standby before the standby's checkpoint is triggered.
Add the timestamp to the application name to reduce the risk of a leftover query or connection being present when metrics are collected.
What does this PR do?
Add sent/write/flush/replay LSN delay metrics from pg_stat_replication.
Motivation
pg_stat_replication reports the last WAL location sent to, written by, flushed by, and replayed by each standby server.
The sent delay doesn't depend on a feedback message from the standby, since the sender tracks the WAL it has sent over the connection. These metrics can be used to gauge how quickly a standby is catching up and how far behind it is.
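As a rough illustration of what such a delay measures, the sketch below computes the byte distance between the current WAL position and each of the sent_lsn, write_lsn, flush_lsn and replay_lsn columns of one pg_stat_replication row. It is not the query added by this PR: the helper names lsn_to_int and replication_delays and the output key names are hypothetical, and the arithmetic is done in Python rather than in SQL. It relies only on the textual LSN format, two hexadecimal numbers separated by a slash, where the first is the high 32 bits of the position.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_delays(current_lsn: str, row: dict) -> dict:
    """Return the byte delay for each LSN column present in the row.

    `row` stands in for one pg_stat_replication row; columns that are
    missing or NULL (None) are skipped. Key names are hypothetical.
    """
    current = lsn_to_int(current_lsn)
    delays = {}
    for stage in ("sent", "write", "flush", "replay"):
        value = row.get(f"{stage}_lsn")
        if value is not None:
            delays[f"{stage}_delay_bytes"] = current - lsn_to_int(value)
    return delays
```

For example, with a current position of 0/3000060 and a standby whose write_lsn is 0/3000000, the write delay is 0x60 = 96 bytes.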
Review checklist (to be filled by reviewers)
- Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
- Add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged.