From 985757594626f52431b8294e0cb05288fe616344 Mon Sep 17 00:00:00 2001 From: Phil Eaton Date: Tue, 19 Aug 2025 19:56:00 -0400 Subject: [PATCH 1/7] Dump restore guidance --- .../docs/pgd/6/reference/backup-restore.mdx | 206 ++++++++++-------- 1 file changed, 119 insertions(+), 87 deletions(-) diff --git a/product_docs/docs/pgd/6/reference/backup-restore.mdx b/product_docs/docs/pgd/6/reference/backup-restore.mdx index 57de6215c8e..2ef48e7f554 100644 --- a/product_docs/docs/pgd/6/reference/backup-restore.mdx +++ b/product_docs/docs/pgd/6/reference/backup-restore.mdx @@ -28,6 +28,38 @@ recovery (DR), such as in the following situations: You can use pg_dump, sometimes referred to as *logical backup*, normally with PGD. +In order to reduce the risk of global lock timeouts, we recommend +dumping pre-data, data, and post-data separately. For example: + +```console +pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=pre-data -f pgd-pre-data.sql +pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=data -f pgd-data.sql +pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=post-data -f pgd-post-data.sql +``` + +And restore by directly executing these SQL files: + +```console +psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -f pgd-pre-data.sql +psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -f pgd-data.sql +psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)' +psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -f pgd-post-data.sql +psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)' +``` + +After which point the dump will be restored on all nodes in the cluster. 
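The dump-and-restore sequence above can be wrapped in a small script. This is only a sketch: the `run` helper with its dry-run default and the connection defaults are illustrative additions, not PGD or PostgreSQL tooling, and the exclude flags are passed without the extra identifier quoting shown above (equivalent for the lowercase `bdr` schema).

```shell
#!/bin/sh
# Sketch of the section-by-section dump/restore flow described above.
# DRY_RUN defaults to 1, so the script only prints the commands it
# would run; set DRY_RUN=0 to execute them against a real cluster.
set -eu
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi
}

CONN="-h ${PG_HOST:-localhost} -p ${PG_PORT:-5432} -U ${PG_USER:-postgres} -d ${PGD_DB:-bdrdb}"
EXCLUDE="--exclude-schema=bdr --exclude-extension=bdr"

# Dump each section to its own file.
for section in pre-data data post-data; do
  run pg_dump $CONN -v $EXCLUDE --section="$section" -f "pgd-$section.sql"
done

# Restore in order, waiting for replication after the data and
# post-data sections so later DDL doesn't race ahead of the data.
run psql $CONN -f pgd-pre-data.sql
run psql $CONN -f pgd-data.sql
run psql $CONN -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
run psql $CONN -f pgd-post-data.sql
run psql $CONN -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
```

Running it as-is previews the five commands; clearing `DRY_RUN` executes them.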
+ +In contrast if you do not split sections out with a naive pg_dump and +pg_restore, the restore will likely fail with a global lock timeout. + +You should also: + +- Make sure you have enough disk space for the WAL which may temporarily build up during initial replication if large transactions are run +- Consider increasing `bdr.global_lock_timeout` toward infinity if you continue to get lock timeouts +- Consider setting `bdr.ddl_locking` to `off` while the initial load is happening + +#### Sequences + pg_dump dumps both local and global sequences as if they were local sequences. This behavior is intentional, to allow a PGD schema to be dumped and ported to other PostgreSQL databases. @@ -51,7 +83,7 @@ dump only with `bdr.crdt_raw_value = on`. Technical Support recommends the use of physical backup techniques for backup and recovery of PGD. -### Physical backup +## Physical backup and restore You can take physical backups of a node in an EDB Postgres Distributed cluster using standard PostgreSQL software, such as @@ -82,6 +114,91 @@ PostgreSQL backup techniques to PGD: local data and a backup of at least one node that subscribes to each replication set. +### Restore + +While you can take a physical backup with the same procedure as a +standard PostgreSQL node, it's slightly more complex to +restore the physical backup of a PGD node. + +### EDB Postgres Distributed cluster failure or seeding a new cluster from a backup + +The most common use case for restoring a physical backup involves the failure +or replacement of all the PGD nodes in a cluster, for instance in the event of +a data center failure. + +You might also want to perform this procedure to clone the current contents of a +EDB Postgres Distributed cluster to seed a QA or development instance. 
+ +In that case, you can restore PGD capabilities based on a physical backup +of a single PGD node, optionally plus WAL archives: + +- If you still have some PGD nodes live and running, fence off the host you + restored the PGD node to, so it can't connect to any surviving PGD nodes. + This practice ensures that the new node doesn't confuse the existing cluster. +- Restore a single PostgreSQL node from a physical backup of one of + the PGD nodes. +- If you have WAL archives associated with the backup, create a suitable + `postgresql.conf`, and start PostgreSQL in recovery to replay up to the latest + state. You can specify an alternative `recovery_target` here if needed. +- Start the restored node, or promote it to read/write if it was in standby + recovery. Keep it fenced from any surviving nodes! +- Clean up any leftover PGD metadata that was included in the physical backup. +- Fully stop and restart the PostgreSQL instance. +- Add further PGD nodes with the standard procedure based on the + `bdr.join_node_group()` function call. + +#### Cleanup of PGD metadata + +To clean up leftover PGD metadata: + +1. Drop the PGD node using [`bdr.drop_node`](/pgd/6/reference/tables-views-functions/functions-internal#bdrdrop_node). +2. Fully stop and restart PostgreSQL (important!). + +#### Cleanup of replication origins + +You must explicitly remove replication origins with a separate step +because they're recorded persistently in a system catalog. They're +therefore included in the backup and in the restored instance. They +aren't removed automatically when dropping the BDR extension because +they aren't explicitly recorded as its dependencies. + +To track progress of incoming replication in a crash-safe way, +PGD creates one replication origin for each remote master node. 
Therefore, +for each node in the previous cluster run this once: + +``` +SELECT pg_replication_origin_drop('bdr_dbname_grpname_nodename'); +``` + +You can list replication origins as follows: + +``` +SELECT * FROM pg_replication_origin; +``` + +Those created by PGD are easily recognized by their name. + +#### Cleanup of replication slots + +If a physical backup was created with `pg_basebackup`, replication slots +are omitted from the backup. + +Some other backup methods might preserve replications slots, likely in +outdated or invalid states. Once you restore the backup, use these commands to drop all replication slots: + +``` +SELECT pg_drop_replication_slot(slot_name) +FROM pg_replication_slots; +``` + +If you have a reason to preserve some slots, +you can add a `WHERE slot_name LIKE 'bdr%'` clause, but this is rarely +useful. + +!!! Warning + Never use these commands to drop replication slots on a live PGD node + + ### Eventual consistency The nodes in an EDB Postgres Distributed cluster are *eventually consistent* but not @@ -199,89 +316,4 @@ of changes arriving from a single master in COMMIT order. !!! Note This feature is available only with EDB Postgres Extended. - Barman doesn't create a `multi_recovery.conf` file. - -## Restore - -While you can take a physical backup with the same procedure as a -standard PostgreSQL node, it's slightly more complex to -restore the physical backup of a PGD node. - -### EDB Postgres Distributed cluster failure or seeding a new cluster from a backup - -The most common use case for restoring a physical backup involves the failure -or replacement of all the PGD nodes in a cluster, for instance in the event of -a data center failure. - -You might also want to perform this procedure to clone the current contents of a -EDB Postgres Distributed cluster to seed a QA or development instance. 
- -In that case, you can restore PGD capabilities based on a physical backup -of a single PGD node, optionally plus WAL archives: - -- If you still have some PGD nodes live and running, fence off the host you - restored the PGD node to, so it can't connect to any surviving PGD nodes. - This practice ensures that the new node doesn't confuse the existing cluster. -- Restore a single PostgreSQL node from a physical backup of one of - the PGD nodes. -- If you have WAL archives associated with the backup, create a suitable - `postgresql.conf`, and start PostgreSQL in recovery to replay up to the latest - state. You can specify an alternative `recovery_target` here if needed. -- Start the restored node, or promote it to read/write if it was in standby - recovery. Keep it fenced from any surviving nodes! -- Clean up any leftover PGD metadata that was included in the physical backup. -- Fully stop and restart the PostgreSQL instance. -- Add further PGD nodes with the standard procedure based on the - `bdr.join_node_group()` function call. - -#### Cleanup of PGD metadata - -To clean up leftover PGD metadata: - -1. Drop the PGD node using [`bdr.drop_node`](/pgd/6/reference/tables-views-functions/functions-internal#bdrdrop_node). -2. Fully stop and restart PostgreSQL (important!). - -#### Cleanup of replication origins - -You must explicitly remove replication origins with a separate step -because they're recorded persistently in a system catalog. They're -therefore included in the backup and in the restored instance. They -aren't removed automatically when dropping the BDR extension because -they aren't explicitly recorded as its dependencies. - -To track progress of incoming replication in a crash-safe way, -PGD creates one replication origin for each remote master node. 
Therefore, -for each node in the previous cluster run this once: - -``` -SELECT pg_replication_origin_drop('bdr_dbname_grpname_nodename'); -``` - -You can list replication origins as follows: - -``` -SELECT * FROM pg_replication_origin; -``` - -Those created by PGD are easily recognized by their name. - -#### Cleanup of replication slots - -If a physical backup was created with `pg_basebackup`, replication slots -are omitted from the backup. - -Some other backup methods might preserve replications slots, likely in -outdated or invalid states. Once you restore the backup, use these commands to drop all replication slots: - -``` -SELECT pg_drop_replication_slot(slot_name) -FROM pg_replication_slots; -``` - -If you have a reason to preserve some slots, -you can add a `WHERE slot_name LIKE 'bdr%'` clause, but this is rarely -useful. - -!!! Warning - Never use these commands to drop replication slots on a live PGD node - + Barman doesn't create a `multi_recovery.conf` file. \ No newline at end of file From b47579d91c4be8c0059aaa6d648d2cdbc4fbced2 Mon Sep 17 00:00:00 2001 From: Phil Eaton Date: Tue, 19 Aug 2025 19:56:29 -0400 Subject: [PATCH 2/7] up --- product_docs/docs/pgd/6/reference/backup-restore.mdx | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/product_docs/docs/pgd/6/reference/backup-restore.mdx b/product_docs/docs/pgd/6/reference/backup-restore.mdx index 2ef48e7f554..72360a7cfdf 100644 --- a/product_docs/docs/pgd/6/reference/backup-restore.mdx +++ b/product_docs/docs/pgd/6/reference/backup-restore.mdx @@ -26,10 +26,9 @@ recovery (DR), such as in the following situations: ### pg_dump You can use pg_dump, sometimes referred to as *logical backup*, -normally with PGD. - -In order to reduce the risk of global lock timeouts, we recommend -dumping pre-data, data, and post-data separately. For example: +normally with PGD. But in order to reduce the risk of global lock +timeouts, we recommend dumping pre-data, data, and post-data +separately. 
For example: ```console pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=pre-data -f pgd-pre-data.sql From a6a5fff6bdad5947c38262232651a439082938f4 Mon Sep 17 00:00:00 2001 From: Phil Eaton Date: Tue, 19 Aug 2025 20:01:22 -0400 Subject: [PATCH 3/7] prefer node setup --- .../docs/pgd/6/reference/backup-restore.mdx | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/product_docs/docs/pgd/6/reference/backup-restore.mdx b/product_docs/docs/pgd/6/reference/backup-restore.mdx index 72360a7cfdf..76a24906817 100644 --- a/product_docs/docs/pgd/6/reference/backup-restore.mdx +++ b/product_docs/docs/pgd/6/reference/backup-restore.mdx @@ -21,9 +21,7 @@ recovery (DR), such as in the following situations: as a result of data corruption, application error, or security breach -## Backup - -### pg_dump +## Logical backup and restore You can use pg_dump, sometimes referred to as *logical backup*, normally with PGD. But in order to reduce the risk of global lock @@ -57,7 +55,14 @@ You should also: - Consider increasing `bdr.global_lock_timeout` toward infinity if you continue to get lock timeouts - Consider setting `bdr.ddl_locking` to `off` while the initial load is happening -#### Sequences +### Prefer restoring to a single node + +Especially when initially setting up a cluster from a Postgres dump, +we recommend you restore to a cluster with a single PGD node. Then run +`pgd node setup` for each node you want in the cluster which will do a +physical join that uses `bdr_init_physical` under the hood. + +### Sequences pg_dump dumps both local and global sequences as if they were local sequences. This behavior is intentional, to allow a PGD @@ -119,7 +124,7 @@ While you can take a physical backup with the same procedure as a standard PostgreSQL node, it's slightly more complex to restore the physical backup of a PGD node. 
-### EDB Postgres Distributed cluster failure or seeding a new cluster from a backup +#### EDB Postgres Distributed cluster failure or seeding a new cluster from a backup The most common use case for restoring a physical backup involves the failure or replacement of all the PGD nodes in a cluster, for instance in the event of @@ -198,7 +203,7 @@ useful. Never use these commands to drop replication slots on a live PGD node -### Eventual consistency +## Eventual consistency The nodes in an EDB Postgres Distributed cluster are *eventually consistent* but not *entirely consistent*. A physical backup of a given node provides From 974f724d4f9a02d40ecf4d38518f1a9db191e45a Mon Sep 17 00:00:00 2001 From: Phil Eaton Date: Wed, 20 Aug 2025 16:15:06 -0400 Subject: [PATCH 4/7] More suggestions --- .../docs/pgd/6/reference/backup-restore.mdx | 33 ++++++++++++++++--- 1 file changed, 29 insertions(+), 4 deletions(-) diff --git a/product_docs/docs/pgd/6/reference/backup-restore.mdx b/product_docs/docs/pgd/6/reference/backup-restore.mdx index 76a24906817..46175391af7 100644 --- a/product_docs/docs/pgd/6/reference/backup-restore.mdx +++ b/product_docs/docs/pgd/6/reference/backup-restore.mdx @@ -49,11 +49,36 @@ After which point the dump will be restored on all nodes in the cluster. In contrast if you do not split sections out with a naive pg_dump and pg_restore, the restore will likely fail with a global lock timeout. -You should also: +You should also temporarily set the following settings in `postgresql.conf`: -- Make sure you have enough disk space for the WAL which may temporarily build up during initial replication if large transactions are run -- Consider increasing `bdr.global_lock_timeout` toward infinity if you continue to get lock timeouts -- Consider setting `bdr.ddl_locking` to `off` while the initial load is happening +``` +# Increase from the default of `1GB` to something large, but still a +# fraction of your disk space since the non-WAL data must also fit. 
+# This decreases the frequency of checkpoints. +max_wal_size = 100GB + +# Increase the number of writers to make better use of parallel +# apply. Default is 2. Make sure this isn't overriden lower by the +# node group config num_writers setting. +bdr.writers_per_subscription = 5 + +# Increase the amount of memory for building indexes. Default is +# 64MB. For example, 1GB assuming 128GB total RAM. +maintenance_work_mem = 1GB + +# Increase the receiver and sender timeout from 1 minute to 1hr to +# allow large transactions through. +wal_receiver_timeout = 1h +wal_sender_timeout = 1h +``` + +Additionally: + +- Make sure the default bdr.streaming_mode = 'auto' is not overridden so that transactions are streamed. +- Make sure any session or postgresql.conf settings listed above are not overriden by node group-level settings in general. + +And if you continue to get global lock timeouts during initial load, +temporarily set `bdr.ddl_locking = off` for the initial load. ### Prefer restoring to a single node From 76e8d25df05e89746db0a20ea9bbff76ef2e8253 Mon Sep 17 00:00:00 2001 From: Phil Eaton Date: Thu, 28 Aug 2025 19:43:11 -0400 Subject: [PATCH 5/7] pg_restore --- product_docs/docs/pgd/6/reference/backup-restore.mdx | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/product_docs/docs/pgd/6/reference/backup-restore.mdx b/product_docs/docs/pgd/6/reference/backup-restore.mdx index 46175391af7..fbe30c7af88 100644 --- a/product_docs/docs/pgd/6/reference/backup-restore.mdx +++ b/product_docs/docs/pgd/6/reference/backup-restore.mdx @@ -29,18 +29,18 @@ timeouts, we recommend dumping pre-data, data, and post-data separately. 
For example:

```console
pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=pre-data -Fc -f pgd-pre-data.dump
pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=data -Fc -f pgd-data.dump
pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=post-data -Fc -f pgd-post-data.dump
```

Then restore these custom-format archives in order (pg_restore reads
the archive as a positional argument; its `-f` option names an output
file and can't be combined with `-d`):

```console
pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=pre-data pgd-pre-data.dump
pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=data pgd-data.dump
psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=post-data pgd-post-data.dump
psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
```

From 4b8237ee5b49e7c6ce514be1b494f24b7fe683e8 Mon Sep 17 00:00:00 2001
From: Phil Eaton
Date: Thu, 28 Aug 2025 19:54:16 -0400
Subject: [PATCH 6/7] Add backup notes for other versions

---
 product_docs/docs/pgd/5.6/backup.mdx | 3 +-
 product_docs/docs/pgd/5.7/backup.mdx | 3 +-
product_docs/docs/pgd/5.8/backup.mdx | 3 +- product_docs/docs/pgd/5.9/backup.mdx | 3 +- .../docs/pgd/6.1/reference/backup-restore.mdx | 246 +++++++++++------- 5 files changed, 157 insertions(+), 101 deletions(-) diff --git a/product_docs/docs/pgd/5.6/backup.mdx b/product_docs/docs/pgd/5.6/backup.mdx index 3c20928da55..9daae22551d 100644 --- a/product_docs/docs/pgd/5.6/backup.mdx +++ b/product_docs/docs/pgd/5.6/backup.mdx @@ -278,5 +278,4 @@ you can add a `WHERE slot_name LIKE 'bdr%'` clause, but this is rarely useful. !!! Warning - Never use these commands to drop replication slots on a live PGD node - + Never use these commands to drop replication slots on a live PGD node \ No newline at end of file diff --git a/product_docs/docs/pgd/5.7/backup.mdx b/product_docs/docs/pgd/5.7/backup.mdx index 861e4915008..6d345969b66 100644 --- a/product_docs/docs/pgd/5.7/backup.mdx +++ b/product_docs/docs/pgd/5.7/backup.mdx @@ -278,5 +278,4 @@ you can add a `WHERE slot_name LIKE 'bdr%'` clause, but this is rarely useful. !!! Warning - Never use these commands to drop replication slots on a live PGD node - + Never use these commands to drop replication slots on a live PGD node \ No newline at end of file diff --git a/product_docs/docs/pgd/5.8/backup.mdx b/product_docs/docs/pgd/5.8/backup.mdx index 9ef2f85add1..c8c182b6bc0 100644 --- a/product_docs/docs/pgd/5.8/backup.mdx +++ b/product_docs/docs/pgd/5.8/backup.mdx @@ -278,5 +278,4 @@ you can add a `WHERE slot_name LIKE 'bdr%'` clause, but this is rarely useful. !!! 
Warning - Never use these commands to drop replication slots on a live PGD node - + Never use these commands to drop replication slots on a live PGD node \ No newline at end of file diff --git a/product_docs/docs/pgd/5.9/backup.mdx b/product_docs/docs/pgd/5.9/backup.mdx index f8a44f1a740..664f48c6696 100644 --- a/product_docs/docs/pgd/5.9/backup.mdx +++ b/product_docs/docs/pgd/5.9/backup.mdx @@ -278,5 +278,4 @@ you can add a `WHERE slot_name LIKE 'bdr%'` clause, but this is rarely useful. !!! Warning - Never use these commands to drop replication slots on a live PGD node - + Never use these commands to drop replication slots on a live PGD node \ No newline at end of file diff --git a/product_docs/docs/pgd/6.1/reference/backup-restore.mdx b/product_docs/docs/pgd/6.1/reference/backup-restore.mdx index 6d309fdb711..c8729b98d94 100644 --- a/product_docs/docs/pgd/6.1/reference/backup-restore.mdx +++ b/product_docs/docs/pgd/6.1/reference/backup-restore.mdx @@ -8,7 +8,6 @@ redirects: - /pgd/latest/backup/ #generated for DOCS-1247-PGD-6.0-Docs --- - PGD is designed to be a distributed, highly available system. If one or more nodes of a cluster are lost, the best way to replace them is to clone new nodes directly from the remaining nodes. @@ -21,12 +20,73 @@ recovery (DR), such as in the following situations: as a result of data corruption, application error, or security breach -## Backup - -### pg_dump +## Logical backup and restore You can use pg_dump, sometimes referred to as *logical backup*, -normally with PGD. +normally with PGD. But in order to reduce the risk of global lock +timeouts, we recommend dumping pre-data, data, and post-data +separately. 
For example: + +```console +pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=pre-data -Fc -f pgd-pre-data.dump +pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=data -Fc -f pgd-data.dump +pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=post-data -Fc -f pgd-post-data.dump +``` + +And restore by directly executing these SQL files: + +```console +pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=pre-data -f pgd-pre-data.dump +pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=data -f pgd-data.dump +psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)' +pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=post-data -f pgd-post-data.dump +psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)' +``` + +After which point the dump will be restored on all nodes in the cluster. + +In contrast if you do not split sections out with a naive pg_dump and +pg_restore, the restore will likely fail with a global lock timeout. + +You should also temporarily set the following settings in `postgresql.conf`: + +``` +# Increase from the default of `1GB` to something large, but still a +# fraction of your disk space since the non-WAL data must also fit. +# This decreases the frequency of checkpoints. +max_wal_size = 100GB + +# Increase the number of writers to make better use of parallel +# apply. Default is 2. Make sure this isn't overriden lower by the +# node group config num_writers setting. +bdr.writers_per_subscription = 5 + +# Increase the amount of memory for building indexes. Default is +# 64MB. For example, 1GB assuming 128GB total RAM. 
+maintenance_work_mem = 1GB + +# Increase the receiver and sender timeout from 1 minute to 1hr to +# allow large transactions through. +wal_receiver_timeout = 1h +wal_sender_timeout = 1h +``` + +Additionally: + +- Make sure the default bdr.streaming_mode = 'auto' is not overridden so that transactions are streamed. +- Make sure any session or postgresql.conf settings listed above are not overriden by node group-level settings in general. + +And if you continue to get global lock timeouts during initial load, +temporarily set `bdr.ddl_locking = off` for the initial load. + +### Prefer restoring to a single node + +Especially when initially setting up a cluster from a Postgres dump, +we recommend you restore to a cluster with a single PGD node. Then run +`pgd node setup` for each node you want in the cluster which will do a +physical join that uses `bdr_init_physical` under the hood. + +### Sequences pg_dump dumps both local and global sequences as if they were local sequences. This behavior is intentional, to allow a PGD @@ -51,7 +111,7 @@ dump only with `bdr.crdt_raw_value = on`. Technical Support recommends the use of physical backup techniques for backup and recovery of PGD. -### Physical backup +## Physical backup and restore You can take physical backups of a node in an EDB Postgres Distributed cluster using standard PostgreSQL software, such as @@ -82,7 +142,92 @@ PostgreSQL backup techniques to PGD: local data and a backup of at least one node that subscribes to each replication set. -### Eventual consistency +### Restore + +While you can take a physical backup with the same procedure as a +standard PostgreSQL node, it's slightly more complex to +restore the physical backup of a PGD node. + +#### EDB Postgres Distributed cluster failure or seeding a new cluster from a backup + +The most common use case for restoring a physical backup involves the failure +or replacement of all the PGD nodes in a cluster, for instance in the event of +a data center failure. 
+ +You might also want to perform this procedure to clone the current contents of a +EDB Postgres Distributed cluster to seed a QA or development instance. + +In that case, you can restore PGD capabilities based on a physical backup +of a single PGD node, optionally plus WAL archives: + +- If you still have some PGD nodes live and running, fence off the host you + restored the PGD node to, so it can't connect to any surviving PGD nodes. + This practice ensures that the new node doesn't confuse the existing cluster. +- Restore a single PostgreSQL node from a physical backup of one of + the PGD nodes. +- If you have WAL archives associated with the backup, create a suitable + `postgresql.conf`, and start PostgreSQL in recovery to replay up to the latest + state. You can specify an alternative `recovery_target` here if needed. +- Start the restored node, or promote it to read/write if it was in standby + recovery. Keep it fenced from any surviving nodes! +- Clean up any leftover PGD metadata that was included in the physical backup. +- Fully stop and restart the PostgreSQL instance. +- Add further PGD nodes with the standard procedure based on the + `bdr.join_node_group()` function call. + +#### Cleanup of PGD metadata + +To clean up leftover PGD metadata: + +1. Drop the PGD node using [`bdr.drop_node`](/pgd/6.1/reference/tables-views-functions/functions-internal#bdrdrop_node). +2. Fully stop and restart PostgreSQL (important!). + +#### Cleanup of replication origins + +You must explicitly remove replication origins with a separate step +because they're recorded persistently in a system catalog. They're +therefore included in the backup and in the restored instance. They +aren't removed automatically when dropping the BDR extension because +they aren't explicitly recorded as its dependencies. + +To track progress of incoming replication in a crash-safe way, +PGD creates one replication origin for each remote master node. 
Therefore, +for each node in the previous cluster run this once: + +``` +SELECT pg_replication_origin_drop('bdr_dbname_grpname_nodename'); +``` + +You can list replication origins as follows: + +``` +SELECT * FROM pg_replication_origin; +``` + +Those created by PGD are easily recognized by their name. + +#### Cleanup of replication slots + +If a physical backup was created with `pg_basebackup`, replication slots +are omitted from the backup. + +Some other backup methods might preserve replications slots, likely in +outdated or invalid states. Once you restore the backup, use these commands to drop all replication slots: + +``` +SELECT pg_drop_replication_slot(slot_name) +FROM pg_replication_slots; +``` + +If you have a reason to preserve some slots, +you can add a `WHERE slot_name LIKE 'bdr%'` clause, but this is rarely +useful. + +!!! Warning + Never use these commands to drop replication slots on a live PGD node + + +## Eventual consistency The nodes in an EDB Postgres Distributed cluster are *eventually consistent* but not *entirely consistent*. A physical backup of a given node provides @@ -199,89 +344,4 @@ of changes arriving from a single master in COMMIT order. !!! Note This feature is available only with EDB Postgres Extended. - Barman doesn't create a `multi_recovery.conf` file. - -## Restore - -While you can take a physical backup with the same procedure as a -standard PostgreSQL node, it's slightly more complex to -restore the physical backup of a PGD node. - -### EDB Postgres Distributed cluster failure or seeding a new cluster from a backup - -The most common use case for restoring a physical backup involves the failure -or replacement of all the PGD nodes in a cluster, for instance in the event of -a data center failure. - -You might also want to perform this procedure to clone the current contents of a -EDB Postgres Distributed cluster to seed a QA or development instance. 
- -In that case, you can restore PGD capabilities based on a physical backup -of a single PGD node, optionally plus WAL archives: - -- If you still have some PGD nodes live and running, fence off the host you - restored the PGD node to, so it can't connect to any surviving PGD nodes. - This practice ensures that the new node doesn't confuse the existing cluster. -- Restore a single PostgreSQL node from a physical backup of one of - the PGD nodes. -- If you have WAL archives associated with the backup, create a suitable - `postgresql.conf`, and start PostgreSQL in recovery to replay up to the latest - state. You can specify an alternative `recovery_target` here if needed. -- Start the restored node, or promote it to read/write if it was in standby - recovery. Keep it fenced from any surviving nodes! -- Clean up any leftover PGD metadata that was included in the physical backup. -- Fully stop and restart the PostgreSQL instance. -- Add further PGD nodes with the standard procedure based on the - `bdr.join_node_group()` function call. - -#### Cleanup of PGD metadata - -To clean up leftover PGD metadata: - -1. Drop the PGD node using [`bdr.drop_node`](/pgd/latest/reference/tables-views-functions/functions-internal#bdrdrop_node). -2. Fully stop and restart PostgreSQL (important!). - -#### Cleanup of replication origins - -You must explicitly remove replication origins with a separate step -because they're recorded persistently in a system catalog. They're -therefore included in the backup and in the restored instance. They -aren't removed automatically when dropping the BDR extension because -they aren't explicitly recorded as its dependencies. - -To track progress of incoming replication in a crash-safe way, -PGD creates one replication origin for each remote master node. 
Therefore, -for each node in the previous cluster run this once: - -``` -SELECT pg_replication_origin_drop('bdr_dbname_grpname_nodename'); -``` - -You can list replication origins as follows: - -``` -SELECT * FROM pg_replication_origin; -``` - -Those created by PGD are easily recognized by their name. - -#### Cleanup of replication slots - -If a physical backup was created with `pg_basebackup`, replication slots -are omitted from the backup. - -Some other backup methods might preserve replications slots, likely in -outdated or invalid states. Once you restore the backup, use these commands to drop all replication slots: - -``` -SELECT pg_drop_replication_slot(slot_name) -FROM pg_replication_slots; -``` - -If you have a reason to preserve some slots, -you can add a `WHERE slot_name LIKE 'bdr%'` clause, but this is rarely -useful. - -!!! Warning - Never use these commands to drop replication slots on a live PGD node - + Barman doesn't create a `multi_recovery.conf` file. \ No newline at end of file From 43314782d02b8b4d188589250fcd199fa6142070 Mon Sep 17 00:00:00 2001 From: Phil Eaton Date: Thu, 28 Aug 2025 19:57:08 -0400 Subject: [PATCH 7/7] Add backup notes for other versions --- product_docs/docs/pgd/5.6/backup.mdx | 262 +++++++++++------- product_docs/docs/pgd/5.7/backup.mdx | 262 +++++++++++------- product_docs/docs/pgd/5.8/backup.mdx | 262 +++++++++++------- product_docs/docs/pgd/5.9/backup.mdx | 262 +++++++++++------- .../docs/pgd/6/reference/backup-restore.mdx | 1 - 5 files changed, 648 insertions(+), 401 deletions(-) diff --git a/product_docs/docs/pgd/5.6/backup.mdx b/product_docs/docs/pgd/5.6/backup.mdx index 9daae22551d..bbb9739ee31 100644 --- a/product_docs/docs/pgd/5.6/backup.mdx +++ b/product_docs/docs/pgd/5.6/backup.mdx @@ -11,17 +11,78 @@ is to clone new nodes directly from the remaining nodes. 
The role of backup and recovery in PGD is to provide for disaster
recovery (DR), such as in the following situations:

-- Loss of all nodes in the cluster
-- Significant, uncorrectable data corruption across multiple nodes
+- Loss of all nodes in the cluster
+- Significant, uncorrectable data corruption across multiple nodes
   as a result of data corruption, application error, or security
   breach

-## Backup
-
-### pg_dump
+## Logical backup and restore

 You can use pg_dump, sometimes referred to as *logical backup*,
-normally with PGD.
+normally with PGD. However, to reduce the risk of global lock
+timeouts, we recommend dumping the pre-data, data, and post-data
+sections separately. For example:
+
+```console
+pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=pre-data -Fc -f pgd-pre-data.dump
+pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=data -Fc -f pgd-data.dump
+pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=post-data -Fc -f pgd-post-data.dump
+```
+
+Then restore the dumps, in order, with pg_restore, waiting for
+replication between sections:
+
+```console
+pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=pre-data pgd-pre-data.dump
+pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=data pgd-data.dump
+psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
+pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=post-data pgd-post-data.dump
+psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
+```
+
+At this point, the dump has been restored on all nodes in the cluster.
+
+In contrast, a naive single pg_dump and pg_restore without splitting
+out the sections will likely fail with a global lock timeout.
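The three per-section dump commands can also be generated by a small wrapper script. This is only a sketch, not part of PGD: it assumes the same `PG_HOST`, `PG_PORT`, `PG_USER`, and `PGD_DB` variables, and it prints each command rather than running it, so you can review the output before piping it to `sh`:

```shell
# Placeholder defaults so the sketch can be inspected standalone;
# set these to your real connection details.
: "${PG_HOST:=localhost}" "${PG_PORT:=5432}" "${PG_USER:=postgres}" "${PGD_DB:=bdrdb}"

# Print (don't run) the pg_dump command for one section.
dump_cmd() {
  printf "pg_dump -h %s -p %s -U %s -d %s -v --exclude-schema='\"bdr\"' --exclude-extension='\"bdr\"' --section=%s -Fc -f pgd-%s.dump\n" \
    "$PG_HOST" "$PG_PORT" "$PG_USER" "$PGD_DB" "$1" "$1"
}

# Emit the three sectioned dumps in order; pipe to `sh` to execute.
for section in pre-data data post-data; do
  dump_cmd "$section"
done
```

Printing the commands first keeps the sketch runnable without a live cluster; call pg_dump directly once you're happy with the generated commands.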
+
+You should also temporarily set the following settings in `postgresql.conf`:
+
+```
+# Increase from the default of `1GB` to something large, but still a
+# fraction of your disk space, since the non-WAL data must also fit.
+# This decreases the frequency of checkpoints.
+max_wal_size = 100GB
+
+# Increase the number of writers to make better use of parallel
+# apply. Default is 2. Make sure this isn't overridden with a lower
+# value by the node group config num_writers setting.
+bdr.writers_per_subscription = 5
+
+# Increase the amount of memory for building indexes. Default is
+# 64MB. For example, 1GB assuming 128GB total RAM.
+maintenance_work_mem = 1GB
+
+# Increase the receiver and sender timeout from 1 minute to 1 hour to
+# allow large transactions through.
+wal_receiver_timeout = 1h
+wal_sender_timeout = 1h
+```
+
+Additionally:
+
+- Make sure the default `bdr.streaming_mode = 'auto'` is not overridden, so that transactions are streamed.
+- Make sure the session and `postgresql.conf` settings listed above are not overridden by node group-level settings.
+
+If you continue to get global lock timeouts during the initial load,
+temporarily set `bdr.ddl_locking = off` for that load.
+
+### Prefer restoring to a single node
+
+Especially when initially setting up a cluster from a Postgres dump,
+we recommend restoring to a cluster with a single PGD node. Then run
+`pgd node setup` for each node you want in the cluster, which performs
+a physical join using `bdr_init_physical` under the hood.
+
+### Sequences

 pg_dump dumps both local and global sequences as if they were
 local sequences. This behavior is intentional, to allow a PGD
@@ -46,7 +107,7 @@ dump only with `bdr.crdt_raw_value = on`.

 Technical Support recommends the use of physical backup techniques for
 backup and recovery of PGD.
-### Physical backup +## Physical backup and restore You can take physical backups of a node in an EDB Postgres Distributed cluster using standard PostgreSQL software, such as @@ -59,25 +120,110 @@ PostgreSQL node running the BDR extension. Consider these specific points when applying PostgreSQL backup techniques to PGD: -- PGD operates at the level of a single database, while a physical +- PGD operates at the level of a single database, while a physical backup includes all the databases in the instance. Plan your databases to allow them to be easily backed up and restored. -- Backups make a copy of just one node. In the simplest case, +- Backups make a copy of just one node. In the simplest case, every node has a copy of all data, so you need to back up only one node to capture all data. However, the goal of PGD isn't met if the site containing that single copy goes down, so the minimum is at least one node backup per site (with many copies, and so on). -- However, each node might have unreplicated local data, or the +- However, each node might have unreplicated local data, or the definition of replication sets might be complex so that all nodes don't subscribe to all replication sets. In these cases, backup planning must also include plans for how to back up any unreplicated local data and a backup of at least one node that subscribes to each replication set. -### Eventual consistency +### Restore + +While you can take a physical backup with the same procedure as a +standard PostgreSQL node, it's slightly more complex to +restore the physical backup of a PGD node. + +#### EDB Postgres Distributed cluster failure or seeding a new cluster from a backup + +The most common use case for restoring a physical backup involves the failure +or replacement of all the PGD nodes in a cluster, for instance in the event of +a data center failure. 
+ +You might also want to perform this procedure to clone the current contents of a +EDB Postgres Distributed cluster to seed a QA or development instance. + +In that case, you can restore PGD capabilities based on a physical backup +of a single PGD node, optionally plus WAL archives: + +- If you still have some PGD nodes live and running, fence off the host you + restored the PGD node to, so it can't connect to any surviving PGD nodes. + This practice ensures that the new node doesn't confuse the existing cluster. +- Restore a single PostgreSQL node from a physical backup of one of + the PGD nodes. +- If you have WAL archives associated with the backup, create a suitable + `postgresql.conf`, and start PostgreSQL in recovery to replay up to the latest + state. You can specify an alternative `recovery_target` here if needed. +- Start the restored node, or promote it to read/write if it was in standby + recovery. Keep it fenced from any surviving nodes! +- Clean up any leftover PGD metadata that was included in the physical backup. +- Fully stop and restart the PostgreSQL instance. +- Add further PGD nodes with the standard procedure based on the + `bdr.join_node_group()` function call. + +#### Cleanup of PGD metadata + +To clean up leftover PGD metadata: + +1. Drop the PGD node using [`bdr.drop_node`](/pgd/5.6/reference/functions-internal#bdrdrop_node). +2. Fully stop and restart PostgreSQL (important!). + +#### Cleanup of replication origins + +You must explicitly remove replication origins with a separate step +because they're recorded persistently in a system catalog. They're +therefore included in the backup and in the restored instance. They +aren't removed automatically when dropping the BDR extension because +they aren't explicitly recorded as its dependencies. + +To track progress of incoming replication in a crash-safe way, +PGD creates one replication origin for each remote master node. 
Therefore,
+for each node in the previous cluster, run this once:
+
+```
+SELECT pg_replication_origin_drop('bdr_dbname_grpname_nodename');
+```
+
+You can list replication origins as follows:
+
+```
+SELECT * FROM pg_replication_origin;
+```
+
+Those created by PGD are easily recognized by their name.
+
+#### Cleanup of replication slots
+
+If a physical backup was created with `pg_basebackup`, replication slots
+are omitted from the backup.
+
+Some other backup methods might preserve replication slots, likely in
+outdated or invalid states. Once you restore the backup, use these commands to drop all replication slots:
+
+```
+SELECT pg_drop_replication_slot(slot_name)
+FROM pg_replication_slots;
+```
+
+If you have a reason to preserve some slots,
+you can add a `WHERE slot_name LIKE 'bdr%'` clause, but this is rarely
+useful.
+
+!!! Warning
+    Never use these commands to drop replication slots on a live PGD node.
+
+
+## Eventual consistency

 The nodes in an EDB Postgres Distributed cluster are *eventually consistent* but not
 *entirely consistent*. A physical backup of a given node provides
@@ -125,7 +271,7 @@ replication origin.

 With PostgreSQL PITR, you can use the standard syntax:

-```
+```text
 recovery_target_time = T1
 ```

@@ -164,7 +310,7 @@ by `T1`, even though they weren't applied on `N1` until later.

 To request multi-origin PITR, use the standard syntax in
 the `postgresql.conf` file:

-```
+```text
 recovery_target_time = T1
 ```

@@ -172,13 +318,13 @@ You need to specify the list of replication origins that are restored to `T1` in

 You can use a separate `multi_recovery.conf` file by way of a new
 parameter, `recovery_target_origins`:

-```
+```text
 recovery_target_origins = '*'
 ```

 Or you can specify the origin subset as a list in `recovery_target_origins`:

-```
+```text
 recovery_target_origins = '1,3'
 ```

@@ -194,88 +340,4 @@ of changes arriving from a single master in COMMIT order.

 !!! Note
     This feature is available only with EDB Postgres Extended.
- Barman doesn't create a `multi_recovery.conf` file. - -## Restore - -While you can take a physical backup with the same procedure as a -standard PostgreSQL node, it's slightly more complex to -restore the physical backup of a PGD node. - -### EDB Postgres Distributed cluster failure or seeding a new cluster from a backup - -The most common use case for restoring a physical backup involves the failure -or replacement of all the PGD nodes in a cluster, for instance in the event of -a data center failure. - -You might also want to perform this procedure to clone the current contents of a -EDB Postgres Distributed cluster to seed a QA or development instance. - -In that case, you can restore PGD capabilities based on a physical backup -of a single PGD node, optionally plus WAL archives: - -- If you still have some PGD nodes live and running, fence off the host you - restored the PGD node to, so it can't connect to any surviving PGD nodes. - This practice ensures that the new node doesn't confuse the existing cluster. -- Restore a single PostgreSQL node from a physical backup of one of - the PGD nodes. -- If you have WAL archives associated with the backup, create a suitable - `postgresql.conf`, and start PostgreSQL in recovery to replay up to the latest - state. You can specify an alternative `recovery_target` here if needed. -- Start the restored node, or promote it to read/write if it was in standby - recovery. Keep it fenced from any surviving nodes! -- Clean up any leftover PGD metadata that was included in the physical backup. -- Fully stop and restart the PostgreSQL instance. -- Add further PGD nodes with the standard procedure based on the - `bdr.join_node_group()` function call. - -#### Cleanup of PGD metadata - -To clean up leftover PGD metadata: - -1. Drop the PGD node using [`bdr.drop_node`](/pgd/5.6/reference/functions-internal#bdrdrop_node). -2. Fully stop and restart PostgreSQL (important!). 
- -#### Cleanup of replication origins - -You must explicitly remove replication origins with a separate step -because they're recorded persistently in a system catalog. They're -therefore included in the backup and in the restored instance. They -aren't removed automatically when dropping the BDR extension because -they aren't explicitly recorded as its dependencies. - -To track progress of incoming replication in a crash-safe way, -PGD creates one replication origin for each remote master node. Therefore, -for each node in the previous cluster run this once: - -``` -SELECT pg_replication_origin_drop('bdr_dbname_grpname_nodename'); -``` - -You can list replication origins as follows: - -``` -SELECT * FROM pg_replication_origin; -``` - -Those created by PGD are easily recognized by their name. - -#### Cleanup of replication slots - -If a physical backup was created with `pg_basebackup`, replication slots -are omitted from the backup. - -Some other backup methods might preserve replications slots, likely in -outdated or invalid states. Once you restore the backup, use these commands to drop all replication slots: - -``` -SELECT pg_drop_replication_slot(slot_name) -FROM pg_replication_slots; -``` - -If you have a reason to preserve some slots, -you can add a `WHERE slot_name LIKE 'bdr%'` clause, but this is rarely -useful. - -!!! Warning - Never use these commands to drop replication slots on a live PGD node \ No newline at end of file + Barman doesn't create a `multi_recovery.conf` file. \ No newline at end of file diff --git a/product_docs/docs/pgd/5.7/backup.mdx b/product_docs/docs/pgd/5.7/backup.mdx index 6d345969b66..1750078a90c 100644 --- a/product_docs/docs/pgd/5.7/backup.mdx +++ b/product_docs/docs/pgd/5.7/backup.mdx @@ -11,17 +11,78 @@ is to clone new nodes directly from the remaining nodes. 
The role of backup and recovery in PGD is to provide for disaster
recovery (DR), such as in the following situations:

-- Loss of all nodes in the cluster
-- Significant, uncorrectable data corruption across multiple nodes
+- Loss of all nodes in the cluster
+- Significant, uncorrectable data corruption across multiple nodes
   as a result of data corruption, application error, or security
   breach

-## Backup
-
-### pg_dump
+## Logical backup and restore

 You can use pg_dump, sometimes referred to as *logical backup*,
-normally with PGD.
+normally with PGD. However, to reduce the risk of global lock
+timeouts, we recommend dumping the pre-data, data, and post-data
+sections separately. For example:
+
+```console
+pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=pre-data -Fc -f pgd-pre-data.dump
+pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=data -Fc -f pgd-data.dump
+pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=post-data -Fc -f pgd-post-data.dump
+```
+
+Then restore the dumps, in order, with pg_restore, waiting for
+replication between sections:
+
+```console
+pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=pre-data pgd-pre-data.dump
+pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=data pgd-data.dump
+psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
+pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=post-data pgd-post-data.dump
+psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
+```
+
+At this point, the dump has been restored on all nodes in the cluster.
+
+In contrast, a naive single pg_dump and pg_restore without splitting
+out the sections will likely fail with a global lock timeout.
+
+You should also temporarily set the following settings in `postgresql.conf`:
+
+```
+# Increase from the default of `1GB` to something large, but still a
+# fraction of your disk space, since the non-WAL data must also fit.
+# This decreases the frequency of checkpoints.
+max_wal_size = 100GB
+
+# Increase the number of writers to make better use of parallel
+# apply. Default is 2. Make sure this isn't overridden with a lower
+# value by the node group config num_writers setting.
+bdr.writers_per_subscription = 5
+
+# Increase the amount of memory for building indexes. Default is
+# 64MB. For example, 1GB assuming 128GB total RAM.
+maintenance_work_mem = 1GB
+
+# Increase the receiver and sender timeout from 1 minute to 1 hour to
+# allow large transactions through.
+wal_receiver_timeout = 1h
+wal_sender_timeout = 1h
+```
+
+Additionally:
+
+- Make sure the default `bdr.streaming_mode = 'auto'` is not overridden, so that transactions are streamed.
+- Make sure the session and `postgresql.conf` settings listed above are not overridden by node group-level settings.
+
+If you continue to get global lock timeouts during the initial load,
+temporarily set `bdr.ddl_locking = off` for that load.
+
+### Prefer restoring to a single node
+
+Especially when initially setting up a cluster from a Postgres dump,
+we recommend restoring to a cluster with a single PGD node. Then run
+`pgd node setup` for each node you want in the cluster, which performs
+a physical join using `bdr_init_physical` under the hood.
+
+### Sequences

 pg_dump dumps both local and global sequences as if they were
 local sequences. This behavior is intentional, to allow a PGD
@@ -46,7 +107,7 @@ dump only with `bdr.crdt_raw_value = on`.

 Technical Support recommends the use of physical backup techniques for
 backup and recovery of PGD.
-### Physical backup +## Physical backup and restore You can take physical backups of a node in an EDB Postgres Distributed cluster using standard PostgreSQL software, such as @@ -59,25 +120,110 @@ PostgreSQL node running the BDR extension. Consider these specific points when applying PostgreSQL backup techniques to PGD: -- PGD operates at the level of a single database, while a physical +- PGD operates at the level of a single database, while a physical backup includes all the databases in the instance. Plan your databases to allow them to be easily backed up and restored. -- Backups make a copy of just one node. In the simplest case, +- Backups make a copy of just one node. In the simplest case, every node has a copy of all data, so you need to back up only one node to capture all data. However, the goal of PGD isn't met if the site containing that single copy goes down, so the minimum is at least one node backup per site (with many copies, and so on). -- However, each node might have unreplicated local data, or the +- However, each node might have unreplicated local data, or the definition of replication sets might be complex so that all nodes don't subscribe to all replication sets. In these cases, backup planning must also include plans for how to back up any unreplicated local data and a backup of at least one node that subscribes to each replication set. -### Eventual consistency +### Restore + +While you can take a physical backup with the same procedure as a +standard PostgreSQL node, it's slightly more complex to +restore the physical backup of a PGD node. + +#### EDB Postgres Distributed cluster failure or seeding a new cluster from a backup + +The most common use case for restoring a physical backup involves the failure +or replacement of all the PGD nodes in a cluster, for instance in the event of +a data center failure. 
+ +You might also want to perform this procedure to clone the current contents of a +EDB Postgres Distributed cluster to seed a QA or development instance. + +In that case, you can restore PGD capabilities based on a physical backup +of a single PGD node, optionally plus WAL archives: + +- If you still have some PGD nodes live and running, fence off the host you + restored the PGD node to, so it can't connect to any surviving PGD nodes. + This practice ensures that the new node doesn't confuse the existing cluster. +- Restore a single PostgreSQL node from a physical backup of one of + the PGD nodes. +- If you have WAL archives associated with the backup, create a suitable + `postgresql.conf`, and start PostgreSQL in recovery to replay up to the latest + state. You can specify an alternative `recovery_target` here if needed. +- Start the restored node, or promote it to read/write if it was in standby + recovery. Keep it fenced from any surviving nodes! +- Clean up any leftover PGD metadata that was included in the physical backup. +- Fully stop and restart the PostgreSQL instance. +- Add further PGD nodes with the standard procedure based on the + `bdr.join_node_group()` function call. + +#### Cleanup of PGD metadata + +To clean up leftover PGD metadata: + +1. Drop the PGD node using [`bdr.drop_node`](/pgd/5.7/reference/functions-internal#bdrdrop_node). +2. Fully stop and restart PostgreSQL (important!). + +#### Cleanup of replication origins + +You must explicitly remove replication origins with a separate step +because they're recorded persistently in a system catalog. They're +therefore included in the backup and in the restored instance. They +aren't removed automatically when dropping the BDR extension because +they aren't explicitly recorded as its dependencies. + +To track progress of incoming replication in a crash-safe way, +PGD creates one replication origin for each remote master node. 
Therefore,
+for each node in the previous cluster, run this once:
+
+```
+SELECT pg_replication_origin_drop('bdr_dbname_grpname_nodename');
+```
+
+You can list replication origins as follows:
+
+```
+SELECT * FROM pg_replication_origin;
+```
+
+Those created by PGD are easily recognized by their name.
+
+#### Cleanup of replication slots
+
+If a physical backup was created with `pg_basebackup`, replication slots
+are omitted from the backup.
+
+Some other backup methods might preserve replication slots, likely in
+outdated or invalid states. Once you restore the backup, use these commands to drop all replication slots:
+
+```
+SELECT pg_drop_replication_slot(slot_name)
+FROM pg_replication_slots;
+```
+
+If you have a reason to preserve some slots,
+you can add a `WHERE slot_name LIKE 'bdr%'` clause, but this is rarely
+useful.
+
+!!! Warning
+    Never use these commands to drop replication slots on a live PGD node.
+
+
+## Eventual consistency

 The nodes in an EDB Postgres Distributed cluster are *eventually consistent* but not
 *entirely consistent*. A physical backup of a given node provides
@@ -125,7 +271,7 @@ replication origin.

 With PostgreSQL PITR, you can use the standard syntax:

-```
+```text
 recovery_target_time = T1
 ```

@@ -164,7 +310,7 @@ by `T1`, even though they weren't applied on `N1` until later.

 To request multi-origin PITR, use the standard syntax in
 the `postgresql.conf` file:

-```
+```text
 recovery_target_time = T1
 ```

@@ -172,13 +318,13 @@ You need to specify the list of replication origins that are restored to `T1` in

 You can use a separate `multi_recovery.conf` file by way of a new
 parameter, `recovery_target_origins`:

-```
+```text
 recovery_target_origins = '*'
 ```

 Or you can specify the origin subset as a list in `recovery_target_origins`:

-```
+```text
 recovery_target_origins = '1,3'
 ```

@@ -194,88 +340,4 @@ of changes arriving from a single master in COMMIT order.

 !!! Note
     This feature is available only with EDB Postgres Extended.
- Barman doesn't create a `multi_recovery.conf` file. - -## Restore - -While you can take a physical backup with the same procedure as a -standard PostgreSQL node, it's slightly more complex to -restore the physical backup of a PGD node. - -### EDB Postgres Distributed cluster failure or seeding a new cluster from a backup - -The most common use case for restoring a physical backup involves the failure -or replacement of all the PGD nodes in a cluster, for instance in the event of -a data center failure. - -You might also want to perform this procedure to clone the current contents of a -EDB Postgres Distributed cluster to seed a QA or development instance. - -In that case, you can restore PGD capabilities based on a physical backup -of a single PGD node, optionally plus WAL archives: - -- If you still have some PGD nodes live and running, fence off the host you - restored the PGD node to, so it can't connect to any surviving PGD nodes. - This practice ensures that the new node doesn't confuse the existing cluster. -- Restore a single PostgreSQL node from a physical backup of one of - the PGD nodes. -- If you have WAL archives associated with the backup, create a suitable - `postgresql.conf`, and start PostgreSQL in recovery to replay up to the latest - state. You can specify an alternative `recovery_target` here if needed. -- Start the restored node, or promote it to read/write if it was in standby - recovery. Keep it fenced from any surviving nodes! -- Clean up any leftover PGD metadata that was included in the physical backup. -- Fully stop and restart the PostgreSQL instance. -- Add further PGD nodes with the standard procedure based on the - `bdr.join_node_group()` function call. - -#### Cleanup of PGD metadata - -To clean up leftover PGD metadata: - -1. Drop the PGD node using [`bdr.drop_node`](/pgd/5.7/reference/functions-internal#bdrdrop_node). -2. Fully stop and restart PostgreSQL (important!). 
- -#### Cleanup of replication origins - -You must explicitly remove replication origins with a separate step -because they're recorded persistently in a system catalog. They're -therefore included in the backup and in the restored instance. They -aren't removed automatically when dropping the BDR extension because -they aren't explicitly recorded as its dependencies. - -To track progress of incoming replication in a crash-safe way, -PGD creates one replication origin for each remote master node. Therefore, -for each node in the previous cluster run this once: - -``` -SELECT pg_replication_origin_drop('bdr_dbname_grpname_nodename'); -``` - -You can list replication origins as follows: - -``` -SELECT * FROM pg_replication_origin; -``` - -Those created by PGD are easily recognized by their name. - -#### Cleanup of replication slots - -If a physical backup was created with `pg_basebackup`, replication slots -are omitted from the backup. - -Some other backup methods might preserve replications slots, likely in -outdated or invalid states. Once you restore the backup, use these commands to drop all replication slots: - -``` -SELECT pg_drop_replication_slot(slot_name) -FROM pg_replication_slots; -``` - -If you have a reason to preserve some slots, -you can add a `WHERE slot_name LIKE 'bdr%'` clause, but this is rarely -useful. - -!!! Warning - Never use these commands to drop replication slots on a live PGD node \ No newline at end of file + Barman doesn't create a `multi_recovery.conf` file. \ No newline at end of file diff --git a/product_docs/docs/pgd/5.8/backup.mdx b/product_docs/docs/pgd/5.8/backup.mdx index c8c182b6bc0..a101377088f 100644 --- a/product_docs/docs/pgd/5.8/backup.mdx +++ b/product_docs/docs/pgd/5.8/backup.mdx @@ -11,17 +11,78 @@ is to clone new nodes directly from the remaining nodes. 
The role of backup and recovery in PGD is to provide for disaster
recovery (DR), such as in the following situations:

-- Loss of all nodes in the cluster
-- Significant, uncorrectable data corruption across multiple nodes
+- Loss of all nodes in the cluster
+- Significant, uncorrectable data corruption across multiple nodes
   as a result of data corruption, application error, or security
   breach

-## Backup
-
-### pg_dump
+## Logical backup and restore

 You can use pg_dump, sometimes referred to as *logical backup*,
-normally with PGD.
+normally with PGD. However, to reduce the risk of global lock
+timeouts, we recommend dumping the pre-data, data, and post-data
+sections separately. For example:
+
+```console
+pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=pre-data -Fc -f pgd-pre-data.dump
+pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=data -Fc -f pgd-data.dump
+pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=post-data -Fc -f pgd-post-data.dump
+```
+
+Then restore the dumps, in order, with pg_restore, waiting for
+replication between sections:
+
+```console
+pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=pre-data pgd-pre-data.dump
+pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=data pgd-data.dump
+psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
+pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=post-data pgd-post-data.dump
+psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
+```
+
+At this point, the dump has been restored on all nodes in the cluster.
+
+In contrast, a naive single pg_dump and pg_restore without splitting
+out the sections will likely fail with a global lock timeout.
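After the restore has replicated, you may want a quick confirmation that a given table reached a peer node. The following is a hypothetical spot check, not part of PGD: the host names and table name are placeholders, and `PG_PORT`, `PG_USER`, and `PGD_DB` are assumed as above:

```shell
# Placeholder defaults; set these to your real connection details.
: "${PG_PORT:=5432}" "${PG_USER:=postgres}" "${PGD_DB:=bdrdb}"

# Row count of a table on one host, as a bare number.
row_count() {  # usage: row_count <host> <table>
  psql -h "$1" -p "$PG_PORT" -U "$PG_USER" -d "$PGD_DB" -Atc "SELECT count(*) FROM $2"
}

# Compare counts on two hosts; succeeds only when they match.
check_in_sync() {  # usage: check_in_sync <host1> <host2> <table>
  a="$(row_count "$1" "$3")" || return 1
  b="$(row_count "$2" "$3")" || return 1
  [ "$a" = "$b" ] && echo "in sync ($a rows)" || { echo "mismatch: $a vs $b"; return 1; }
}

# Example (placeholder hosts and table):
# check_in_sync node-1.example.com node-2.example.com public.mytable
```

A row count is only a coarse check; it can't detect divergent row contents, but it's a cheap first signal that the data section replicated.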
+
+You should also temporarily set the following settings in `postgresql.conf`:
+
+```
+# Increase from the default of `1GB` to something large, but still a
+# fraction of your disk space, since the non-WAL data must also fit.
+# This decreases the frequency of checkpoints.
+max_wal_size = 100GB
+
+# Increase the number of writers to make better use of parallel
+# apply. Default is 2. Make sure this isn't overridden with a lower
+# value by the node group config num_writers setting.
+bdr.writers_per_subscription = 5
+
+# Increase the amount of memory for building indexes. Default is
+# 64MB. For example, 1GB assuming 128GB total RAM.
+maintenance_work_mem = 1GB
+
+# Increase the receiver and sender timeout from 1 minute to 1 hour to
+# allow large transactions through.
+wal_receiver_timeout = 1h
+wal_sender_timeout = 1h
+```
+
+Additionally:
+
+- Make sure the default `bdr.streaming_mode = 'auto'` is not overridden, so that transactions are streamed.
+- Make sure the session and `postgresql.conf` settings listed above are not overridden by node group-level settings.
+
+If you continue to get global lock timeouts during the initial load,
+temporarily set `bdr.ddl_locking = off` for that load.
+
+### Prefer restoring to a single node
+
+Especially when initially setting up a cluster from a Postgres dump,
+we recommend restoring to a cluster with a single PGD node. Then run
+`pgd node setup` for each node you want in the cluster, which performs
+a physical join using `bdr_init_physical` under the hood.
+
+### Sequences

 pg_dump dumps both local and global sequences as if they were
 local sequences. This behavior is intentional, to allow a PGD
@@ -46,7 +107,7 @@ dump only with `bdr.crdt_raw_value = on`.

 Technical Support recommends the use of physical backup techniques for
 backup and recovery of PGD.
-### Physical backup +## Physical backup and restore You can take physical backups of a node in an EDB Postgres Distributed cluster using standard PostgreSQL software, such as @@ -59,25 +120,110 @@ PostgreSQL node running the BDR extension. Consider these specific points when applying PostgreSQL backup techniques to PGD: -- PGD operates at the level of a single database, while a physical +- PGD operates at the level of a single database, while a physical backup includes all the databases in the instance. Plan your databases to allow them to be easily backed up and restored. -- Backups make a copy of just one node. In the simplest case, +- Backups make a copy of just one node. In the simplest case, every node has a copy of all data, so you need to back up only one node to capture all data. However, the goal of PGD isn't met if the site containing that single copy goes down, so the minimum is at least one node backup per site (with many copies, and so on). -- However, each node might have unreplicated local data, or the +- However, each node might have unreplicated local data, or the definition of replication sets might be complex so that all nodes don't subscribe to all replication sets. In these cases, backup planning must also include plans for how to back up any unreplicated local data and a backup of at least one node that subscribes to each replication set. -### Eventual consistency +### Restore + +While you can take a physical backup with the same procedure as a +standard PostgreSQL node, it's slightly more complex to +restore the physical backup of a PGD node. + +#### EDB Postgres Distributed cluster failure or seeding a new cluster from a backup + +The most common use case for restoring a physical backup involves the failure +or replacement of all the PGD nodes in a cluster, for instance in the event of +a data center failure. 
+
+You might also want to perform this procedure to clone the current contents of an
+EDB Postgres Distributed cluster to seed a QA or development instance.
+
+In that case, you can restore PGD capabilities based on a physical backup
+of a single PGD node, optionally plus WAL archives:
+
+- If you still have some PGD nodes live and running, fence off the host you
+  restored the PGD node to, so it can't connect to any surviving PGD nodes.
+  This practice ensures that the new node doesn't confuse the existing cluster.
+- Restore a single PostgreSQL node from a physical backup of one of
+  the PGD nodes.
+- If you have WAL archives associated with the backup, create a suitable
+  `postgresql.conf`, and start PostgreSQL in recovery to replay up to the latest
+  state. You can specify an alternative `recovery_target` here if needed.
+- Start the restored node, or promote it to read/write if it was in standby
+  recovery. Keep it fenced from any surviving nodes!
+- Clean up any leftover PGD metadata that was included in the physical backup.
+- Fully stop and restart the PostgreSQL instance.
+- Add further PGD nodes with the standard procedure based on the
+  `bdr.join_node_group()` function call.
+
+#### Cleanup of PGD metadata
+
+To clean up leftover PGD metadata:
+
+1. Drop the PGD node using [`bdr.drop_node`](/pgd/5.8/reference/functions-internal#bdrdrop_node).
+2. Fully stop and restart PostgreSQL (important!).
+
+#### Cleanup of replication origins
+
+You must explicitly remove replication origins with a separate step
+because they're recorded persistently in a system catalog. They're
+therefore included in the backup and in the restored instance. They
+aren't removed automatically when dropping the BDR extension because
+they aren't explicitly recorded as its dependencies.
+
+To track progress of incoming replication in a crash-safe way,
+PGD creates one replication origin for each remote master node.
Therefore,
+for each node in the previous cluster, run this once:
+
+```
+SELECT pg_replication_origin_drop('bdr_dbname_grpname_nodename');
+```
+
+You can list replication origins as follows:
+
+```
+SELECT * FROM pg_replication_origin;
+```
+
+Those created by PGD are easily recognized by their name.
+
+#### Cleanup of replication slots
+
+If a physical backup was created with `pg_basebackup`, replication slots
+are omitted from the backup.
+
+Some other backup methods might preserve replication slots, likely in
+outdated or invalid states. Once you restore the backup, use these commands to drop all replication slots:
+
+```
+SELECT pg_drop_replication_slot(slot_name)
+FROM pg_replication_slots;
+```
+
+If you have a reason to preserve some slots,
+you can add a `WHERE slot_name LIKE 'bdr%'` clause, but this is rarely
+useful.
+
+!!! Warning
+    Never use these commands to drop replication slots on a live PGD node.
+
+
+## Eventual consistency

 The nodes in an EDB Postgres Distributed cluster are *eventually consistent* but
 not *entirely consistent*. A physical backup of a given node provides
@@ -125,7 +271,7 @@ replication origin.

 With PostgreSQL PITR, you can use the standard syntax:

-```
+```text
 recovery_target_time = T1
 ```

@@ -164,7 +310,7 @@ by `T1`, even though they weren't applied on `N1` until later.

 To request multi-origin PITR, use the standard syntax in the `postgresql.conf` file:

-```
+```text
 recovery_target_time = T1
 ```

@@ -172,13 +318,13 @@ You need to specify the list of replication origins that are restored to `T1` in

 You can use a separate `multi_recovery.conf` file by way of a new parameter,
 `recovery_target_origins`:

-```
+```text
 recovery_target_origins = '*'
 ```

 Or you can specify the origin subset as a list in `recovery_target_origins`:

-```
+```text
 recovery_target_origins = '1,3'
 ```

@@ -194,88 +340,4 @@ of changes arriving from a single master in COMMIT order.

 !!! Note
     This feature is available only with EDB Postgres Extended.
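+
+Because PGD's origin names follow the `bdr_dbname_grpname_nodename`
+pattern shown above, the per-node drops can be generated in one pass.
+This is a sketch, assuming every PGD-created origin matches `bdr%` and
+that it runs only on the fenced, restored instance:
+
+```sql
+-- Drop all PGD-created replication origins on the restored instance.
+SELECT pg_replication_origin_drop(roname)
+FROM pg_replication_origin
+WHERE roname LIKE 'bdr%';
+```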
- Barman doesn't create a `multi_recovery.conf` file. - -## Restore - -While you can take a physical backup with the same procedure as a -standard PostgreSQL node, it's slightly more complex to -restore the physical backup of a PGD node. - -### EDB Postgres Distributed cluster failure or seeding a new cluster from a backup - -The most common use case for restoring a physical backup involves the failure -or replacement of all the PGD nodes in a cluster, for instance in the event of -a data center failure. - -You might also want to perform this procedure to clone the current contents of a -EDB Postgres Distributed cluster to seed a QA or development instance. - -In that case, you can restore PGD capabilities based on a physical backup -of a single PGD node, optionally plus WAL archives: - -- If you still have some PGD nodes live and running, fence off the host you - restored the PGD node to, so it can't connect to any surviving PGD nodes. - This practice ensures that the new node doesn't confuse the existing cluster. -- Restore a single PostgreSQL node from a physical backup of one of - the PGD nodes. -- If you have WAL archives associated with the backup, create a suitable - `postgresql.conf`, and start PostgreSQL in recovery to replay up to the latest - state. You can specify an alternative `recovery_target` here if needed. -- Start the restored node, or promote it to read/write if it was in standby - recovery. Keep it fenced from any surviving nodes! -- Clean up any leftover PGD metadata that was included in the physical backup. -- Fully stop and restart the PostgreSQL instance. -- Add further PGD nodes with the standard procedure based on the - `bdr.join_node_group()` function call. - -#### Cleanup of PGD metadata - -To clean up leftover PGD metadata: - -1. Drop the PGD node using [`bdr.drop_node`](/pgd/5.8/reference/functions-internal#bdrdrop_node). -2. Fully stop and restart PostgreSQL (important!). 
- -#### Cleanup of replication origins - -You must explicitly remove replication origins with a separate step -because they're recorded persistently in a system catalog. They're -therefore included in the backup and in the restored instance. They -aren't removed automatically when dropping the BDR extension because -they aren't explicitly recorded as its dependencies. - -To track progress of incoming replication in a crash-safe way, -PGD creates one replication origin for each remote master node. Therefore, -for each node in the previous cluster run this once: - -``` -SELECT pg_replication_origin_drop('bdr_dbname_grpname_nodename'); -``` - -You can list replication origins as follows: - -``` -SELECT * FROM pg_replication_origin; -``` - -Those created by PGD are easily recognized by their name. - -#### Cleanup of replication slots - -If a physical backup was created with `pg_basebackup`, replication slots -are omitted from the backup. - -Some other backup methods might preserve replications slots, likely in -outdated or invalid states. Once you restore the backup, use these commands to drop all replication slots: - -``` -SELECT pg_drop_replication_slot(slot_name) -FROM pg_replication_slots; -``` - -If you have a reason to preserve some slots, -you can add a `WHERE slot_name LIKE 'bdr%'` clause, but this is rarely -useful. - -!!! Warning - Never use these commands to drop replication slots on a live PGD node \ No newline at end of file + Barman doesn't create a `multi_recovery.conf` file. \ No newline at end of file diff --git a/product_docs/docs/pgd/5.9/backup.mdx b/product_docs/docs/pgd/5.9/backup.mdx index 664f48c6696..2881831ad9b 100644 --- a/product_docs/docs/pgd/5.9/backup.mdx +++ b/product_docs/docs/pgd/5.9/backup.mdx @@ -11,17 +11,78 @@ is to clone new nodes directly from the remaining nodes. 
The role of backup and recovery in PGD is to provide for disaster
recovery (DR), such as in the following situations:

-- Loss of all nodes in the cluster
-- Significant, uncorrectable data corruption across multiple nodes
+- Loss of all nodes in the cluster
+- Significant, uncorrectable data corruption across multiple nodes
   as a result of data corruption, application error, or security
   breach

-## Backup
-
-### pg_dump
+## Logical backup and restore

 You can use pg_dump, sometimes referred to as *logical backup*,
-normally with PGD.
+normally with PGD. But to reduce the risk of global lock
+timeouts, we recommend dumping pre-data, data, and post-data
+separately. For example:
+
+```console
+pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=pre-data -Fc -f pgd-pre-data.dump
+pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=data -Fc -f pgd-data.dump
+pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=post-data -Fc -f pgd-post-data.dump
+```
+
+And restore with pg_restore, which reads these custom-format dumps:
+
+```console
+pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=pre-data pgd-pre-data.dump
+pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=data pgd-data.dump
+psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
+pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=post-data pgd-post-data.dump
+psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
+```
+
+After these commands complete, the dump is restored on all nodes in the cluster.
+
+In contrast, if you don't split out the sections and instead run a
+naive pg_dump and pg_restore, the restore will likely fail with a
+global lock timeout.
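+
+Before kicking off a long restore, it can help to verify that all three
+section dumps exist and are non-empty. This is a minimal sketch; the
+file names are assumed to match the pg_dump commands above:
+
+```shell
+#!/usr/bin/env bash
+# Verify that each section dump exists and is non-empty before restoring.
+# Returns 0 when all files pass, 1 otherwise.
+check_dumps() {
+  local rc=0
+  for f in "$@"; do
+    if [ ! -s "$f" ]; then
+      echo "missing or empty dump: $f" >&2
+      rc=1
+    fi
+  done
+  return "$rc"
+}
+
+# Example: gate the restore on all three sections being present.
+# check_dumps pgd-pre-data.dump pgd-data.dump pgd-post-data.dump || exit 1
+```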
+
+You should also temporarily set the following settings in `postgresql.conf`:
+
+```
+# Increase from the default of 1GB to something large, but still a
+# fraction of your disk space, since the non-WAL data must also fit.
+# This decreases the frequency of checkpoints.
+max_wal_size = 100GB
+
+# Increase the number of writers to make better use of parallel
+# apply. The default is 2. Make sure this isn't overridden with a
+# lower value by the node group config num_writers setting.
+bdr.writers_per_subscription = 5
+
+# Increase the amount of memory for building indexes. The default is
+# 64MB. For example, 1GB assuming 128GB total RAM.
+maintenance_work_mem = 1GB
+
+# Increase the receiver and sender timeout from 1 minute to 1 hour to
+# allow large transactions through.
+wal_receiver_timeout = 1h
+wal_sender_timeout = 1h
+```
+
+Additionally:
+
+- Make sure the default `bdr.streaming_mode = 'auto'` isn't overridden, so that transactions are streamed.
+- Make sure any session or `postgresql.conf` settings listed above aren't overridden by node group-level settings.
+
+If you continue to get global lock timeouts during the initial load,
+temporarily set `bdr.ddl_locking = off` for its duration.
+
+### Prefer restoring to a single node
+
+Especially when initially setting up a cluster from a Postgres dump,
+we recommend restoring to a cluster with a single PGD node. Then run
+`pgd node setup` for each node you want in the cluster, which performs a
+physical join using `bdr_init_physical` under the hood.
+
+### Sequences
+
 pg_dump dumps both local and global sequences as if they
 were local sequences. This behavior is intentional, to allow a PGD
@@ -46,7 +107,7 @@ dump only with `bdr.crdt_raw_value = on`.

 Technical Support recommends the use of physical backup techniques for
 backup and recovery of PGD.
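+
+Before starting the load, it can be worth confirming that the settings
+above actually took effect for the session doing the restore. A sketch:
+
+```sql
+-- Check the effective values; timeouts are reported in the units
+-- shown by the unit column.
+SELECT name, setting, unit
+FROM pg_settings
+WHERE name IN ('max_wal_size', 'maintenance_work_mem',
+               'wal_receiver_timeout', 'wal_sender_timeout');
+```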
-### Physical backup +## Physical backup and restore You can take physical backups of a node in an EDB Postgres Distributed cluster using standard PostgreSQL software, such as @@ -59,25 +120,110 @@ PostgreSQL node running the BDR extension. Consider these specific points when applying PostgreSQL backup techniques to PGD: -- PGD operates at the level of a single database, while a physical +- PGD operates at the level of a single database, while a physical backup includes all the databases in the instance. Plan your databases to allow them to be easily backed up and restored. -- Backups make a copy of just one node. In the simplest case, +- Backups make a copy of just one node. In the simplest case, every node has a copy of all data, so you need to back up only one node to capture all data. However, the goal of PGD isn't met if the site containing that single copy goes down, so the minimum is at least one node backup per site (with many copies, and so on). -- However, each node might have unreplicated local data, or the +- However, each node might have unreplicated local data, or the definition of replication sets might be complex so that all nodes don't subscribe to all replication sets. In these cases, backup planning must also include plans for how to back up any unreplicated local data and a backup of at least one node that subscribes to each replication set. -### Eventual consistency +### Restore + +While you can take a physical backup with the same procedure as a +standard PostgreSQL node, it's slightly more complex to +restore the physical backup of a PGD node. + +#### EDB Postgres Distributed cluster failure or seeding a new cluster from a backup + +The most common use case for restoring a physical backup involves the failure +or replacement of all the PGD nodes in a cluster, for instance in the event of +a data center failure. 
+
+You might also want to perform this procedure to clone the current contents of an
+EDB Postgres Distributed cluster to seed a QA or development instance.
+
+In that case, you can restore PGD capabilities based on a physical backup
+of a single PGD node, optionally plus WAL archives:
+
+- If you still have some PGD nodes live and running, fence off the host you
+  restored the PGD node to, so it can't connect to any surviving PGD nodes.
+  This practice ensures that the new node doesn't confuse the existing cluster.
+- Restore a single PostgreSQL node from a physical backup of one of
+  the PGD nodes.
+- If you have WAL archives associated with the backup, create a suitable
+  `postgresql.conf`, and start PostgreSQL in recovery to replay up to the latest
+  state. You can specify an alternative `recovery_target` here if needed.
+- Start the restored node, or promote it to read/write if it was in standby
+  recovery. Keep it fenced from any surviving nodes!
+- Clean up any leftover PGD metadata that was included in the physical backup.
+- Fully stop and restart the PostgreSQL instance.
+- Add further PGD nodes with the standard procedure based on the
+  `bdr.join_node_group()` function call.
+
+#### Cleanup of PGD metadata
+
+To clean up leftover PGD metadata:
+
+1. Drop the PGD node using [`bdr.drop_node`](/pgd/5.9/reference/functions-internal#bdrdrop_node).
+2. Fully stop and restart PostgreSQL (important!).
+
+#### Cleanup of replication origins
+
+You must explicitly remove replication origins with a separate step
+because they're recorded persistently in a system catalog. They're
+therefore included in the backup and in the restored instance. They
+aren't removed automatically when dropping the BDR extension because
+they aren't explicitly recorded as its dependencies.
+
+To track progress of incoming replication in a crash-safe way,
+PGD creates one replication origin for each remote master node.
Therefore,
+for each node in the previous cluster, run this once:
+
+```
+SELECT pg_replication_origin_drop('bdr_dbname_grpname_nodename');
+```
+
+You can list replication origins as follows:
+
+```
+SELECT * FROM pg_replication_origin;
+```
+
+Those created by PGD are easily recognized by their name.
+
+#### Cleanup of replication slots
+
+If a physical backup was created with `pg_basebackup`, replication slots
+are omitted from the backup.
+
+Some other backup methods might preserve replication slots, likely in
+outdated or invalid states. Once you restore the backup, use these commands to drop all replication slots:
+
+```
+SELECT pg_drop_replication_slot(slot_name)
+FROM pg_replication_slots;
+```
+
+If you have a reason to preserve some slots,
+you can add a `WHERE slot_name LIKE 'bdr%'` clause, but this is rarely
+useful.
+
+!!! Warning
+    Never use these commands to drop replication slots on a live PGD node.
+
+
+## Eventual consistency

 The nodes in an EDB Postgres Distributed cluster are *eventually consistent* but
 not *entirely consistent*. A physical backup of a given node provides
@@ -125,7 +271,7 @@ replication origin.

 With PostgreSQL PITR, you can use the standard syntax:

-```
+```text
 recovery_target_time = T1
 ```

@@ -164,7 +310,7 @@ by `T1`, even though they weren't applied on `N1` until later.

 To request multi-origin PITR, use the standard syntax in the `postgresql.conf` file:

-```
+```text
 recovery_target_time = T1
 ```

@@ -172,13 +318,13 @@ You need to specify the list of replication origins that are restored to `T1` in

 You can use a separate `multi_recovery.conf` file by way of a new parameter,
 `recovery_target_origins`:

-```
+```text
 recovery_target_origins = '*'
 ```

 Or you can specify the origin subset as a list in `recovery_target_origins`:

-```
+```text
 recovery_target_origins = '1,3'
 ```

@@ -194,88 +340,4 @@ of changes arriving from a single master in COMMIT order.

 !!! Note
     This feature is available only with EDB Postgres Extended.
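+
+After the cleanup steps above, a quick sanity check on the restored
+instance can confirm nothing was left behind. This is a sketch; both
+queries are expected to return zero rows on a fully cleaned node:
+
+```sql
+-- Any leftover origins or slots here mean the cleanup is incomplete.
+SELECT roname FROM pg_replication_origin;
+SELECT slot_name FROM pg_replication_slots;
+```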
- Barman doesn't create a `multi_recovery.conf` file. - -## Restore - -While you can take a physical backup with the same procedure as a -standard PostgreSQL node, it's slightly more complex to -restore the physical backup of a PGD node. - -### EDB Postgres Distributed cluster failure or seeding a new cluster from a backup - -The most common use case for restoring a physical backup involves the failure -or replacement of all the PGD nodes in a cluster, for instance in the event of -a data center failure. - -You might also want to perform this procedure to clone the current contents of a -EDB Postgres Distributed cluster to seed a QA or development instance. - -In that case, you can restore PGD capabilities based on a physical backup -of a single PGD node, optionally plus WAL archives: - -- If you still have some PGD nodes live and running, fence off the host you - restored the PGD node to, so it can't connect to any surviving PGD nodes. - This practice ensures that the new node doesn't confuse the existing cluster. -- Restore a single PostgreSQL node from a physical backup of one of - the PGD nodes. -- If you have WAL archives associated with the backup, create a suitable - `postgresql.conf`, and start PostgreSQL in recovery to replay up to the latest - state. You can specify an alternative `recovery_target` here if needed. -- Start the restored node, or promote it to read/write if it was in standby - recovery. Keep it fenced from any surviving nodes! -- Clean up any leftover PGD metadata that was included in the physical backup. -- Fully stop and restart the PostgreSQL instance. -- Add further PGD nodes with the standard procedure based on the - `bdr.join_node_group()` function call. - -#### Cleanup of PGD metadata - -To clean up leftover PGD metadata: - -1. Drop the PGD node using [`bdr.drop_node`](/pgd/5.9/reference/functions-internal#bdrdrop_node). -2. Fully stop and restart PostgreSQL (important!). 
- -#### Cleanup of replication origins - -You must explicitly remove replication origins with a separate step -because they're recorded persistently in a system catalog. They're -therefore included in the backup and in the restored instance. They -aren't removed automatically when dropping the BDR extension because -they aren't explicitly recorded as its dependencies. - -To track progress of incoming replication in a crash-safe way, -PGD creates one replication origin for each remote master node. Therefore, -for each node in the previous cluster run this once: - -``` -SELECT pg_replication_origin_drop('bdr_dbname_grpname_nodename'); -``` - -You can list replication origins as follows: - -``` -SELECT * FROM pg_replication_origin; -``` - -Those created by PGD are easily recognized by their name. - -#### Cleanup of replication slots - -If a physical backup was created with `pg_basebackup`, replication slots -are omitted from the backup. - -Some other backup methods might preserve replications slots, likely in -outdated or invalid states. Once you restore the backup, use these commands to drop all replication slots: - -``` -SELECT pg_drop_replication_slot(slot_name) -FROM pg_replication_slots; -``` - -If you have a reason to preserve some slots, -you can add a `WHERE slot_name LIKE 'bdr%'` clause, but this is rarely -useful. - -!!! Warning - Never use these commands to drop replication slots on a live PGD node \ No newline at end of file + Barman doesn't create a `multi_recovery.conf` file. \ No newline at end of file diff --git a/product_docs/docs/pgd/6/reference/backup-restore.mdx b/product_docs/docs/pgd/6/reference/backup-restore.mdx index fbe30c7af88..fb508891b79 100644 --- a/product_docs/docs/pgd/6/reference/backup-restore.mdx +++ b/product_docs/docs/pgd/6/reference/backup-restore.mdx @@ -8,7 +8,6 @@ redirects: - /pgd/latest/backup/ #generated for DOCS-1247-PGD-6.0-Docs --- - PGD is designed to be a distributed, highly available system. 
If one or more nodes of a cluster are lost, the best way to replace them is to clone new nodes directly from the remaining nodes.