245 changes: 153 additions & 92 deletions product_docs/docs/pgd/6/reference/backup-restore.mdx
@@ -21,12 +21,73 @@ recovery (DR), such as in the following situations:
as a result of data corruption, application error, or
security breach

## Logical backup and restore

You can use pg_dump, sometimes referred to as *logical backup*,
normally with PGD. However, to reduce the risk of global lock
timeouts, we recommend dumping the pre-data, data, and post-data
sections separately. For example:

```console
pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=pre-data -f pgd-pre-data.sql
pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=data -f pgd-data.sql
pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=post-data -f pgd-post-data.sql
```

> **Review comment — @PJMODOS (Contributor), Aug 21, 2025:**
>
> I would do normal pg_dumps and load the separate sections via pg_restore; SQL format is slow. That is:
>
> ```console
> pg_dump ... -Fc mydb.dump
> ```
>
> and then
>
> ```console
> pg_restore ... --section=pre-data mydb.dump
> psql ... -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
> pg_restore ... --section=data mydb.dump
> psql ... -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
> pg_restore ... --section=post-data mydb.dump
> ```
>
> Maybe I'd even put `options=-cbdr.ddl_locking=off` into the connection string (but I'm not sure I want to tell users that as a common thing...)

Then restore by executing these SQL files directly:

```console
psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -f pgd-pre-data.sql
psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -f pgd-data.sql
psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -f pgd-post-data.sql
psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
```

Once these commands complete, the dump is restored and replicated to all nodes in the cluster.

In contrast, if you don't split out the sections and instead run a naive
pg_dump and pg_restore, the restore will likely fail with a global lock timeout.

You should also temporarily set the following settings in `postgresql.conf`:

```
# Increase from the default of `1GB` to something large, but still a
# fraction of your disk space since the non-WAL data must also fit.
# This decreases the frequency of checkpoints.
max_wal_size = 100GB

# Increase the number of writers to make better use of parallel
# apply. The default is 2. Make sure this isn't overridden with a lower
# value by the node group's num_writers setting.
bdr.writers_per_subscription = 5

# Increase the amount of memory for building indexes. Default is
# 64MB. For example, 1GB assuming 128GB total RAM.
maintenance_work_mem = 1GB

# Increase the receiver and sender timeout from 1 minute to 1hr to
# allow large transactions through.
wal_receiver_timeout = 1h
wal_sender_timeout = 1h
```

Additionally:

- Make sure the default `bdr.streaming_mode = 'auto'` isn't overridden, so that transactions are streamed.
- Make sure none of the session or `postgresql.conf` settings listed above are overridden by node group-level settings (see the check below).
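
A quick way to check the values in effect for your restore session is `SHOW` (a minimal check, assuming the BDR extension is loaded in the session; group-level overrides are applied by the subscription workers, so also review your node group configuration):

```
SHOW bdr.streaming_mode;
SHOW bdr.writers_per_subscription;
SHOW maintenance_work_mem;
SHOW max_wal_size;
```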

If you still get global lock timeouts during the initial load, temporarily
set `bdr.ddl_locking = off` for the duration of the load.
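
One way to scope that is to set it only for the restore session rather than cluster-wide, for example via libpq's `PGOPTIONS` environment variable (a sketch; the file name follows the earlier example):

```console
PGOPTIONS="-c bdr.ddl_locking=off" psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -f pgd-pre-data.sql
```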

### Prefer restoring to a single node

Especially when initially setting up a cluster from a Postgres dump,
we recommend restoring to a cluster with a single PGD node. Then run
`pgd node setup` for each additional node you want in the cluster; this
performs a physical join that uses `bdr_init_physical` under the hood.

### Sequences

pg_dump dumps both local and global sequences as if
they were local sequences. This behavior is intentional, to allow a PGD
@@ -51,7 +112,7 @@ dump only with `bdr.crdt_raw_value = on`.
Technical Support recommends the use of physical backup techniques for
backup and recovery of PGD.

## Physical backup and restore

You can take physical backups of a node in an EDB Postgres Distributed cluster using
standard PostgreSQL software, such as
@@ -82,7 +143,92 @@ PostgreSQL backup techniques to PGD:
local data and a backup of at least one node that subscribes to each
replication set.

### Restore

While you can take a physical backup with the same procedure as a
standard PostgreSQL node, it's slightly more complex to
restore the physical backup of a PGD node.

#### EDB Postgres Distributed cluster failure or seeding a new cluster from a backup

The most common use case for restoring a physical backup involves the failure
or replacement of all the PGD nodes in a cluster, for instance in the event of
a data center failure.

You might also want to perform this procedure to clone the current contents of an
EDB Postgres Distributed cluster to seed a QA or development instance.

In that case, you can restore PGD capabilities based on a physical backup
of a single PGD node, optionally plus WAL archives:

- If you still have some PGD nodes live and running, fence off the host you
restored the PGD node to, so it can't connect to any surviving PGD nodes.
This practice ensures that the new node doesn't confuse the existing cluster.
- Restore a single PostgreSQL node from a physical backup of one of
the PGD nodes.
- If you have WAL archives associated with the backup, create a suitable
`postgresql.conf`, and start PostgreSQL in recovery to replay up to the latest
state. You can specify an alternative `recovery_target` here if needed. (A
configuration sketch follows this list.)
- Start the restored node, or promote it to read/write if it was in standby
recovery. Keep it fenced from any surviving nodes!
- Clean up any leftover PGD metadata that was included in the physical backup.
- Fully stop and restart the PostgreSQL instance.
- Add further PGD nodes with the standard procedure based on the
`bdr.join_node_group()` function call.
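
As a rough sketch of the WAL-archive step above (assuming PostgreSQL 12 or later and a file-based WAL archive; the paths and the recovery target are placeholders):

```
# postgresql.conf
restore_command = 'cp /path/to/wal_archive/%f %p'
# Optional: stop replay at a known-good point instead of the latest state
# recovery_target_time = '2025-08-21 12:00:00+00'
```

Then create a `recovery.signal` file in the data directory and start PostgreSQL:

```console
touch $PGDATA/recovery.signal
pg_ctl -D $PGDATA start
```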

#### Cleanup of PGD metadata

To clean up leftover PGD metadata:

1. Drop the PGD node using [`bdr.drop_node`](/pgd/latest/reference/tables-views-functions/functions-internal#bdrdrop_node) (a sketch follows this list).
2. Fully stop and restart PostgreSQL (important!).
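
A hedged illustration of step 1, assuming the restored node was registered as `node1` in the old cluster (check the linked function reference for the exact parameters available in your PGD version):

```
SELECT bdr.drop_node('node1');
```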

#### Cleanup of replication origins

You must explicitly remove replication origins with a separate step
because they're recorded persistently in a system catalog. They're
therefore included in the backup and in the restored instance. They
aren't removed automatically when dropping the BDR extension because
they aren't explicitly recorded as its dependencies.

To track progress of incoming replication in a crash-safe way,
PGD creates one replication origin for each remote master node. Therefore,
for each node in the previous cluster, run this once:

```
SELECT pg_replication_origin_drop('bdr_dbname_grpname_nodename');
```

You can list replication origins as follows:

```
SELECT * FROM pg_replication_origin;
```

Those created by PGD are easily recognized by their name.
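
If the old cluster had many nodes, a sketch that drops every PGD-created origin in one pass, assuming they all carry the `bdr_` prefix shown above:

```
SELECT pg_replication_origin_drop(roname)
FROM pg_replication_origin
WHERE roname LIKE 'bdr_%';
```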

#### Cleanup of replication slots

If a physical backup was created with `pg_basebackup`, replication slots
are omitted from the backup.

Some other backup methods might preserve replication slots, likely in
outdated or invalid states. Once you restore the backup, use these commands to drop all replication slots:

```
SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots;
```

If you have a reason to preserve some slots,
you can add a `WHERE slot_name LIKE 'bdr%'` clause, but this is rarely
useful.
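
For example, to drop only the slots created by PGD and preserve any others:

```
SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE slot_name LIKE 'bdr%';
```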

!!! Warning
Never use these commands to drop replication slots on a live PGD node.


## Eventual consistency

The nodes in an EDB Postgres Distributed cluster are *eventually consistent* but not
*entirely consistent*. A physical backup of a given node provides
@@ -199,89 +345,4 @@ of changes arriving from a single master in COMMIT order.

!!! Note
This feature is available only with EDB Postgres Extended.
Barman doesn't create a `multi_recovery.conf` file.
