datacore-senthil commented Aug 28, 2025

zpool import: reconstruct corrupted primary GPT from backup

On Windows Server 2025, zpool import can fail when the OS has corrupted the primary EFI/GPT partition table. This change updates the
import path to detect the corruption and fall back to the backup GPT.

The code now:

  • Reads GPT information from the backup when the primary shows corruption.
  • Reconstructs the primary GPT with the correct partition style.
  • Rewrites the primary GPT on disk and proceeds with the import.

This allows zpools with corrupted primary GPT labels to be imported successfully without data loss.
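
The detect-and-fall-back step listed above can be sketched as follows. This is a minimal illustration and not the code in this PR: it assumes both header blocks have already been read into memory, checks only the `EFI PART` signature and the header CRC-32 as laid out in the UEFI GPT header (it skips the partition-array CRC and the LBA fields), and every name in it (`crc32_buf`, `gpt_header_valid`, `gpt_pick_header`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GPT_SIGNATURE   "EFI PART"
#define GPT_HDR_MIN     92u  /* bytes covered by the UEFI GPT header layout */
#define GPT_SIZE_OFFSET 12u  /* offset of the HeaderSize field */
#define GPT_CRC_OFFSET  16u  /* offset of the HeaderCRC32 field */

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), the variant GPT uses. */
uint32_t crc32_buf(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Read a little-endian 32-bit field from a raw block. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
        ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return 1 if the block carries a GPT signature and a matching header CRC. */
int gpt_header_valid(const uint8_t *blk, size_t blk_len)
{
    uint8_t copy[512];
    uint32_t hdr_size, stored;

    assert(blk != NULL);
    if (blk_len < GPT_HDR_MIN || memcmp(blk, GPT_SIGNATURE, 8) != 0)
        return 0;
    hdr_size = load_le32(blk + GPT_SIZE_OFFSET);
    if (hdr_size < GPT_HDR_MIN || hdr_size > blk_len ||
        hdr_size > sizeof (copy))
        return 0;
    stored = load_le32(blk + GPT_CRC_OFFSET);
    /* The CRC is computed with the CRC field itself set to zero. */
    memcpy(copy, blk, hdr_size);
    memset(copy + GPT_CRC_OFFSET, 0, 4);
    return crc32_buf(copy, hdr_size) == stored;
}

enum gpt_source { GPT_NONE, GPT_PRIMARY, GPT_BACKUP };

/* Prefer the primary header; use the backup only when the primary fails. */
enum gpt_source gpt_pick_header(const uint8_t *primary,
    const uint8_t *backup, size_t blk_len)
{
    if (gpt_header_valid(primary, blk_len))
        return GPT_PRIMARY;
    if (gpt_header_valid(backup, blk_len))
        return GPT_BACKUP;
    return GPT_NONE;
}
```

A caller that gets `GPT_BACKUP` would then rebuild and rewrite the primary from the backup contents, which is the part this PR implements in the import path.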

Jira issues resolved: SSV-25260, SSV-25279

datacore-senthil marked this pull request as ready for review September 3, 2025 04:01

arun-kv commented Sep 3, 2025

@datacore-senthil, as we discussed, let's test with a disk that holds data, because increasing the partition count may reduce the usable space and cause data loss.
Also, if we can simulate the OS upgrade by attaching the same disk to two different VMs (Windows Server 2022 and Windows Server 2025), we can save the OS upgrade time and test all dev scenarios.

datacore-senthil (Author) commented

> @datacore-senthil, as we discussed, let's test with a disk that holds data, because increasing the partition count may reduce the usable space and cause data loss. Also, if we can simulate the OS upgrade by attaching the same disk to two different VMs (Windows Server 2022 and Windows Server 2025), we can save the OS upgrade time and test all dev scenarios.

Based on the suggestion, I did a manual validation without involving SSY. On a Windows Server 2022 VM, I created a zpool and filled it to capacity. I then detached the virtual disk and re-attached it to another VM running Windows Server 2025. On that system, the pool imported successfully and a disktest check showed no errors, so the manual test passed.

However, during the code review we raised a concern about how nblocks and the last-LBA calculation behave when the partition count is increased from 9 to 128.

The nblocks formula:
[image: NBLOCKS macro definition]

From the analysis of the NBLOCKS(p, l) macro:
NBLOCKS(9, 512) → 4 blocks
NBLOCKS(128, 512) → 33 blocks

But due to the following condition in the code:
[image: condition comparing nblocks * lbsize with EFI_MIN_ARRAY_SIZE]

For 9 partitions:
Initial nblocks * lbsize = 4 * 512 = 2048 bytes
EFI_MIN_ARRAY_SIZE = 16 KB = 16384 bytes
Since 2048 < 16384, the condition is true → nblocks is recalculated:
nblocks = (16384 / 512) + 1 = 32 + 1 = 33
So nblocks ends up as 33 for 9 partitions.

For 128 partitions:
The direct calculation already gives nblocks = 33. Since 33 * 512 = 16896 bytes is not below 16384, the condition is false and nblocks stays at 33.

Conclusion:
Both cases (9 partitions and 128 partitions) result in the same effective nblocks = 33.
This ensures that the last usable LBA (efi_last_u_lba) remains consistent regardless of whether the partition table is configured for 9 entries or 128 entries. The minimum requirement of 33 blocks accommodates the GPT header and partition entries safely in both scenarios.
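
The arithmetic above can be reproduced with a small sketch. It is a reconstruction from the numbers in this comment, not the actual source: it assumes a 128-byte partition entry (the size that yields 4 and 33 blocks for 9 and 128 entries at 512-byte blocks) and models the minimum-size condition exactly as described here. The names `GPT_ENTRY_SIZE`, `nblocks_raw` and `nblocks_effective` are hypothetical, and the real condition in the code may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

#define GPT_ENTRY_SIZE      128u           /* assumed size of one entry */
#define EFI_MIN_ARRAY_SIZE  (16u * 1024u)  /* 16 KB, as stated above */

/* One block for the GPT header plus the entry array, rounded up to blocks. */
uint32_t nblocks_raw(uint32_t nparts, uint32_t lbsize)
{
    assert(lbsize > 0);
    return 1u + (nparts * GPT_ENTRY_SIZE + (lbsize - 1u)) / lbsize;
}

/* Apply the minimum array size described in the comment above. */
uint32_t nblocks_effective(uint32_t nparts, uint32_t lbsize)
{
    uint32_t nblocks = nblocks_raw(nparts, lbsize);

    if (nblocks * lbsize < EFI_MIN_ARRAY_SIZE)
        nblocks = EFI_MIN_ARRAY_SIZE / lbsize + 1u;
    return nblocks;
}
```

Under these assumptions, `nblocks_effective(9, 512)` and `nblocks_effective(128, 512)` both evaluate to 33, matching the conclusion.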

To confirm, debug logs were added and compared across the Windows Server 2022 and Windows Server 2025 environments.

On Windows Server 2022:

**[repair_vtoc] efi_last_u_lba : 41943006
**[repair_vtoc] efi_nparts : 9

On Windows Server 2025:
**[repair_vtoc1] capacity (blocks): 41943040
**[repair_vtoc1] lbsize : 512
**[repair_vtoc] disk_last_lba : 41943039
**[repair_vtoc] nblocks (parttbl): 33
**[repair_vtoc] efi_last_u_lba : 41943006
**[repair_vtoc] efi_nparts : 128

These results confirm that despite increasing the partition count from 9 to 128, the effective efi_last_u_lba remains unchanged, validating that the nblocks calculation is consistent.
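
The logged values are consistent with a simple relation between capacity, nblocks and the last usable LBA. The sketch below encodes that relation as inferred from the log lines only; the helper names are hypothetical and the real repair path may compute these fields differently.

```c
#include <assert.h>
#include <stdint.h>

/* LBAs are zero-based, so the last LBA is one less than the block count. */
uint64_t disk_last_lba(uint64_t capacity_blocks)
{
    assert(capacity_blocks > 0);
    return capacity_blocks - 1;
}

/* Reserve nblocks at the end of the disk for the backup GPT area. */
uint64_t last_usable_lba(uint64_t capacity_blocks, uint64_t nblocks)
{
    uint64_t last = disk_last_lba(capacity_blocks);

    assert(nblocks <= last);
    return last - nblocks;
}
```

With the logged capacity of 41943040 blocks and nblocks of 33, `last_usable_lba` gives 41943006, the efi_last_u_lba value reported in both environments.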

datacore-senthil merged commit 081749a into datacore-windows Sep 11, 2025
0 of 8 checks passed
datacore-senthil added a commit that referenced this pull request Sep 11, 2025
…tructed the primary (#104)

* Manually picked the changes from OpenZFS to import from the backup if the primary is corrupted by Windows2025

* SSV-25279 , during import rewriting the corrupted primary partition info with 128 partition to fix the add disk failure

* added log and handled error case

(cherry picked from commit 081749a)
datacore-senthil added a commit that referenced this pull request Sep 12, 2025
…tructed the primary (#104)

* Manually picked the changes from OpenZFS to import from the backup if the primary is corrupted by Windows2025

* SSV-25279 , during import rewriting the corrupted primary partition info with 128 partition to fix the add disk failure

* added log and handled error case

(cherry picked from commit 081749a)
datacore-senthil added a commit that referenced this pull request Sep 12, 2025
…tructed the primary (#104) (#109)

* Manually picked the changes from OpenZFS to import from the backup if the primary is corrupted by Windows2025

* SSV-25279 , during import rewriting the corrupted primary partition info with 128 partition to fix the add disk failure

* added log and handled error case

(cherry picked from commit 081749a)
datacore-senthil added a commit that referenced this pull request Sep 12, 2025
…tructed the primary (#104) (#108)

* Manually picked the changes from OpenZFS to import from the backup if the primary is corrupted by Windows2025

* SSV-25279 , during import rewriting the corrupted primary partition info with 128 partition to fix the add disk failure

* added log and handled error case

(cherry picked from commit 081749a)