@teskje
Contributor

@teskje teskje commented Nov 6, 2025

Prior to this change, if the leader environment restarted either during an in-progress 0dt upgrade or after an aborted one, it would fail to come up if that 0dt upgrade caused builtin item migrations to occur. The new version would poison the migration shard, and the old version would halt itself upon reading it.

This PR applies the minimal fix: When performing builtin item migrations in leader mode, we now ignore the existence of newer versions in the migration shard, instead of crashing upon observing them. Read-only processes still crash upon observing newer versions.
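As a rough illustration of the rule described above, here is a minimal sketch of the per-entry decision. It is not the code from this PR: `Mode`, `Action`, `on_entry`, and the tuple-based version type are all hypothetical names chosen for the example, and the handling of entries that are not newer than the running version is deliberately left unmodeled.

```rust
use std::cmp::Ordering;

/// Hypothetical process mode (illustrative, not a type from the codebase).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Leader,
    ReadOnly,
}

/// Hypothetical outcome of inspecting one migration shard entry.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Entry is not from a newer version; left to the existing logic,
    /// which this sketch does not model.
    Process,
    /// Entry was written by a newer version and we are the leader: ignore it.
    Ignore,
    /// Entry was written by a newer version and we are read-only: halt.
    Halt,
}

/// Versions are modeled as (major, minor, patch) tuples, an assumption made
/// for this sketch. Tuples compare lexicographically.
pub type Version = (u64, u64, u64);

/// Decide what to do with an entry written by `entry_version` when the
/// running process is at `own_version`.
pub fn on_entry(mode: Mode, own_version: Version, entry_version: Version) -> Action {
    match (entry_version.cmp(&own_version), mode) {
        // Previously a newer version was fatal in both modes.
        (Ordering::Greater, Mode::Leader) => Action::Ignore,
        (Ordering::Greater, Mode::ReadOnly) => Action::Halt,
        _ => Action::Process,
    }
}
```

The point of the sketch is only that the leader no longer treats a newer version as fatal, so it can restart after an aborted or in-progress 0dt upgrade, while read-only processes keep the old behavior.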

This PR also introduces graceful skipping of undecodable entries in the migration shard. This lets a future version change the format of these entries without interfering with older versions that don't understand the new format.
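A minimal sketch of what skipping undecodable entries could look like, under stated assumptions: the `Entry` type, the `key=value` text encoding, and the `decode` / `decode_all` helpers are invented for illustration and do not reflect the real entry format or the code in this PR.

```rust
/// Hypothetical decoded migration shard entry (illustrative only).
#[derive(Debug, PartialEq, Eq)]
pub struct Entry {
    pub key: String,
    pub value: u64,
}

/// Decode one raw entry. The `key=value` UTF-8 format is an assumption made
/// for this sketch. Returns `None` for anything that does not parse.
pub fn decode(raw: &[u8]) -> Option<Entry> {
    let text = std::str::from_utf8(raw).ok()?;
    let (key, value) = text.split_once('=')?;
    Some(Entry {
        key: key.to_string(),
        value: value.parse().ok()?,
    })
}

/// Decode every entry we understand and silently skip the rest, instead of
/// failing on the first entry that does not decode.
pub fn decode_all(raw_entries: &[Vec<u8>]) -> Vec<Entry> {
    raw_entries.iter().filter_map(|raw| decode(raw)).collect()
}
```

With this shape, an entry written in a format that an older version cannot parse is simply dropped by that older version, and the entries it does understand are still returned.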

This change is meant to derisk subsequent releases before the rewrite of the builtin item migrations lands. In contrast to that rewrite, this change is small enough to be backported into previous versions, so we can deploy it prior to the next release that requires migrations to occur.

Motivation

  • This PR fixes a recognized bug.

Stopgap for https://github.com/MaterializeInc/database-issues/issues/9755

Tips for reviewer

See discussion in Slack.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@teskje teskje force-pushed the builtin-item-migration-ignore-new-versions branch from 3263961 to 8e7f277 Compare November 6, 2025 13:22
@teskje teskje marked this pull request as ready for review November 6, 2025 13:23
@teskje teskje requested a review from a team as a code owner November 6, 2025 13:23
@teskje teskje requested review from SangJunBak and aljoscha November 6, 2025 13:23
@teskje teskje added the self-managed-backport-v25.2 Needs to be backported into the v25.2 self-managed release label Nov 6, 2025
Prior to this change, if the leader environment restarted either during
an in-progress 0dt upgrade or after an aborted one, it would fail to
come up if that 0dt upgrade caused builtin item migrations to occur. The
new version would poison the migration shard, and the old version would
halt itself upon reading it.

This commit applies the minimal fix: When performing builtin item
migrations in leader mode, we now ignore the existence of newer versions
in the migration shard, instead of crashing upon observing them.
Read-only processes still crash upon observing newer versions.

This commit also introduces graceful skipping of
undecodable entries in the migration shard. This lets a future version
change the format of these entries without interfering with older
versions that don't understand the new format.

This change is meant to derisk subsequent releases before the rewrite of
the builtin item migrations lands. In contrast to that rewrite, this
change is small enough to be backported into previous versions, so we
can deploy it prior to the next release that requires migrations to
occur.
@teskje teskje force-pushed the builtin-item-migration-ignore-new-versions branch from 8e7f277 to aefcac1 Compare November 6, 2025 14:12
Contributor

@SangJunBak SangJunBak left a comment

This makes sense to me!

@teskje
Contributor Author

teskje commented Nov 6, 2025

TFTR!

@teskje teskje merged commit e461509 into MaterializeInc:main Nov 6, 2025
130 checks passed
@teskje teskje deleted the builtin-item-migration-ignore-new-versions branch November 6, 2025 15:52
Labels

self-managed-backport-v25.2-done self-managed-backport-v25.2 Needs to be backported into the v25.2 self-managed release

3 participants