
Conversation

@teskje
Contributor

@teskje teskje commented Nov 3, 2025

This PR introduces the new "builtin schema migration" mechanism. It fulfills the same purpose as the existing "builtin item migration" but avoids its known issues around concurrency and rollback-ability, and it has a (hopefully) more precise name. See the design doc for details.

In contrast to the old mechanism, the new one explicitly defines a list of migrations to perform at specific versions, making it less likely that we'll accidentally migrate (and thereby truncate) builtin collections, and providing a way to choose a migration mechanism. The list is initially populated to support migrations from the last self-managed version, v0.147.
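
For illustration only, a minimal sketch of what such an explicit, versioned migration list could look like (the type names, the (major, minor, patch) version tuples, and the mechanism variants below are hypothetical, not the actual Materialize types):

struct MigrationStep {
    /// Version at which this migration runs, as (major, minor, patch).
    version: (u64, u64, u64),
    /// Name of the builtin collection to migrate.
    collection: &'static str,
    /// How to migrate the collection's persisted data.
    mechanism: Mechanism,
}

enum Mechanism {
    /// Replace the backing shard, losing its existing contents.
    Truncate,
    /// Evolve the shard's schema in place, keeping existing data.
    EvolveSchema,
}

// Hypothetical entry: migrate `mz_objects` when upgrading to v0.148.0.
const MIGRATIONS: &[MigrationStep] = &[MigrationStep {
    version: (0, 148, 0),
    collection: "mz_objects",
    mechanism: Mechanism::EvolveSchema,
}];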

Motivation

  • This PR fixes a recognized bug.

Fixes https://github.com/MaterializeInc/database-issues/issues/9803
Fixes https://github.com/MaterializeInc/database-issues/issues/9755

  • This PR adds a known-desirable feature.

Closes https://github.com/MaterializeInc/database-issues/issues/9756

Tips for reviewer

Individual commits are meaningful; I recommend reviewing them separately.

I'm working on turmoil-based testing as well, but decided to keep that out of this PR since it's already large enough. Our existing CI around upgrade tests seems to be pretty good at finding issues that can occur with the currently supported upgrade paths anyway. The turmoil tests will be more about exercising "exotic" scenarios, like multiple concurrent processes at the same version.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@teskje teskje force-pushed the builtin-schema-migration branch 11 times, most recently from 91418e0 to 172d4bb on November 6, 2025 10:58
@teskje teskje force-pushed the builtin-schema-migration branch 7 times, most recently from 432c103 to 44fa31b on November 10, 2025 12:43
Comment on lines 544 to 545
// TODO: We should check for deploy generation as well as build version. Storing the deploy
// generation as well requires changing the key schema of the migration shard.
Contributor Author


Turns out we need this to make some of the CI pass (0dt tests that don't increase the version, only the deploy generation), so I ended up adding this in a later commit.

@teskje teskje marked this pull request as ready for review November 10, 2025 13:07
@teskje teskje requested review from a team and ggevay as code owners November 10, 2025 13:07
@teskje teskje requested a review from SangJunBak November 10, 2025 13:07
@teskje teskje force-pushed the builtin-schema-migration branch from 44fa31b to 3c67887 on November 11, 2025 12:44
Contributor

@bkirwi bkirwi left a comment


Persist changes look fine to me!

///
/// Migration steps for old versions must be retained according to the upgrade policy. For
/// example, if we support upgrading one major version at a time, the release of version `N.0.0`
/// can delete all migration steps with versions before `(N-1).0.0`.
Contributor


✨!

A fun and optional thing to do might be to add a unit test that checks the current build metadata and fails if any of the migrations are "too old", so we have a cleanup reminder...

Contributor Author


Good idea! In fact, all of the validate_migration_steps checks can be moved into a unit test, to remove some clutter from the code.

Noting this as a follow-up.
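
For illustration, and reusing the hypothetical MIGRATIONS sketch from the PR description above, such a cleanup-reminder test might look roughly like this (the version numbers and the one-major-version upgrade policy are assumptions):

#[cfg(test)]
mod tests {
    use super::MIGRATIONS;

    #[test]
    fn no_stale_migrations() {
        // Hypothetical: the current build version and the oldest version we
        // still support upgrading from (one major version back).
        let current: (u64, u64, u64) = (26, 0, 0);
        let oldest_supported = (current.0.saturating_sub(1), 0, 0);

        for step in MIGRATIONS {
            assert!(
                step.version >= oldest_supported,
                "migration for {} at version {:?} predates the supported \
                 upgrade window and can be deleted",
                step.collection,
                step.version,
            );
        }
    }
}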

// migrations if we observe this build info.
if *build_info == DUMMY_BUILD_INFO {
return Ok(MigrationResult::default());
}
Contributor


I do feel like the dummy build info has become more painful as we hardcode more versions into the codebase... there were some similarly awkward workarounds in Persist.

In this case it might be possible to make this code work with a stale version here, e.g. by filtering the list of migrations passed to migration.run. But it's possible that in the long run we just want to get rid of the dummy build info entirely...

Contributor Author

@teskje teskje Nov 12, 2025


Yeah, my assumption here is that it's fine for tests that don't care about the specific version to not get migrations. If you want to test the migrations, you also have to be explicit about versions. Filtering the migrations has the same effect (there are no migrations before version 0.0.0-dummy) but prevents us from having the sanity check that looks for migrations at versions larger than the current one. Though if we move the sanity checks into a unit test, I think we can do the filtering and remove the special case for DUMMY_BUILD_INFO.

And I'm also not sure why DUMMY_BUILD_INFO is needed in the first place. Seems easy to use the actual build info instead. Except perhaps for golden tests that compare to a static expected output.
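
To illustrate the filtering idea (again using the hypothetical MIGRATIONS sketch from the PR description, not the real code): selecting only the steps between the source version and the current build version naturally yields an empty list for the dummy version 0.0.0, with no special case needed:

/// Hypothetical: select the migration steps applicable to an upgrade from
/// `source_version` to `build_version`. For the dummy build version (0, 0, 0)
/// this is empty, so no special case is needed.
fn applicable_migrations(
    source_version: (u64, u64, u64),
    build_version: (u64, u64, u64),
) -> Vec<&'static MigrationStep> {
    MIGRATIONS
        .iter()
        // Run every step strictly newer than the version we're upgrading from
        // and no newer than the version we're upgrading to.
        .filter(|step| step.version > source_version && step.version <= build_version)
        .collect()
}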

@SangJunBak
Contributor

SangJunBak commented Nov 11, 2025

Noticed a panic locally as well as in this nightly since you rebased onto main on Nov 11 at 7:44am EST https://buildkite.com/materialize/nightly/builds/14059#019a751b-6ab2-43d5-8638-45b93f040912. I think it has to do with Ben's changes.

Can repro it easily locally via bin/mzcompose --find platform-checks run default --scenario=UpgradeEntireMzFromLatestSelfManaged --check=MySqlCdcNoWait. Stack trace:

2025-11-11T22:42:34.646015Z  thread 'main' panicked at src/adapter/src/catalog/open/builtin_schema_migration.rs:442:14:
valid usage: CodecMismatch(CodecMismatch { requested: ("()", "()", "u64", "i64", None), actual: ("TableKey", "ShardId", "u64", "i64", None) })
   6: core::panicking::panic_fmt
   7: core::result::unwrap_failed
   8: <mz_adapter::catalog::open::builtin_schema_migration::Migration>::run::{closure#0}
   9: <mz_adapter::catalog::Catalog>::initialize_state::{closure#0}

@def-
Contributor

def- commented Nov 11, 2025

Great to see the tests working :)

Contributor

@SangJunBak SangJunBak left a comment


Just some questions!

Comment on lines +842 to +855
let (mut persist_write, mut persist_read) =
self.open_migration_shard(diagnostics.clone()).await;
let mut persist_since = self.open_migration_shard_since(diagnostics.clone()).await;
Contributor


Noticing that we're not committing an empty update to ensure the migration shard is readable like before. Why can we avoid doing that here?

Contributor Author


We simply skip reading the migration shard if it's not readable. In these cases the upper will be [0], so write_ts.step_back() (what we do to get a read timestamp) returns None. And in the two places where we read the migration shard, we always handle the None value appropriately.

This seemed a bit simpler to me than doing the empty append. Note that even with the empty append we'd still get an Option that we'd have to unwrap, and then we'd have to reason about why it can really never be None.
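
For illustration, the shape of that handling might look roughly like this (hypothetical types and function names; the real code works with persist uppers and step_back):

/// Hypothetical: derive a read timestamp from the shard's upper. If the shard
/// has never been written to, the upper is still 0, so `checked_sub` (playing
/// the role of `step_back`) yields `None`.
fn read_timestamp(upper: u64) -> Option<u64> {
    upper.checked_sub(1)
}

/// Hypothetical: callers treat a shard that was never written to as empty,
/// instead of first doing an empty append just to make it readable.
fn existing_entries(upper: u64) -> Vec<(u64, String)> {
    match read_timestamp(upper) {
        // The shard has been written to: snapshot it as of `read_ts`.
        Some(read_ts) => snapshot_at(read_ts),
        // The shard was never written to: nothing recorded yet.
        None => Vec::new(),
    }
}

/// Stand-in for reading a persist snapshot at the given timestamp.
fn snapshot_at(_read_ts: u64) -> Vec<(u64, String)> {
    Vec::new()
}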

Comment on lines +673 to +688
// Generate new shard IDs and attempt to insert them into the migration shard. If we get a
// CaA failure at `write_ts` that means a concurrent process has inserted in the meantime
// and we need to re-check the migration shard contents.
let mut replaced_shards = BTreeMap::new();
let mut updates = Vec::new();
Contributor


If I'm understanding this correctly, let's assume we tried a migration for (v26.0.0 -> 27.0.0), but the upgrade fails at read-only after we already wrote to the migration shard. So now the migration shard has this entry:
write_ts, (mz_objects_id,mz_objects_shard_id), 1.
Now suppose we retry the migration in read-only mode and it succeeds. Then we'll have:

write_ts, (mz_objects_id,mz_objects_shard_id), 1
write_ts, (mz_objects_id,mz_objects_shard_id_2), 1

in the migration shard. Questions/clarifications:

  • How do we know which shard id to use if we have duplicate entries at the same timestamp?
  • Does shard creation occur as soon as we write compare_and_append?
  • Do we clean / finalize these shards on the next upgrade given:
 // Collect old entries to remove.
let pred = |key: &migration_shard::Key| key.build_version < self.source_version;

Contributor Author

@teskje teskje Nov 12, 2025


Now suppose we retry the migration in read-only mode and it succeeds. Then we'll have:

write_ts, (mz_objects_id,mz_objects_shard_id), 1
write_ts, (mz_objects_id,mz_objects_shard_id_2), 1

in the migration shard.

That's not right! We never commit to the migration shard twice for the same version. Notice how, before we generate the new shard IDs and append them, we read the contents of the migration shard and check whether there are any entries at our version. If there are, we use those and do not generate new ones.

So in your example, when we retry the migration in read-only mode, the contents of the shard would still be:

write_ts, (mz_objects_id,mz_objects_shard_id), 1

Now to answer your questions:

How do we know which shard id to use if we have duplicate entries at the same timestamp?

We should never have duplicate entries at the same version (and deploy generation)!

Does shard creation occur as soon as we write compare_and_append?

No, writing a ShardId to the migration shard does not create that shard in the persist state. That happens later, during adapter bootstrap, when the storage controller gets the ShardIds from the collection metadata in the catalog (which we update by calling insert_collection_metadata here) and then opens the shards.

Do we clean / finalize these shards on the next upgrade given:

On the next upgrade to a higher version, we'll delete the migration shard entries from lower versions, yes. Whether we also finalize a shard depends on whether it's still in use. To find that out we look at the collection metadata:

            // The migration shard contains both shards created during aborted upgrades and shards
            // created during successful upgrades. The latter may still be in use, so we have to
            // check and only finalize those that aren't anymore.
            let gid = GlobalId::System(key.global_id);
            if collection_metadata.get(&gid) != Some(&shard_id) {
                unfinalized_shards.insert(shard_id);
            }
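
To make the idempotency argument above concrete, a minimal sketch (hypothetical types and names, not the actual migration shard API) of the reuse-or-generate step:

use std::collections::BTreeMap;

#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct Key {
    build_version: String,
    global_id: u64,
}

fn shards_for_upgrade(
    existing: &BTreeMap<Key, String>,
    target_version: &str,
    to_migrate: &[u64],
) -> BTreeMap<u64, String> {
    let mut out = BTreeMap::new();
    for &gid in to_migrate {
        let key = Key {
            build_version: target_version.to_string(),
            global_id: gid,
        };
        let shard_id = match existing.get(&key) {
            // A previous (possibly aborted) attempt already recorded a shard
            // ID for this collection at this version: reuse it.
            Some(shard_id) => shard_id.clone(),
            // First attempt at this version: generate a fresh shard ID. The
            // subsequent compare_and_append either commits it, or it fails and
            // the retry re-reads the shard and lands in the branch above.
            None => format!("fresh-shard-for-{gid}"),
        };
        out.insert(gid, shard_id);
    }
    out
}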

Contributor

@SangJunBak SangJunBak Nov 12, 2025


Notice how before we generate the new shard IDs and append them we read the contents of the migration shard and check if there are any entries at our version. If there are, we use these and do not generate new ones.

Ah, you're right! Sorry, I misread that as only applying to the new leader, i.e. I thought it was self.build_version == self.target_version instead of key.build_version == self.target_version.

Thanks for the clarifications. This function makes sense to me!

@teskje teskje force-pushed the builtin-schema-migration branch 2 times, most recently from ff8677f to 49580a5 on November 12, 2025 09:24
This commit exposes the existing `register_schema` API from
`PersistClient`. This provides a way for users to register the initial
schema of a shard without performing a write.

Ideally, we'd remove the `register_schema` API and instead make
`compare_and_evolve_schema` also handle the case where no schema was
previously registered. Leaving that to future work.
@teskje teskje force-pushed the builtin-schema-migration branch from 49580a5 to bf952e8 on November 12, 2025 09:36
@teskje
Contributor Author

teskje commented Nov 12, 2025

Noticed a panic locally as well as in this nightly since you rebased onto main

Thanks! It had nothing to do with Ben's changes. I added an additional commit to upgrade the migration shard version, and I specified the wrong schema in that upgrade call. Fixed!

This commit introduces the new "builtin schema migration" mechanism. It
fulfills the same purpose as the existing "builtin item migration" but
avoids its known issues around concurrency and rollback-ability, and it
has a (hopefully) more precise name.

In contrast to the old mechanism, the new one explicitly defines a list
of migrations to perform at specific versions, making it less likely
that we'll accidentally migrate (and thereby truncate) builtin
collections, and providing a way to choose a migration mechanism.

This commit only adds the new implementation. Subsequent commits deal
with wiring it up, populating the `MIGRATIONS` list, and deleting the
old implementation.
This commit makes the catalog initialization code start using the
builtin schema migration mechanism.

Notable changes compared to the old mechanism:

 * Because migrations are specified explicitly, there is no need anymore
   for `add_new_remove_old_builtin_items_migration` to collect the list
   of changed builtins based on fingerprint mismatches. Instead, the
   schema migrations simply execute the specified migration steps and
   afterwards assert that the fingerprints of non-migrated items match.
 * Catalog state updates are now applied outside the migration code.
This commit adds the initial list of builtin schema migrations required
to upgrade from the last self-managed release version, v0.147.
This commit adds support for forcing builtin schema migrations during
tests, through a --force-builtin-schema-migration envd flag, which gets
wired through to the builtin schema migration code. If the flag is set,
all builtin collections are force-migrated using the selected mechanism.

The existing mechanism to force builtin item migrations worked by
modifying the fingerprints of builtins. This doesn't work anymore
because fingerprint mismatches are no longer used to inform which
migrations to run, only to verify all items have been correctly
migrated. Thus, this commit also removes support for adding whitespace
to builtin fingerprints.
This commit adds the deploy generation to the `Key` used in the
migration shard. This is necessary because some of our tests simulate
0dt upgrades by bumping the deploy generation without actually bumping
the version, and those tests get confused when the deploy generation
isn't in the key.

Note that this requires changing the serialization format of the
migration shard key. We choose JSON, to make the format easier to modify
in the future. We still need to be able to parse the older format so we
can clean up old entries.
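
A rough sketch of that key evolution (hypothetical field names, assuming serde/serde_json; the old key format is stood in for by a plain version/global-ID string purely for illustration):

use serde::{Deserialize, Serialize};

/// Hypothetical shape of the new migration shard key: the deploy generation is
/// part of the key, and the key is serialized as JSON so that future fields
/// are easy to add.
#[derive(Serialize, Deserialize)]
struct Key {
    build_version: String,
    deploy_generation: u64,
    global_id: u64,
}

/// Decode a key, falling back to a (hypothetical) older format so that entries
/// written by previous versions can still be recognized and cleaned up.
fn decode_key(raw: &str) -> Option<Key> {
    if let Ok(key) = serde_json::from_str::<Key>(raw) {
        return Some(key);
    }
    // Older entries lack a deploy generation; treat it as 0 when cleaning up.
    let (version, gid) = raw.split_once('/')?;
    Some(Key {
        build_version: version.to_string(),
        deploy_generation: 0,
        global_id: gid.parse().ok()?,
    })
}
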
@teskje teskje force-pushed the builtin-schema-migration branch from bf952e8 to 7ed20da on November 12, 2025 10:32
Contributor

@SangJunBak SangJunBak left a comment


LGTM!

@teskje
Contributor Author

teskje commented Nov 12, 2025

The failing nightlies are because of DockerHub issues. Optimistically merging since all the nightlies did pass previously.

@teskje teskje merged commit 8fb0559 into MaterializeInc:main Nov 12, 2025
324 of 332 checks passed
@teskje
Contributor Author

teskje commented Nov 12, 2025

TFTRs!

@teskje teskje deleted the builtin-schema-migration branch November 12, 2025 16:12