
Conversation

@teskje
Contributor

@teskje teskje commented Nov 3, 2025

This PR introduces the new "builtin schema migration" mechanism. It fulfills the same purpose as the existing "builtin item migration" but avoids its known issues around concurrency and rollback-ability, and it has a (hopefully) more precise name. See the design doc for details.

In contrast to the old mechanism, the new one explicitly defines a list of migrations to perform at specific versions, making it less likely that we'll accidentally migrate (and thereby truncate) builtin collections, and providing a way to choose a migration mechanism. The list is initially populated to support migrations from the last self-managed version, v0.147.
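
For illustration only, a minimal sketch of what such an explicit, versioned migration list could look like (the type names, the (major, minor, patch) version tuples, and the mechanism variants below are hypothetical, not the actual Materialize types):

struct MigrationStep {
    /// Version at which this migration runs, as (major, minor, patch).
    version: (u64, u64, u64),
    /// Name of the builtin collection to migrate.
    collection: &'static str,
    /// How to migrate the collection's persisted data.
    mechanism: Mechanism,
}

enum Mechanism {
    /// Replace the backing shard, losing its existing contents.
    Truncate,
    /// Evolve the shard's schema in place, keeping existing data.
    EvolveSchema,
}

// Hypothetical entry: migrate `mz_objects` when upgrading to v0.148.0.
const MIGRATIONS: &[MigrationStep] = &[MigrationStep {
    version: (0, 148, 0),
    collection: "mz_objects",
    mechanism: Mechanism::EvolveSchema,
}];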

Motivation

  • This PR fixes a recognized bug.

Fixes https://github.com/MaterializeInc/database-issues/issues/9803
Fixes https://github.com/MaterializeInc/database-issues/issues/9755

  • This PR adds a known-desirable feature.

Closes https://github.com/MaterializeInc/database-issues/issues/9756

Tips for reviewer

Individual commits are meaningful; I recommend reviewing them separately.

I'm working on turmoil-based testing as well, but decided to keep that out of this PR since it's already large enough. Our existing CI around upgrade tests seems to be pretty good at finding issues that can occur with the currently supported upgrade paths anyway. The turmoil tests will be more about exercising "exotic" scenarios, like multiple concurrent processes at the same version.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@teskje teskje force-pushed the builtin-schema-migration branch 11 times, most recently from 91418e0 to 172d4bb on November 6, 2025 10:58
@teskje teskje force-pushed the builtin-schema-migration branch 7 times, most recently from 432c103 to 44fa31b on November 10, 2025 12:43
Comment on lines 544 to 545
// TODO: We should check for deploy generation as well as build version. Storing the deploy
// generation as well requires changing the key schema of the migration shard.
Contributor Author


Turns out we need this to make some of the CI pass (0dt tests that don't increase the version, only the deploy generation), so I ended up adding this in a later commit.

@teskje teskje marked this pull request as ready for review November 10, 2025 13:07
@teskje teskje requested review from a team and ggevay as code owners November 10, 2025 13:07
@teskje teskje requested a review from SangJunBak November 10, 2025 13:07
@teskje teskje force-pushed the builtin-schema-migration branch from 44fa31b to 3c67887 on November 11, 2025 12:44
Contributor

@bkirwi bkirwi left a comment


Persist changes look fine to me!

///
/// Migration steps for old versions must be retained according to the upgrade policy. For
/// example, if we support upgrading one major version at a time, the release of version `N.0.0`
/// can delete all migration steps with versions before `(N-1).0.0`.
Contributor


✨!

A fun and optional thing to do might be to add a unit test that checks the current build metadata and fails if any of the migrations are "too old", so we have a cleanup reminder...

Contributor Author


Good idea! In fact, all of the validate_migration_steps checks can be moved into a unit test, to remove some clutter from the code.

Noting this as a follow-up.
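
For illustration, and reusing the hypothetical MIGRATIONS sketch from the PR description above, such a cleanup-reminder test might look roughly like this (the version numbers and the one-major-version upgrade policy are assumptions):

#[cfg(test)]
mod tests {
    use super::MIGRATIONS;

    #[test]
    fn no_stale_migrations() {
        // Hypothetical: the current build version and the oldest version we
        // still support upgrading from (one major version back).
        let current: (u64, u64, u64) = (26, 0, 0);
        let oldest_supported = (current.0.saturating_sub(1), 0, 0);

        for step in MIGRATIONS {
            assert!(
                step.version >= oldest_supported,
                "migration for {} at version {:?} predates the supported \
                 upgrade window and can be deleted",
                step.collection,
                step.version,
            );
        }
    }
}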

// migrations if we observe this build info.
if *build_info == DUMMY_BUILD_INFO {
return Ok(MigrationResult::default());
}
Contributor


I do feel like the dummy build info has become more painful as we hardcode more versions into the codebase... there were some similarly awkward workarounds in Persist.

In this case it might be possible to make this code work with a stale version here, e.g. by filtering the list of migrations passed to migration.run. But it's possible that in the long run we just want to get rid of the dummy build info entirely...

Contributor Author

@teskje teskje Nov 12, 2025


Yeah, my assumption here is that it's fine for tests that don't care about the specific version to not get migrations. If you want to test the migrations, you also have to be explicit about versions. Filtering the migrations has the same effect (there are no migrations before version 0.0.0-dummy) but prevents us from having the sanity check that looks for migrations at versions larger than the current one. Though if we move the sanity checks into a unit test, I think we can do the filtering and remove the special case for DUMMY_BUILD_INFO.

And I'm also not sure why DUMMY_BUILD_INFO is needed in the first place. Seems easy to use the actual build info instead. Except perhaps for golden tests that compare to a static expected output.
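
To illustrate the filtering idea (again using the hypothetical MIGRATIONS sketch from the PR description, not the real code): selecting only the steps between the source version and the current build version naturally yields an empty list for the dummy version 0.0.0, with no special case needed:

/// Hypothetical: select the migration steps applicable to an upgrade from
/// `source_version` to `build_version`. For the dummy build version (0, 0, 0)
/// this is empty, so no special case is needed.
fn applicable_migrations(
    source_version: (u64, u64, u64),
    build_version: (u64, u64, u64),
) -> Vec<&'static MigrationStep> {
    MIGRATIONS
        .iter()
        // Run every step strictly newer than the version we're upgrading from
        // and no newer than the version we're upgrading to.
        .filter(|step| step.version > source_version && step.version <= build_version)
        .collect()
}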

@SangJunBak
Contributor

SangJunBak commented Nov 11, 2025

Noticed a panic locally as well as in this nightly since you rebased onto main on Nov 11 at 7:44am EST https://buildkite.com/materialize/nightly/builds/14059#019a751b-6ab2-43d5-8638-45b93f040912. I think it has to do with Ben's changes.

Can repro it easily locally via bin/mzcompose --find platform-checks run default --scenario=UpgradeEntireMzFromLatestSelfManaged --check=MySqlCdcNoWait. Stack trace:

2025-11-11T22:42:34.646015Z  thread 'main' panicked at src/adapter/src/catalog/open/builtin_schema_migration.rs:442:14:
valid usage: CodecMismatch(CodecMismatch { requested: ("()", "()", "u64", "i64", None), actual: ("TableKey", "ShardId", "u64", "i64", None) })
   6: core::panicking::panic_fmt
   7: core::result::unwrap_failed
   8: <mz_adapter::catalog::open::builtin_schema_migration::Migration>::run::{closure#0}
   9: <mz_adapter::catalog::Catalog>::initialize_state::{closure#0}

@def-
Contributor

def- commented Nov 11, 2025

Great to see the tests working :)

Contributor

@SangJunBak SangJunBak left a comment


Just some questions!

Comment on lines +842 to +855
let (mut persist_write, mut persist_read) =
self.open_migration_shard(diagnostics.clone()).await;
let mut persist_since = self.open_migration_shard_since(diagnostics.clone()).await;
Contributor


Noticing that we're not committing an empty update to ensure the migration shard is readable like before. Why can we avoid doing that here?

Contributor Author


We simply skip reading the migration shard if it's not readable. In these cases the upper will be [0], so write_ts.step_back() (what we do to get a read timestamp) returns None. And in the two places where we read the migration shard, we always handle the None value appropriately.

This seemed a bit simpler to me than doing the empty append. Note that even with the empty append we'd still get an Option that we'd have to unwrap, and then we'd have to reason about why it can really never be None.
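
For illustration, the shape of that handling might look roughly like this (hypothetical types and function names; the real code works with persist uppers and step_back):

/// Hypothetical: derive a read timestamp from the shard's upper. If the shard
/// has never been written to, the upper is still 0, so `checked_sub` (playing
/// the role of `step_back`) yields `None`.
fn read_timestamp(upper: u64) -> Option<u64> {
    upper.checked_sub(1)
}

/// Hypothetical: callers treat a shard that was never written to as empty,
/// instead of first doing an empty append just to make it readable.
fn existing_entries(upper: u64) -> Vec<(u64, String)> {
    match read_timestamp(upper) {
        // The shard has been written to: snapshot it as of `read_ts`.
        Some(read_ts) => snapshot_at(read_ts),
        // The shard was never written to: nothing recorded yet.
        None => Vec::new(),
    }
}

/// Stand-in for reading a persist snapshot at the given timestamp.
fn snapshot_at(_read_ts: u64) -> Vec<(u64, String)> {
    Vec::new()
}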

Comment on lines +673 to +688
// Generate new shard IDs and attempt to insert them into the migration shard. If we get a
// CaA failure at `write_ts` that means a concurrent process has inserted in the meantime
// and we need to re-check the migration shard contents.
let mut replaced_shards = BTreeMap::new();
let mut updates = Vec::new();
Contributor


If I'm understanding this correctly, let's assume we tried a migration for (v26.0.0 -> 27.0.0), but the upgrade fails at read-only after we already wrote to the migration shard. So now the migration shard has this entry:
write_ts, (mz_objects_id,mz_objects_shard_id), 1.
Now suppose we retry the migration in read-only mode and it succeeds. Then we'll have:

write_ts, (mz_objects_id,mz_objects_shard_id), 1
write_ts, (mz_objects_id,mz_objects_shard_id_2), 1

in the migration shard. Questions/clarifications:

  • How do we know which shard id to use if we have duplicate entries at the same timestamp?
  • Does shard creation occur as soon as we write compare_and_append?
  • Do we clean / finalize these shards on the next upgrade given:
 // Collect old entries to remove.
let pred = |key: &migration_shard::Key| key.build_version < self.source_version;

Contributor Author

@teskje teskje Nov 12, 2025


Now suppose we retry the migration in read-only mode and it succeeds. Then we'll have:

write_ts, (mz_objects_id,mz_objects_shard_id), 1
write_ts, (mz_objects_id,mz_objects_shard_id_2), 1

in the migration shard.

That's not right! We never commit to the migration shard twice for the same version. Notice how, before we generate the new shard IDs and append them, we read the contents of the migration shard and check whether there are any entries at our version. If there are, we use those and do not generate new ones.

So in your example, when we retry the migration in read-only mode, the contents of the shard would still be:

write_ts, (mz_objects_id,mz_objects_shard_id), 1

Now to answer your questions:

How do we know which shard id to use if we have duplicate entries at the same timestamp?

We should never have duplicate entries at the same version (and deploy generation)!

Does shard creation occur as soon as we write compare_and_append?

No, writing a ShardId to the migration shard does not create that shard in the persist state. That happens later, during adapter bootstrap, when the storage controller gets the ShardIds from the collection metadata in the catalog (which we update by calling insert_collection_metadata here) and then opens the shards.

Do we clean / finalize these shards on the next upgrade given:

On the next upgrade to a higher version, we'll delete the migration shard entries from lower versions, yes. Whether we also finalize a shard depends on whether it's still in use. To find that out we look at the collection metadata:

            // The migration shard contains both shards created during aborted upgrades and shards
            // created during successful upgrades. The latter may still be in use, so we have to
            // check and only finalize those that aren't anymore.
            let gid = GlobalId::System(key.global_id);
            if collection_metadata.get(&gid) != Some(&shard_id) {
                unfinalized_shards.insert(shard_id);
            }
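
To make the idempotency argument above concrete, a minimal sketch (hypothetical types and names, not the actual migration shard API) of the reuse-or-generate step:

use std::collections::BTreeMap;

#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct Key {
    build_version: String,
    global_id: u64,
}

fn shards_for_upgrade(
    existing: &BTreeMap<Key, String>,
    target_version: &str,
    to_migrate: &[u64],
) -> BTreeMap<u64, String> {
    let mut out = BTreeMap::new();
    for &gid in to_migrate {
        let key = Key {
            build_version: target_version.to_string(),
            global_id: gid,
        };
        let shard_id = match existing.get(&key) {
            // A previous (possibly aborted) attempt already recorded a shard
            // ID for this collection at this version: reuse it.
            Some(shard_id) => shard_id.clone(),
            // First attempt at this version: generate a fresh shard ID. The
            // subsequent compare_and_append either commits it, or it fails and
            // the retry re-reads the shard and lands in the branch above.
            None => format!("fresh-shard-for-{gid}"),
        };
        out.insert(gid, shard_id);
    }
    out
}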

Contributor

@SangJunBak SangJunBak Nov 12, 2025


Notice how before we generate the new shard IDs and append them we read the contents of the migration shard and check if there are any entries at our version. If there are, we use these and do not generate new ones.

Ah, you're right! Sorry, I misread that as only applying to the new leader, i.e. I thought it was self.build_version == self.target_version instead of key.build_version == self.target_version.

Thanks for the clarifications. This function makes sense to me!

@teskje teskje force-pushed the builtin-schema-migration branch 2 times, most recently from ff8677f to 49580a5 on November 12, 2025 09:24
This commit exposes the existing `register_schema` API from
`PersistClient`. This provides a way for users to register the initial
schema of a shard without performing a write.

Ideally, we'd remove the `register_schema` API and instead make
`compare_and_evolve_schema` also handle the case where no schema was
previously registered. Leaving that to future work.
@teskje teskje force-pushed the builtin-schema-migration branch from 49580a5 to bf952e8 on November 12, 2025 09:36
@teskje
Contributor Author

teskje commented Nov 12, 2025

Noticed a panic locally as well as in this nightly since you rebased onto main

Thanks! It had nothing to do with Ben's changes. I added an additional commit to upgrade the migration shard version, and I specified the wrong schema in that upgrade call. Fixed!

This commit introduces the new "builtin schema migration" mechanism. It
fulfills the same purpose as the existing "builtin item migration" but
avoids its known issues around concurrency and rollback-ability, and it
has a (hopefully) more precise name.

In contrast to the old mechanism, the new one explicitly defines a list
of migrations to perform at specific versions, making it less likely
that we'll accidentally migrate (and thereby truncate) builtin
collections, and providing a way to choose a migration mechanism.

This commit only adds the new implementation. Subsequent commits deal
with wiring it up, populating the `MIGRATIONS` list, and deleting the
old implementation.
This commit makes the catalog initialization code start using the
builtin schema migration mechanism.

Notable changes compared to the old mechanism:

 * Because migrations are specified explicitly, there is no need anymore
   for `add_new_remove_old_builtin_items_migration` to collect the list
   of changed builtins based on fingerprint mismatches. Instead, the
   schema migrations simply execute the specified migration steps and
   afterwards assert that the fingerprints of non-migrated items match.
 * Catalog state updates are now applied outside the migration code.
This commit adds the initial list of builtin schema migrations required
to upgrade from the last self-managed release version, v0.147.
This commit adds support for forcing builtin schema migrations during
tests, through a --force-builtin-schema-migration envd flag, which gets
wired through to the builtin schema migration code. If the flag is set,
all builtin collections are force-migrated using the selected mechanism.

The existing mechanism to force builtin item migrations worked by
modifying the fingerprints of builtins. This doesn't work anymore
because fingerprint mismatches are no longer used to inform which
migrations to run, only to verify all items have been correctly
migrated. Thus, this commit also removes support for adding whitespace
to builtin fingerprints.
This commit adds the deploy generation to the `Key` used in the
migration shard. This is necessary because some of our tests simulate
0dt upgrades by bumping the deploy generation without actually bumping
the version, and those tests get confused when the deploy generation
isn't in the key.

Note that this requires changing the serialization format of the
migration shard key. We choose JSON, to make the format easier to modify
in the future. We still need to be able to parse the older format so we
can clean up old entries.
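
A rough sketch of that key evolution (hypothetical field names, assuming serde/serde_json; the old key format is stood in for by a plain version/global-ID string purely for illustration):

use serde::{Deserialize, Serialize};

/// Hypothetical shape of the new migration shard key: the deploy generation is
/// part of the key, and the key is serialized as JSON so that future fields
/// are easy to add.
#[derive(Serialize, Deserialize)]
struct Key {
    build_version: String,
    deploy_generation: u64,
    global_id: u64,
}

/// Decode a key, falling back to a (hypothetical) older format so that entries
/// written by previous versions can still be recognized and cleaned up.
fn decode_key(raw: &str) -> Option<Key> {
    if let Ok(key) = serde_json::from_str::<Key>(raw) {
        return Some(key);
    }
    // Older entries lack a deploy generation; treat it as 0 when cleaning up.
    let (version, gid) = raw.split_once('/')?;
    Some(Key {
        build_version: version.to_string(),
        deploy_generation: 0,
        global_id: gid.parse().ok()?,
    })
}
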
@teskje teskje force-pushed the builtin-schema-migration branch from bf952e8 to 7ed20da on November 12, 2025 10:32
Contributor

@SangJunBak SangJunBak left a comment


LGTM!

@teskje
Contributor Author

teskje commented Nov 12, 2025

The failing nightlies are because of DockerHub issues. Optimistically merging since all the nightlies did pass previously.

@teskje teskje merged commit 8fb0559 into MaterializeInc:main Nov 12, 2025
324 of 332 checks passed
@teskje
Contributor Author

teskje commented Nov 12, 2025

TFTRs!

@teskje teskje deleted the builtin-schema-migration branch November 12, 2025 16:12