catalog: builtin schema migration #34011
Conversation
// TODO: We should check for deploy generation as well as build version. Storing the deploy
// generation as well requires changing the key schema of the migration shard.
Turns out we need this to make some of the CI pass (0dt tests that don't increase the version, only the deploy generation), so I ended up adding this in a later commit.
bkirwi
left a comment
Persist changes look fine to me!
///
/// Migration steps for old versions must be retained according to the upgrade policy. For
/// example, if we support upgrading one major version at a time, the release of version `N.0.0`
/// can delete all migration steps with versions before `(N-1).0.0`.
✨!
A fun and optional thing to do might be to add a unit test that checks the current build metadata and fails if any of the migrations are "too old", so we have a cleanup reminder...
Good idea! In fact, all of the `validate_migration_steps` checks can be moved into a unit test, to remove some clutter from the code.
Noting this as a follow-up.
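A minimal sketch of what such a test could look like, assuming a `MIGRATIONS` list whose entries carry a semver version; the names and the retention policy here are hypothetical stand-ins, not the real catalog code:

```rust
use semver::Version;

// Hypothetical stand-in for the real migration step type.
struct MigrationStep {
    version: Version,
}

fn migrations() -> Vec<MigrationStep> {
    // In the real code this would be the `MIGRATIONS` list.
    vec![MigrationStep { version: Version::new(0, 147, 2) }]
}

/// Assumed policy: steps older than the oldest supported upgrade source
/// should already have been deleted.
fn oldest_supported_version() -> Version {
    Version::new(0, 147, 0)
}

#[test]
fn no_stale_migrations() {
    let oldest = oldest_supported_version();
    for step in migrations() {
        assert!(
            step.version >= oldest,
            "migration step at {} predates the upgrade window and can be deleted",
            step.version
        );
    }
}
```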
// migrations if we observe this build info.
if *build_info == DUMMY_BUILD_INFO {
    return Ok(MigrationResult::default());
}
I do feel like the dummy build info has become more painful as we hardcode more versions into the codebase... there was some similarly awkward working-around in Persist.
In this case it might be possible to make this code work with a stale version here, by e.g. filtering the list of migrations passed to `migration.run`. But it's possible that in the long run we just want to get rid of the dummy build info entirely...
Yeah, my assumption here is that it's fine for tests that don't care about the specific version to not get migrations. If you want to test the migrations, you also have to be explicit about versions. Filtering the migrations has the same effect (there are no migrations before version 0.0.0-dummy) but prevents us from having the sanity check that looks for migrations at versions larger than the current one. Though if we move the sanity checks into a unit test, I think we can do the filtering and remove the special case for DUMMY_BUILD_INFO.
And I'm also not sure why DUMMY_BUILD_INFO is needed in the first place. Seems easy to use the actual build info instead. Except perhaps for golden tests that compare to a static expected output.
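For illustration, a sketch of the filtering variant discussed above (the step shape and names are assumptions): because `DUMMY_BUILD_INFO`'s version is `0.0.0-dummy`, the filter selects no steps, which matches the early-return behavior.

```rust
use semver::Version;

// Hypothetical stand-in for the real migration step type.
struct MigrationStep {
    version: Version,
}

/// Steps to run when upgrading from `source` to `target`. With the dummy
/// build info's `0.0.0-dummy` target this is empty, so no special case is
/// needed; the trade-off is losing the "no steps newer than the current
/// version" sanity check, which could move into a unit test instead.
fn applicable_steps<'a>(
    steps: &'a [MigrationStep],
    source: &Version,
    target: &Version,
) -> Vec<&'a MigrationStep> {
    steps
        .iter()
        .filter(|s| s.version > *source && s.version <= *target)
        .collect()
}
```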
Noticed a panic locally as well as in this nightly since you rebased onto main at Nov 11, 7:44am EST: https://buildkite.com/materialize/nightly/builds/14059#019a751b-6ab2-43d5-8638-45b93f040912. I think it has to do with Ben's changes. Can repro it easily locally via
Great to see the tests working :)
SangJunBak
left a comment
Just some questions!
let (mut persist_write, mut persist_read) =
    self.open_migration_shard(diagnostics.clone()).await;
let mut persist_since = self.open_migration_shard_since(diagnostics.clone()).await;
Noticing that we're not committing an empty update to ensure the migration shard is readable like before. Why can we avoid doing that here?
We simply skip reading the migration shard if it's not readable. In these cases the upper will be `[0]`, so `write_ts.step_back()` (what we do to get a read timestamp) returns `None`. And in the two places where we read the migration shard, we always handle the `None` value appropriately.
This seemed a bit simpler to me than doing the empty append. Note that with the empty append we'd still need to handle the `None` value by unwrapping it, so then we'd have to reason about why this can really never be `None`.
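To make the timestamp reasoning concrete, a small sketch with `u64` timestamps; `read_ts` here just mirrors the predecessor behavior described above and is not the real API:

```rust
/// The read timestamp derived from a write timestamp: its predecessor, or
/// `None` if the shard has never been written to (the upper is still `[0]`).
fn read_ts(write_ts: u64) -> Option<u64> {
    write_ts.checked_sub(1)
}

#[test]
fn unreadable_shard_yields_no_read_ts() {
    // A fresh shard: the upper is [0], so write_ts is 0 and nothing is readable yet.
    assert_eq!(read_ts(0), None);
    // Once something was appended, we read at write_ts - 1.
    assert_eq!(read_ts(5), Some(4));
}
```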
// Generate new shard IDs and attempt to insert them into the migration shard. If we get a
// CaA failure at `write_ts` that means a concurrent process has inserted in the meantime
// and we need to re-check the migration shard contents.
let mut replaced_shards = BTreeMap::new();
let mut updates = Vec::new();
If I'm understanding this correctly, let's assume we tried a migration (v26.0.0 -> v27.0.0), but the upgrade fails in read-only mode after we already wrote to the migration shard. So now the migration shard has this entry:

    write_ts, (mz_objects_id, mz_objects_shard_id), 1

Now when we retry the migration in read-only mode and succeed, we'll have:

    write_ts, (mz_objects_id, mz_objects_shard_id), 1
    write_ts, (mz_objects_id, mz_objects_shard_id_2), 1

in the migration shard. Questions/clarifications:
- How do we know which shard ID to use if we have duplicate entries at the same timestamp?
- Does shard creation occur as soon as we write `compare_and_append`?
- Do we clean / finalize these shards on the next upgrade, given:

    // Collect old entries to remove.
    let pred = |key: &migration_shard::Key| key.build_version < self.source_version;
> Now when we retry the migration in readonly mode and we succeed. Now we'll have:
> write_ts, (mz_objects_id, mz_objects_shard_id), 1
> write_ts, (mz_objects_id, mz_objects_shard_id_2), 1
> in the migration shard.
That's not right! We never commit to the migration shard twice for the same version. Notice how before we generate the new shard IDs and append them we read the contents of the migration shard and check if there are any entries at our version. If there are, we use these and do not generate new ones.
So in your example, when we retry the migration in read-only mode, the contents of the shard would still be:
    write_ts, (mz_objects_id, mz_objects_shard_id), 1
Now to answer your questions:
> How do we know which shard id to use if we have duplicate entries at the same timestamp?
We should never have duplicate entries at the same version (and deploy generation)!
> Does shard creation occur as soon as we write `compare_and_append`?
No, writing a `ShardId` to the migration shard does not create that shard in the persist state. This happens later during adapter bootstrap, where the storage controller gets the `ShardId`s from the collection metadata in the catalog (what we update by calling `insert_collection_metadata` here) and then opens the shards.
> Do we clean / finalize these shards on the next upgrade given:
On the next upgrade to a higher version, we'll delete the migration shard entries from lower versions, yes. Whether we also finalize the shard depends on whether the shard is in use or not. To find that out we look at the collection metadata:
// The migration shard contains both shards created during aborted upgrades and shards
// created during successful upgrades. The latter may still be in use, so we have to
// check and only finalize those that aren't anymore.
let gid = GlobalId::System(key.global_id);
if collection_metadata.get(&gid) != Some(&shard_id) {
    unfinalized_shards.insert(shard_id);
}
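A self-contained sketch of the reuse-or-mint logic described in this thread; all types and names are stand-ins for the real ones:

```rust
use std::collections::BTreeMap;

// Stand-ins for the real identifier types.
type GlobalId = u64;
type ShardId = u128;

/// Pick the shard IDs for a migration: reuse any IDs already recorded in the
/// migration shard for the target version (e.g. from an earlier aborted
/// attempt), and mint new ones only for collections without an entry. This is
/// why a retry never produces duplicate entries at the same version.
fn shard_ids_for_migration(
    existing_at_target_version: &BTreeMap<GlobalId, ShardId>,
    to_migrate: &[GlobalId],
    mut mint: impl FnMut() -> ShardId,
) -> BTreeMap<GlobalId, ShardId> {
    to_migrate
        .iter()
        .map(|gid| {
            let shard = existing_at_target_version
                .get(gid)
                .copied()
                .unwrap_or_else(&mut mint);
            (*gid, shard)
        })
        .collect()
}
```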
> Notice how before we generate the new shard IDs and append them we read the contents of the migration shard and check if there are any entries at our version. If there are, we use these and do not generate new ones.

Ah, you're right! Sorry, I misread that as applying only to the new leader, i.e. I thought it was `self.build_version == self.target_version` instead of `key.build_version == self.target_version`.
Thanks for the clarifications! This function makes sense to me.
This commit exposes the existing `register_schema` API from `PersistClient`. This provides a way for users to register the initial schema of a shard without performing a write. Ideally, we'd remove the `register_schema` API and instead make `compare_and_evolve_schema` also handle the case where no schema was previously registered. Leaving that to future work.
Thanks! It had nothing to do with Ben's changes. I added an additional commit to upgrade the migration shard version, and I specified the wrong schema in that upgrade call. Fixed!
This commit introduces the new "builtin schema migration" mechanism. It fulfills the same purpose as the existing "builtin item migration" but avoids its known issues around concurrency and rollback-ability, as well as having a (hopefully) more precise name. In contrast to the old mechanism, the new one explicitly defines a list of migrations to perform at specific versions, making it less likely that we'll accidentally migrate (and thereby truncate) builtin collections, and providing a way to choose a migration mechanism. This commit only adds the new implementation. Subsequent commits deal with wiring it up, populating the `MIGRATIONS` list, and deleting the old implementation.
This commit makes the catalog initialization code start using the builtin schema migration mechanism. Notable changes compared to the old mechanism:
* Because migrations are specified explicitly, there is no need anymore for `add_new_remove_old_builtin_items_migration` to collect the list of changed builtins based on fingerprint mismatches. Instead, the schema migrations simply execute the specified migration steps and afterwards assert that the fingerprints of non-migrated items match.
* Catalog state updates are now applied outside the migration code.
This commit adds the initial list of builtin schema migrations required to upgrade from the last self-managed release version, v0.147.
This commit adds support for forcing builtin schema migrations during tests, through a `--force-builtin-schema-migration` envd flag, which gets wired through to the builtin schema migration code. If the flag is set, all builtin collections are force-migrated using the selected mechanism. The existing mechanism to force builtin item migrations worked by modifying the fingerprints of builtins. This doesn't work anymore because fingerprint mismatches are no longer used to inform which migrations to run, only to verify all items have been correctly migrated. Thus, this commit also removes support for adding whitespace to builtin fingerprints.
This commit adds the deploy generation to the `Key` used in the migration shard. This is necessary because some of our tests simulate 0dt upgrades by bumping the deploy generation without actually bumping the version, and those tests get confused when the deploy generation isn't in the key. Note that this requires changing the serialization format of the migration shard key. We choose JSON, to make the format easier to modify in the future. We still need to be able to parse the older format so we can clean up old entries.
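A hedged sketch of the described key change; the field names are assumptions, and the legacy parser is just a stub standing in for the old, pre-JSON format:

```rust
use serde::{Deserialize, Serialize};

/// Assumed shape of the new migration shard key: the deploy generation now
/// sits next to the build version, and the whole key is serialized as JSON.
#[derive(Debug, Serialize, Deserialize)]
struct Key {
    build_version: String, // semver, e.g. "0.148.0"
    deploy_generation: u64,
    global_id: u64,
}

/// Parse a raw key, falling back to the pre-JSON encoding so that entries
/// written by older versions can still be recognized and cleaned up.
fn parse_key(raw: &str) -> Option<Key> {
    serde_json::from_str(raw).ok().or_else(|| parse_legacy_key(raw))
}

/// Stand-in for the old format's parser; the actual encoding is not spelled
/// out in this thread.
fn parse_legacy_key(_raw: &str) -> Option<Key> {
    None
}
```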
SangJunBak
left a comment
LGTM!
The failing nightlies are because of DockerHub issues. Optimistically merging since all the nightlies did pass previously.
TFTRs!
This PR introduces the new "builtin schema migration" mechanism. It fulfills the same purpose as the existing "builtin item migration" but avoids its known issues around concurrency and rollback-ability, as well as having a (hopefully) more precise name. See the design doc for details.
In contrast to the old mechanism, the new one explicitly defines a list of migrations to perform at specific versions, making it less likely that we'll accidentally migrate (and thereby truncate) builtin collections, and providing a way to choose a migration mechanism. The list is initially populated to support migrations from the last self-managed version, v0.147.
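For a sense of the shape of that list, an illustrative sketch; the names, mechanism variants, and versions are all assumptions, and the design doc holds the real definition:

```rust
use semver::Version;

/// Hypothetical migration mechanisms a step could choose between.
enum Mechanism {
    /// Swap in a freshly created shard (truncates the collection's history).
    ReplaceShard,
    /// Evolve the schema in place, preserving existing contents.
    EvolveSchema,
}

/// Hypothetical shape of one entry in the explicit migration list.
struct Migration {
    /// The build version at which this step runs.
    version: Version,
    /// The builtin collection the step applies to.
    collection: &'static str,
    mechanism: Mechanism,
}

fn migrations() -> Vec<Migration> {
    vec![Migration {
        version: Version::new(0, 148, 0),
        collection: "mz_objects",
        mechanism: Mechanism::ReplaceShard,
    }]
}
```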
Motivation
Fixes https://github.com/MaterializeInc/database-issues/issues/9803
Fixes https://github.com/MaterializeInc/database-issues/issues/9755
Closes https://github.com/MaterializeInc/database-issues/issues/9756
Tips for reviewer
Individual commits are meaningful; I recommend reviewing them separately.
I'm working on turmoil-based testing as well. Decided to keep that out of this PR though, since it's already large enough. Our existing CI around upgrade tests seems to be pretty good about finding issues that can occur with currently supported upgrade paths anyway. The turmoil tests will be more about exercising "exotic" scenarios, like multiple concurrent processes at the same version.
Checklist
If this PR evolves an existing `$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.