
Commit 3263961

adapter: gracefully handle newer versions in migration shard
Prior to this change, if the leader environment restarted either during an in-progress 0dt upgrade or after an aborted one, it would fail to come up if that 0dt upgrade caused builtin item migrations to occur. The new version would poison the migration shard with entries tagged with its newer build version, and the old version would halt itself upon reading them.

This commit applies the minimal fix: when performing builtin item migrations in leader mode, we now log a warning about newer versions in the migration shard instead of crashing upon observing them. Read-only processes still crash upon observing newer versions.

This change is meant to derisk subsequent releases before the rewrite of the builtin item migrations lands. Unlike that rewrite, this change is small enough to be backported to previous versions, so we can deploy it before the next release that requires migrations to occur.
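The before/after behavior described in the commit message can be modeled in a few lines. This is a minimal sketch, not Materialize's code: `BuildVersion`, `Outcome`, `scan_shard`, and the `fix_applied` flag are hypothetical names, the migration shard is simulated as a slice of build versions, and build versions are assumed to be totally ordered by (major, minor, patch). The versions `0.1.0` and `0.2.0` are placeholders.

```rust
use std::fmt;

/// Hypothetical stand-in for a build version. The derived `Ord` compares
/// fields in declaration order: major, then minor, then patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct BuildVersion {
    major: u64,
    minor: u64,
    patch: u64,
}

impl fmt::Display for BuildVersion {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}.{}.{}", self.major, self.minor, self.patch)
    }
}

/// Hypothetical result of scanning the migration shard on startup.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    /// The scan finished; carries the warnings that were "logged".
    Continued(Vec<String>),
    /// The process halted with this message.
    Halted(String),
}

/// Scans the build versions recorded in a simulated migration shard.
/// `fix_applied == false` models the old behavior (any process halts on a
/// newer version); `true` models the new one (only read-only processes halt).
fn scan_shard(
    shard: &[BuildVersion],
    current: BuildVersion,
    read_only: bool,
    fix_applied: bool,
) -> Outcome {
    let mut warnings = Vec::new();
    for &seen in shard {
        if seen > current {
            let msg = format!(
                "saw build version {seen}, which is greater than current build version {current}"
            );
            if read_only || !fix_applied {
                return Outcome::Halted(msg);
            }
            // Leader with the fix: record a warning and keep going.
            warnings.push(msg);
        }
    }
    Outcome::Continued(warnings)
}

fn main() {
    let old = BuildVersion { major: 0, minor: 1, patch: 0 };
    let new = BuildVersion { major: 0, minor: 2, patch: 0 };
    // The newer build has already written its entries to the shard.
    let shard = [old, new];

    // The old leader restarting, before and after the fix.
    println!("{:?}", scan_shard(&shard, old, false, false));
    println!("{:?}", scan_shard(&shard, old, false, true));
    // A read-only process running the old build still halts.
    println!("{:?}", scan_shard(&shard, old, true, true));
}
```

The asymmetry is deliberate: a crash in the read-only environment only delays the upgrade, while a crash in the leader takes the environment down.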
1 parent 7f31409 commit 3263961


1 file changed (+18, -6 lines)


src/adapter/src/catalog/open/builtin_item_migration.rs

Lines changed: 18 additions & 6 deletions
@@ -40,7 +40,7 @@ use mz_storage_client::controller::StorageTxn;
 use mz_storage_types::StorageDiff;
 use mz_storage_types::sources::SourceData;
 use timely::progress::{Antichain, Timestamp as TimelyTimestamp};
-use tracing::{debug, error, info};
+use tracing::{debug, error, info, warn};
 
 use crate::catalog::open::builtin_item_migration::persist_schema::{TableKey, TableKeySchema};
 use crate::catalog::state::LocalExpressionCache;
@@ -415,11 +415,23 @@ async fn migrate_builtin_collections_incompatible(
     let storage_collection_metadata = txn.get_collection_metadata();
     for (table_key, shard_id) in global_id_shards.clone() {
         if table_key.build_version > build_version {
-            halt!(
-                "saw build version {}, which is greater than current build version {}",
-                table_key.build_version,
-                build_version
-            );
+            if read_only {
+                halt!(
+                    "saw build version {}, which is greater than current build version {}",
+                    table_key.build_version,
+                    build_version
+                );
+            } else {
+                // If we are in leader mode, and a newer (read-only) version has started a
+                // migration, we must not allow ourselves to get fenced out! Continuing here might
+                // confuse any read-only process running the migrations concurrently, but it's
+                // better for the read-only env to crash than the leader.
+                // TODO(#9755): handle this in a more principled way
+                warn!(
+                    %table_key.build_version, %build_version,
+                    "saw build version which is greater than current build version",
+                );
+            }
         }
 
         if !migrated_storage_collections.contains(&table_key.global_id)
