-
Notifications
You must be signed in to change notification settings - Fork 4k
kvserver: fix race in split application #155143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
|
||
// ClearRaftTruncatedState clears the RaftTruncatedState. | ||
func (sl StateLoader) ClearRaftTruncatedState(writer storage.Writer) error { | ||
return writer.ClearUnversioned(sl.RaftTruncatedStateKey(), storage.ClearOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to double-check that this ClearUnversioned
is the right way to remove this key. I'm never sure with the various MVCC/unversioned/engine keys etc.
// | ||
// TODO(#152847): remove this line when there are no historical proposals with | ||
// RaftTruncatedState, e.g. after a below-raft migration. | ||
if err := rsl.ClearRaftTruncatedState(readWriter); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also read the key here, and clear it only if it exists. If it does exist, it necessarily comes from the batch; it can't exist in storage because the RHS replica does not exist or is uninitialized.
The purpose of this dance is to make the batch state-machine-only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems worth doing IMO, but your call.
Would be good to test this, though it's a bit non-trivial because Need to think how to make this change+test backportable, since it's a bug fix. We probably already have a @arulajmani does your split testing work cover |
Holding off with |
This commit fixes the replica storage race that may occur when applying a split in the rare case there is a concurrent creation of a higher-ReplicaID RHS. The race stems from the fact that a higher-ReplicaID uninitialized replica is not locked for the entirety of splitPreApply, so it can be created and make some limited progress concurrently. We remove the clearing/rewriting of the unreplicated state which belongs to that RHS, to let it progress untouched. This also removes a blocker towards raft/state-machine storage separation: we don't want to be touching raft state of the RHS in the same batch with the state machine updates. Epic: none Release note (bug fix): fixed a race in range splits that can result in a regressed raft state of a post-split range. The race conditions are very rare / nearly impossible, and we haven't seen it in the wild.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from a high level. I'm too over the place right now to do a detailed review, will leave that to @arulajmani. Nice to see this big wart getting removed!
// | ||
// TODO(#152847): remove this line when there are no historical proposals with | ||
// RaftTruncatedState, e.g. after a below-raft migration. | ||
if err := rsl.ClearRaftTruncatedState(readWriter); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems worth doing IMO, but your call.
// machine keys, and never contains unreplicated / raft keys. One exception: | ||
// there can still be historical split proposals that write the initial | ||
// RaftTruncatedState of the RHS. Remove this key (if exists), and set it | ||
// below only if necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"set it below only if necessary" - this is referring to SynthesizeRaftState
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We set it only in the common case that the ReplicaID
is right and is locked. Then we do the RHS initialization, which entails initializing the RaftTruncatedState
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correction: the reference though is to the SetRaftTruncatedState
below, right after SynthesizeRaftState
.
This PR fixes the replica storage race that may occur when applying a split in the rare case there is a concurrent creation of a higher-
ReplicaID
RHS. The race is described in #152199, and stems from the fact that a higher-ReplicaID
uninitialized replica is not locked for the entirety ofsplitPreApply
, so it can be created and make some limited progress concurrently.We remove the clearing/rewriting of the unreplicated state which belongs to that RHS, to let it progress untouched. This also removes a blocker towards raft/state-machine storage separation: we don't want to be touching raft state of the RHS in the same batch with the state machine updates.
Fixes #152199
Release note (bug fix): fixed a race in range splits that can result in a regressed raft state of a post-split range. The race conditions are very rare / nearly impossible, and we haven't seen it in the wild.