-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Remove infinite snapshot logging #137821
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Remove infinite snapshot logging #137821
Conversation
Modifies the SnapshotShutdownProgressTracker to only log the periodic progress report when there are still snapshots left to process. This removes the existing behaviour where the report will log indefinitely, even once all snapshots are completed or paused. Closes: ES-12794
…ng-stops-when-no-snapshots
| // Simulate starting shutdown -- should reset the completion stats and start logging | ||
| // Register each snapshot as in progress | ||
| for (int i = 0; i < 5; ++i) { | ||
| tracker.incNumberOfShardSnapshotsInProgress(dummyShardId, dummySnapshot); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mimics production behaviour where snapshots are added to the tracker prior to the node shutting down. If I didn't, when calling tracker.onClusterStateAddShutdown(); the periodic logger would begin and then immediately end since there would be no in-progress snapshots
|
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
DiannaHohensee
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite a few nits, but superficial stuff.
Comments are great to have, but they should explain why something is happening, not only say what’s happening. ‘What’ can generally be understood with some effort from reading the code, but ‘why’ is the difficult part for new readers to grasp. I’d use ‘what’ comments for high level what’s happening in a section of code, when there's a series of steps. But having the ‘what’ without a clear ‘why’ is often frustrating for a reader. E.g., “I wrote this thing and this is what it does, but for the life of me I can’t remember why I wrote it to do that thing”, becomes a problem.
server/src/main/java/org/elasticsearch/snapshots/SnapshotShutdownProgressTracker.java
Outdated
Show resolved
Hide resolved
| pausedCount.get() | ||
| ); | ||
| // If there are no more snapshots in progress, then stop logging periodic progress reports | ||
| if (numberOfShardSnapshotsInProgressOnDataNode.get() == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we safely add asserts in this if-statement that the other stat values are zero / empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm what would you be looking for specifically here? I don't think there's anything to assert but please correct me if I'm wrong! Running through the values outputted in the log:
shutdownStartMillisshould be any non -1 value, (since this code only runs when the node is shutting down) but that's the same as in the below log case tooshutdownFinishedSignallingPausingMilliscould be any value, including -1, right, because this doesn't necessarily have to be set if the snapshots aren't paused but are instead completed?numberOfShardSnapshotsInProgressOnDataNodeis already 0shardSnapshotRequestscan be anything, right? So nothing to assert theredone,failed,abortedandpausedare dependent on the snapshot status so nothing to check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My expectation was that shardSnapshotRequests should be empty when the number of numberOfShardSnapshotsInProgressOnDataNode reaches zero. The snapshotting begins here with a listener runAfter hook to decrement the numSnapshots. The decrement runs after this listener, which is responsible for notifying the master of the outcome of the shard snapshot, but it doesn't appear to do that synchronously....
If there's an outstanding request to the master node to inform it about the snapshot's completion that has not returned, then this logger shouldn't be turned off, I think.
The if statement could become
if (numberOfShardSnapshotsInProgressOnDataNode.get() == 0 && shardSnapshotRequests.isEmpty()) {
....
}
Then we don't have to worry about it?
server/src/main/java/org/elasticsearch/snapshots/SnapshotShutdownProgressTracker.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/snapshots/SnapshotShutdownProgressTracker.java
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/snapshots/SnapshotShutdownProgressTrackerTests.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/snapshots/SnapshotShutdownProgressTrackerTests.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/snapshots/SnapshotShutdownProgressTrackerTests.java
Outdated
Show resolved
Hide resolved
| // Register each snapshot as in progress | ||
| for (int i = 0; i < 5; ++i) { | ||
| tracker.incNumberOfShardSnapshotsInProgress(dummyShardId, dummySnapshot); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you throw this into a helper method, since this is common across tests?
It'd be fancier to randomize (and/or un-hardcode) the number, but I believe that would require another parameter given to testPeriodicLogReport: up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I considered this when writing the tests. Happy to randomise if you think it would add any benefit, but I was copying the existing precedent set by the tests that had 5 in-progress snapshots, and I didn't want to bloat the tests with unnecessary code modifying all the expectations to match a random number of snapshots and then lose the clarity / meaning of the tests. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The undesirable part is the magic number that's presumed across setInFlightSnapshots and ensurePeriodicLogMessagesAsSnapshotsComplete. Passing a parameter of 5 would be better code practice than two methods magically knowing about what the other does.
I didn't want to bloat the tests with unnecessary code modifying all the expectations to match a random number of snapshots
IIUC, you've localized all the relevant code in ensurePeriodicLogMessagesAsSnapshotsComplete. I think little, if anything, would be changed outside that method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have added in e2151a3 👍
| tracker.incNumberOfShardSnapshotsInProgress(dummyShardId, dummySnapshot); | ||
| } | ||
|
|
||
| // 1. Simulate a node shutdown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you lost a few numbers while refactoring? There's a 1 and a 3.
| // 1. Simulate a node shutdown | |
| // Simulate a node shutdown so the tracker starts logging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add why
server/src/test/java/org/elasticsearch/snapshots/SnapshotShutdownProgressTrackerTests.java
Outdated
Show resolved
Hide resolved
…en-no-snapshots' of https://github.com/joshua-adams-1/elasticsearch into snapshot-logging-improvements-periodic-logging-stops-when-no-snapshots
|
Hey, thank you for reviewing. I've updated in line with the comments, let me know if there's anything else you want me to clarify / improve |
| } | ||
| logger.info("---> Generated shard snapshot status in stage (" + status.getStage() + ") for switch case (" + (i % 4) + ")"); | ||
| tracker.decNumberOfShardSnapshotsInProgress(dummyShardId, dummySnapshot, status); | ||
| // Advance time to the next periodic log message and expect it to include the new completion stats. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Advance time to the next periodic log message and expect it to include the new completion stats. | |
| // Advance time to the next scheduled periodic log message and expect it to include the new completion stats. |
| } | ||
|
|
||
| private void ensurePeriodicLogMessagesAsSnapshotsComplete(SnapshotShutdownProgressTracker tracker, MockLog mockLog) { | ||
| // We are going to simulate 5 in-progress snapshots while the node is shutting down, so we expect the periodic logger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // We are going to simulate 5 in-progress snapshots while the node is shutting down, so we expect the periodic logger | |
| // We are going to simulate completing 5 in-progress snapshots while the node is shutting down, so we expect the periodic logger |
| setLogPeriodicProgressReportSeenEventExpectation(mockLog, 2, 1, 1, 1, 0); | ||
| setLogPeriodicProgressReportSeenEventExpectation(mockLog, 1, 1, 1, 1, 1); | ||
|
|
||
| // When there are no more in-progress snapshots, the tracker emits a final log message, and terminates the periodic logger. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment suggests it's looking for another log message, when in fact the check is that there is not a log message.
| // When there are no more in-progress snapshots, the tracker emits a final log message, and terminates the periodic logger. | |
| // When the last snapshot completes, the tracker will emit a different log message, saying that it is finished, before terminating the periodic logger. We should not see the regular message with the final stats update. |
| ) | ||
| ); | ||
|
|
||
| // Adds in-progress snapshots so that the tracker doesn't immediately exit on seeing that all snapshots are finished, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Adds in-progress snapshots so that the tracker doesn't immediately exit on seeing that all snapshots are finished, | |
| // Adds an in-progress snapshot so that the tracker doesn't immediately exit on seeing that all snapshots are finished. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto below using this comment.
| ); | ||
|
|
||
| // Adds in-progress snapshots so that the tracker doesn't immediately exit on seeing that all snapshots are finished, | ||
| // and should begin logging the periodic progress report |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is rather confusing, since incrementing the tracker doesn't make logging start -- and the subsequent lines test that the tracker doesn't log anything because it is disabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto below using this comment.
| } | ||
|
|
||
| // Clear start and pause timestamps | ||
| // 3. Clear start and pause timestamps, and expect the periodic logger to reset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does reset mean? The logger stops.
| * Ensure that we only log a periodic progress report while there are in progress snapshots, | ||
| * and that we do not continue to log after all snapshots have completed | ||
| */ | ||
| private boolean shouldLogPeriodicProgressReport = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be volatile. I also don't think this is safe to change before canceling the scheduled logProgressReport(): you could set it to false before cancelling, and race with logProgressReport()'s assert that it is true.
This boolean need only represent completion of exit log message, with an assert in that if-statement, to prevent a double exit message. Other concurrency is already handled by the other methods.
…ng-stops-when-no-snapshots
Modifies the
SnapshotShutdownProgressTrackerto only log the periodic progress report while there are still snapshots left to process. This removes the existing behaviour where the report will log indefinitely, even once all snapshots are completed or paused.Closes: ES-12794