-
Notifications
You must be signed in to change notification settings - Fork 840
Update docs/design.md #2242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Update docs/design.md #2242
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Code Coverage DiffThis PR does not change the code coverage |
This comment was marked as resolved.
This comment was marked as resolved.
I was following the convention laid out in spec/spec.md where I use RPC anytime I mean "magic call that may happen on another computer, but written as if it were a local procedure call", and gRPC is more of an implementation detail (the specific implementation of RPC that CSI Spec relies upon). But happy to change this if other folks agree. Perhaps a repository glossary page is in order, like SOCI Snapshotter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Largely lgtm. Especially the sequence diagrams, this revision significantly elevates the doc.
/lgtm!
#### Not Implemented | ||
|
||
- ListVolumes | ||
- GetCapacity | ||
- ControllerGetVolume |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest removing this snippet since its duplicating information.
|
||
### Timeouts | ||
|
||
gRPC always passes a timeout together with a request. After this timeout, the gRPC client call actually returns. The server (=CSI driver) can continue processing the call and finish the operation, however it has no means how to inform the client about the result. | ||
gRPC always passes a timeout together with a request. After this timeout, the gRPC client call actually returns. The server (i.e. ebs-plugin) can continue processing the call and finish the operation, however it has no means how to inform the client about the result. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto above, drop this diff.
|
||
Kubernetes will retry failed calls, usually after some exponential backoff. Kubernetes heavily relies on idempotency here - i.e. when the driver finished an operation after the client timed out, the driver will get the same call again and it should return success/error based on success/failure of the previous operation. | ||
Kubernetes sidecars will retry failed calls after exponential backoff. These sidecars rely on idempotency here - i.e. when ebs-plugin finished an operation after the client timed out, the ebs-plugin will get the same call again, and it should return success/error based on success/failure of the previous operation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto above, drop the "driver" -> "ebs-plugin" part of this diff.
Very rarely a node gets too busy and kubelet starves for CPU. It does not unmount a volume when it should and Kubernetes initiates detach of the volume. | ||
## EBS CSI Driver on Kubernetes | ||
|
||
### High Level Summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section is dangerously close to documenting CSI and/or CSI sidecar design rather than EBS CSI design. For example, if the information here can be found in the CSI spec's architecture section (https://github.com/container-storage-interface/spec/blob/master/spec.md#architecture) we are better off linking there than making our own version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Respectfully, I think that this section is worth keeping, but I will gladly suggestions for edits.
We already reference/link the CSI Spec in this section. But I will retitle the section to make it clear I'm talking about EBS volume management on Kubernetes, not just EBS CSI Driver.
The CSI Spec's architecture section lists four different possible architectures for CSI drivers and it is not immediately clear which one applies to the EBS CSI installation. Figure one mentions:
a centralized Controller Plugin is available on the CO master host and the Node Plugin is available on all of the CO Nodes.
For EKS customers, the controller plugin is on SOME node, not the CO master host
.
Secondly, this section explicitly explains the difference between ebs-csi-controller
and ebs-csi-node
Pods, which prevents common mistakes many folks make when ramping up on EBS CSI Driver (e.g. I got the "ebs csi driver logs", but it didn't include any Create/AttachVolume calls, only mount operations).
Most importantly, multiple CSEs/customers/TAMs have explicitly said that this summary is more helpful than linking multiple different sources of Kubernetes/CSI/EBS documentation for understanding and troubleshooting EBS-backed volumes on their cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline, totally agree on putting more emphasis on ebs csi DRIVER instead of the kubernetes components. Will do some rework/cutting.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
/remove-lifecycle rotten |
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
What type of PR is this?
/kind documentation
What is this PR about? / Why do we need it?
This PR updates docs/design.md so that it is no longer out of date. It also provides a high-level overview of both components of the EBS CSI Driver and adds a few sequence diagrams to help visualize the most common driver workflows.
Note: Look at PR in 'rich diff' mode to see new sequence diagrams.
How was this change tested?
n/a
Does this PR introduce a user-facing change?