
Commit 1cfbf05

pkramer509 authored and sajmera-pensando committed
New Openshift specific documentation for airgapped environments. (#795)
(cherry picked from commit a7c8c04a2dd2e8f082479055102723deff60011d)
1 parent 0d30bf6 commit 1cfbf05

4 files changed: +112 additions, -5 deletions

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
# Air-gapped Installation Guide for OpenShift Environments

This guide explains how to install the AMD GPU Operator in an air-gapped environment where the OpenShift cluster has no external network connectivity.

This procedure assumes that a system has internet access during image creation and mirroring. The examples use the OpenShift internal image registry for convenience; the procedure should be similar for external registries such as Quay or Docker Hub, although the overall process may differ.

Currently, GPU Operator installation in an air-gapped environment is supported only with a precompiled driver. To build (precompile) the driver, at least one system (for example, in a staging environment) must have internet access during image creation and mirroring.

## Prerequisites

- OpenShift 4.16+
- The OpenShift internal image registry is configured; see https://instinct.docs.amd.com/projects/gpu-operator/en/latest/installation/openshift-olm.html#configure-internal-registry for details.
- Internet access during operator installation, driver compilation, and image import.
- NFD, KMM, and the AMD GPU Operator installed via OperatorHub
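The prerequisites above can be checked from the connected system before starting. The following is a minimal bash sketch, assuming `oc` is already logged in to the cluster with sufficient rights. The helper names are hypothetical, and the CRD names it looks for (`nodefeaturediscoveries.nfd.openshift.io`, `modules.kmm.sigs.x-k8s.io`, `deviceconfigs.amd.com`) are assumptions about what the NFD, KMM, and GPU operators install, so adjust them if your versions differ.

```bash
#!/usr/bin/env bash
# Preflight sketch for the prerequisites above. Helper names are hypothetical;
# assumes `oc` is already logged in to the target cluster.

# version_at_least HAVE WANT: succeeds if HAVE >= WANT, comparing major.minor only
version_at_least() {
  local have_major have_minor want_major want_minor
  IFS=. read -r have_major have_minor _ <<< "$1"
  IFS=. read -r want_major want_minor _ <<< "$2"
  (( have_major > want_major )) && return 0
  (( have_major == want_major && have_minor >= want_minor ))
}

# check_openshift_version: reads the cluster version from the ClusterVersion resource
check_openshift_version() {
  local v
  v="$(oc get clusterversion version -o jsonpath='{.status.desired.version}')" || return 1
  if version_at_least "$v" 4.16; then
    echo "OK: OpenShift $v"
  else
    echo "FAIL: OpenShift '$v' is older than 4.16" >&2
    return 1
  fi
}

# check_internal_registry: the integrated image registry must be Managed
check_internal_registry() {
  local state
  state="$(oc get configs.imageregistry.operator.openshift.io cluster \
    -o jsonpath='{.spec.managementState}')" || return 1
  if [[ "$state" == "Managed" ]]; then
    echo "OK: internal registry is Managed"
  else
    echo "FAIL: internal registry managementState is '$state'" >&2
    return 1
  fi
}

# check_operator_crds: the CRD names are assumptions, not taken from this guide
check_operator_crds() {
  local crd rc=0
  for crd in nodefeaturediscoveries.nfd.openshift.io \
             modules.kmm.sigs.x-k8s.io \
             deviceconfigs.amd.com; do
    if oc get crd "$crd" >/dev/null 2>&1; then
      echo "OK: CRD $crd present"
    else
      echo "FAIL: CRD $crd not found" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage: check_openshift_version && check_internal_registry && check_operator_crds
```

Each check only reads cluster state, so the sketch is safe to run repeatedly.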

### Required Images

The following images must be mirrored to your internal registry; see step 2.A below for details.

```
rocm/k8s-device-plugin:rhubi-latest
rocm/k8s-node-labeller:rhubi-latest
```
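Each mirrored image ends up at a predictable internal-registry path, and that path is what the DeviceConfig in step 3 references. A small bash sketch of the mapping, assuming the images are imported as ImageStreams into `kube-amd-gpu` and that the ImageStream name is the last path component of the source image (as the `oc get is` output in step 2.B shows). `internal_ref` is a hypothetical helper, not part of the operator.

```bash
#!/usr/bin/env bash
# Sketch: derive the internal-registry reference for a mirrored image.
# Assumes the ImageStream name is the last path component of the source image.

INTERNAL_REGISTRY="image-registry.openshift-image-registry.svc:5000"

# internal_ref IMAGE [NAMESPACE]
#   rocm/k8s-device-plugin:rhubi-latest ->
#   image-registry.openshift-image-registry.svc:5000/kube-amd-gpu/k8s-device-plugin:rhubi-latest
internal_ref() {
  local image="$1" ns="${2:-kube-amd-gpu}"
  local name_tag="${image##*/}"   # strip everything up to the last '/'
  printf '%s/%s/%s\n' "$INTERNAL_REGISTRY" "$ns" "$name_tag"
}

# Usage:
#   for img in rocm/k8s-device-plugin:rhubi-latest rocm/k8s-node-labeller:rhubi-latest; do
#     internal_ref "$img"
#   done
```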

## Installation Steps

### 1. Build the precompiled driver image

Because the driver image is built in the cluster, this procedure differs from the one used for the other GPU Operator component images, such as the node labeller and device plugin.

A. Start from a basic DeviceConfig custom resource (CR). Creating it triggers a driver build and places the precompiled driver in the default ImageStream location (`image-registry.openshift-image-registry.svc:5000/kube-amd-gpu/amdgpu_kmod`).

```yaml
apiVersion: amd.com/v1alpha1
kind: DeviceConfig
metadata:
  name: devconf
  namespace: kube-amd-gpu
spec:
  driver:
    enable: true
    version: "6.4.1"

  devicePlugin:
    devicePluginImage: rocm/k8s-device-plugin:rhubi-latest
    nodeLabellerImage: rocm/k8s-device-plugin:labeller-rhubi-latest

  selector:
    feature.node.kubernetes.io/amd-gpu: "true"
```

B. Create the CR to trigger the build process.
```bash
$ oc create -f myDeviceConfig.yaml -n kube-amd-gpu
deviceconfig.amd.com/devconf created
```

C. Monitor the build until it completes.
```bash
$ oc get pods -n kube-amd-gpu | grep build
devconf-build-trzb6-build 1/1 Running 0 12s

# follow the build using the oc logs command
$ oc logs devconf-build-trzb6-build -n kube-amd-gpu
```
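The build pod name carries a random suffix (`trzb6` above), so copying it by hand on every run is error-prone. A bash sketch that finds the pod and follows its log, assuming the build pod is the only pod in the namespace whose name contains `build` (the same assumption the `grep build` above makes). The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: locate the driver build pod and stream its log until the pod exits.
# Assumes the build pod is the only pod whose name contains "build".

# find_build_pod [NAMESPACE]: prints e.g. pod/devconf-build-trzb6-build
find_build_pod() {
  local ns="${1:-kube-amd-gpu}"
  oc get pods -n "$ns" -o name | grep build | head -n 1
}

# follow_build_logs [NAMESPACE]: `oc logs -f` accepts the pod/NAME form
follow_build_logs() {
  local ns="${1:-kube-amd-gpu}" pod
  pod="$(find_build_pod "$ns")"
  if [[ -z "$pod" ]]; then
    echo "no build pod found in namespace $ns" >&2
    return 1
  fi
  oc logs -f "$pod" -n "$ns"
}
```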

D. Once the build is complete, verify that the precompiled image is located in the internal registry.
```bash
$ oc get is -n kube-amd-gpu
NAME IMAGE REPOSITORY TAGS UPDATED
amdgpu_kmod image-registry.openshift-image-registry.svc:5000/kube-amd-gpu/amdgpu_kmod coreos-9.6-5.14.0-570.19.1.el9_6.x86_64-6.4.1 3 days ago
```
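Because an air-gapped cluster cannot compile a missing driver, it can be worth confirming that a precompiled tag exists for the kernel of every GPU node before deploying in the disconnected environment. The sketch below assumes the tag format visible above, `<os>-<kernel>-<driver version>`, which is inferred from this single example rather than documented, and selects GPU nodes with the `feature.node.kubernetes.io/amd-gpu=true` label used in the DeviceConfig selector. `check_kmod_tags` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: check that amdgpu_kmod has a tag for every GPU node's kernel.
# ASSUMPTION: tags end in -<kernel>-<driver version>, as in
# coreos-9.6-5.14.0-570.19.1.el9_6.x86_64-6.4.1.

# check_kmod_tags [NAMESPACE] [DRIVER_VERSION]
check_kmod_tags() {
  local ns="${1:-kube-amd-gpu}" version="${2:-6.4.1}"
  local tags kernels kernel tag found rc=0
  tags="$(oc get is amdgpu_kmod -n "$ns" -o jsonpath='{.status.tags[*].tag}')" || return 1
  kernels="$(oc get nodes -l feature.node.kubernetes.io/amd-gpu=true \
    -o jsonpath='{.items[*].status.nodeInfo.kernelVersion}')" || return 1
  if [[ -z "$kernels" ]]; then
    echo "no nodes labelled feature.node.kubernetes.io/amd-gpu=true" >&2
    return 1
  fi
  # check each distinct kernel once
  for kernel in $(printf '%s\n' $kernels | sort -u); do
    found=0
    for tag in $tags; do
      [[ "$tag" == *"-${kernel}-${version}" ]] && found=1
    done
    if (( found )); then
      echo "OK: kernel $kernel has a $version driver tag"
    else
      echo "MISSING: no amdgpu_kmod tag for kernel $kernel and driver $version" >&2
      rc=1
    fi
  done
  return "$rc"
}
```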

### 2. Import the required images

A. Import the node labeller and device plugin images from Docker Hub into your internal registry.
```bash
oc import-image rocm/k8s-device-plugin:rhubi-latest -n kube-amd-gpu --confirm
oc import-image rocm/k8s-node-labeller:rhubi-latest -n kube-amd-gpu --confirm
```
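If more images are added to the list later, a loop keeps the import step repeatable. A bash sketch, assuming the same `oc import-image ... --confirm` form shown above; `REQUIRED_IMAGES` and `import_required_images` are hypothetical names, and the list has to be kept in sync with the Required Images section.

```bash
#!/usr/bin/env bash
# Sketch: import every required image, stopping at the first failure.

REQUIRED_IMAGES=(
  rocm/k8s-device-plugin:rhubi-latest
  rocm/k8s-node-labeller:rhubi-latest
)

# import_required_images [NAMESPACE]
import_required_images() {
  local ns="${1:-kube-amd-gpu}" img
  for img in "${REQUIRED_IMAGES[@]}"; do
    echo "importing $img into $ns"
    oc import-image "$img" -n "$ns" --confirm || return 1
  done
}
```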

B. Once imported, verify that the required images are located in the internal registry.
```bash
$ oc get is -n kube-amd-gpu
NAME IMAGE REPOSITORY TAGS UPDATED
amdgpu_kmod image-registry.openshift-image-registry.svc:5000/kube-amd-gpu/amdgpu_kmod coreos-9.6-5.14.0-570.19.1.el9_6.x86_64-6.4.1 3 days ago
k8s-device-plugin image-registry.openshift-image-registry.svc:5000/kube-amd-gpu/k8s-device-plugin rhubi-latest 2 hours ago
k8s-node-labeller image-registry.openshift-image-registry.svc:5000/kube-amd-gpu/k8s-node-labeller rhubi-latest 2 hours ago
```

### 3. Deploy the DeviceConfig in the disconnected environment

A. Once all the required images and the precompiled driver are present in the internal registry, deploy the modified DeviceConfig. Note that the image fields now point to the internal registry instead of the external `rocm` repositories.
```yaml
apiVersion: amd.com/v1alpha1
kind: DeviceConfig
metadata:
  name: devconf
  namespace: kube-amd-gpu
spec:
  driver:
    image: image-registry.openshift-image-registry.svc:5000/kube-amd-gpu/amdgpu_kmod
    enable: true
    version: "6.4.1"

  devicePlugin:
    devicePluginImage: image-registry.openshift-image-registry.svc:5000/kube-amd-gpu/k8s-device-plugin:rhubi-latest
    nodeLabellerImage: image-registry.openshift-image-registry.svc:5000/kube-amd-gpu/k8s-node-labeller:rhubi-latest

  selector:
    feature.node.kubernetes.io/amd-gpu: "true"
```
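A single stray external image reference leaves pods unable to pull in a disconnected cluster, so it can help to scan the DeviceConfig file before creating it. A bash sketch, assuming the image fields are the three keys used in the DeviceConfig above (`image`, `devicePluginImage`, `nodeLabellerImage`) and that each value sits on the same line as its key; other image fields in your CR would need to be added to the pattern, and a file with none of these keys passes trivially. `check_airgapped_images` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: fail if any image value in a DeviceConfig file is outside the internal registry.
# ASSUMPTION: only the keys image, devicePluginImage and nodeLabellerImage hold images.

AIRGAP_REGISTRY_PREFIX="image-registry.openshift-image-registry.svc:5000/"

# check_airgapped_images FILE
check_airgapped_images() {
  local file="$1" key value rc=0
  if [[ ! -r "$file" ]]; then
    echo "cannot read $file" >&2
    return 1
  fi
  # process substitution keeps the loop in this shell, so rc survives it
  while read -r key value; do
    value="${value%\"}"   # drop an optional closing quote
    value="${value#\"}"   # drop an optional opening quote
    if [[ "$value" != "$AIRGAP_REGISTRY_PREFIX"* ]]; then
      echo "EXTERNAL: $key $value" >&2
      rc=1
    fi
  done < <(grep -E '^[[:space:]]*(image|devicePluginImage|nodeLabellerImage):' "$file")
  return "$rc"
}
```

If the check passes, the file can be created as in step 1.B. If a `devconf` CR from step 1 already exists in the same cluster, `oc apply -f` (or deleting the old CR first) is needed instead of `oc create`.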

docs/specialized_networks/airgapped-install.md

Lines changed: 1 addition & 5 deletions
@@ -4,7 +4,7 @@ This guide explains how to install the AMD GPU Operator in an air-gapped environ
 
 ## Prerequisites
 
-- Kubernetes v1.29.0+ or OpenShift 4.16+
+- Kubernetes v1.29.0+
 - Helm v3.2.0+
 - Access to an internal container registry
 
@@ -26,10 +26,6 @@ rocm/k8s-device-plugin-labeller:<version>
 quay.io/jetstack/cert-manager-controller:<version>
 quay.io/jetstack/cert-manager-webhook:<version>
 quay.io/jetstack/cert-manager-cainjector:<version>
-
-# For OpenShift Only
-registry.redhat.io/openshift4/ose-node-feature-discovery:<version>
-registry.redhat.io/openshift4/kernel-module-management:<version>
 ```
 
 ### Required RPM/DEB Packages

docs/sphinx/_toc.yml

Lines changed: 1 addition & 0 deletions
@@ -60,6 +60,7 @@ subtrees:
 - caption: Specialized Networks
 entries:
 - file: specialized_networks/airgapped-install
+- file: specialized_networks/airgapped-install-openshift
 - file: specialized_networks/http-proxy
 - caption: Slurm on Kubernetes
 entries:

docs/sphinx/_toc.yml.in

Lines changed: 1 addition & 0 deletions
@@ -60,6 +60,7 @@ subtrees:
 - caption: Specialized Networks
 entries:
 - file: specialized_networks/airgapped-install
+- file: specialized_networks/airgapped-install-openshift
 - file: specialized_networks/http-proxy
 - caption: Slurm on Kubernetes
 entries:
