Commit 4a379ed

Add extra worker to cluster after dev-scripts install is complete
1 parent 4d1c50d commit 4a379ed

9 files changed

+1246
-1
lines changed

Lines changed: 234 additions & 0 deletions
@@ -0,0 +1,234 @@
1+
# Implementation Summary: Post-Installation Worker Node Addition
2+
3+
## What Was Created
4+
5+
### Core Scripts
6+
7+
1. **`add_worker_node.sh`** - Main script to add worker nodes
8+
- Creates libvirt VM with customizable resources
9+
- Configures virtual BMC (vbmc or sushy-tools)
10+
- Generates BareMetalHost and Secret manifests
11+
- Provides step-by-step instructions for completion
12+
- Automatically finds available BMC ports and generates unique MAC addresses
13+
14+
2. **`remove_worker_node.sh`** - Cleanup script to remove worker nodes
15+
- Drains and deletes node from cluster
16+
- Removes BareMetalHost and Machine resources
17+
- Destroys VM and cleans up disk/NVRAM
18+
- Removes BMC configuration
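The teardown order described in the bullets above could look roughly like the following. This is a sketch, not the contents of `remove_worker_node.sh`: the helper name `removal_plan` is hypothetical, the flags are assumptions, and it assumes the node, BareMetalHost, and VM all share the worker name. It only prints the commands so they can be reviewed before anything is run; Machine and BMC cleanup are left out because their names depend on the deployment.

```bash
#!/usr/bin/env bash
# Sketch only: print (do not run) the teardown commands implied by the bullets.
# Ordering and flags are assumptions, not taken from remove_worker_node.sh.
removal_plan() {
  local name=$1
  local ns=openshift-machine-api
  cat <<EOF
oc adm drain ${name} --ignore-daemonsets --delete-emptydir-data
oc delete node ${name}
oc delete bmh ${name} -n ${ns}
virsh destroy ${name}
virsh undefine ${name} --nvram --remove-all-storage
EOF
}

removal_plan my-worker-1
```

Printing the plan first keeps the sketch safe to run on a host without a cluster.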
19+
20+
3. **`auto_approve_csrs.sh`** - Helper script for CSR approval
21+
- Auto-approves pending CSRs for a specified duration
22+
- Useful for development/testing scenarios
23+
- Default 30-minute duration
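A time-bounded approval loop of the kind described above might be sketched as follows. The function names and the 10-second polling interval are assumptions made for illustration; the real `auto_approve_csrs.sh` may be structured differently. The go-template selects CSRs that have no status yet, which is how pending requests are usually listed.

```bash
#!/usr/bin/env bash
# Sketch only: function names and the polling interval are hypothetical.

# List CSRs that have no status yet, i.e. are still pending.
pending_csrs() {
  oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}'
}

# Approve every pending CSR once.
approve_pending_csrs() {
  local name
  pending_csrs | while read -r name; do
    if [[ -n "$name" ]]; then
      oc adm certificate approve "$name"
    fi
  done
}

# Keep approving until the given number of minutes has elapsed (default 30).
auto_approve() {
  local minutes=${1:-30} interval=${2:-10}
  local deadline=$((SECONDS + minutes * 60))
  while ((SECONDS < deadline)); do
    approve_pending_csrs
    sleep "$interval"
  done
}
```

Calling `auto_approve 30 &` would mirror the backgrounded usage shown elsewhere in this summary. Blanket approval is only appropriate on development clusters.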
24+
25+
### Documentation
26+
27+
1. **`docs/add-worker-post-install.md`** - Complete documentation
28+
- Detailed usage instructions
29+
- Configuration options
30+
- Complete workflow examples
31+
- Troubleshooting guide
32+
- Architecture notes
33+
34+
2. **`WORKER_QUICK_START.md`** - Quick reference guide
35+
- TL;DR commands
36+
- Common use cases
37+
- Quick examples
38+
39+
3. **`README.md`** - Updated main README
40+
- Added new "Option 1: Add Workers Post-Installation"
41+
- Kept existing pre-configuration method as "Option 2"
42+
- Links to new documentation
43+
44+
### Makefile Integration
45+
46+
Added two new targets to `Makefile`:
47+
- `make add_worker WORKER_NAME=<name>` - Add a worker
48+
- `make remove_worker WORKER_NAME=<name>` - Remove a worker
49+
50+
## Key Features
51+
52+
### 1. No Pre-Planning Required
53+
Unlike the existing methods, this solution allows adding workers **after** deployment without requiring `NUM_EXTRA_WORKERS` to be set beforehand.
54+
55+
### 2. Flexible Configuration
56+
Users can customize worker resources via environment variables:
57+
```bash
58+
export EXTRA_WORKER_MEMORY=32768 # 32GB
59+
export EXTRA_WORKER_DISK=100 # 100GB
60+
export EXTRA_WORKER_VCPU=16 # 16 cores
61+
```
62+
63+
### 3. Smart Automation
64+
- Automatically finds available BMC ports
65+
- Generates unique MAC addresses
66+
- Detects and starts BMC containers if needed
67+
- Supports both IPMI and Redfish BMC protocols
68+
- Handles UEFI/BIOS firmware automatically
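The port and MAC selection listed above could be approximated like this. Both helpers are hypothetical and are not taken from `add_worker_node.sh`: `find_free_port` assumes the caller already knows which ports are in use, and `random_mac` only draws a random address in the `52:54:00` prefix conventionally used for QEMU/KVM guests, so a real script would still need to compare the result against existing VMs to guarantee uniqueness.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Print the first port in [start, end] that is not in the list of used ports.
# Usage: find_free_port START END [USED_PORT...]
find_free_port() {
  local start=$1 end=$2
  shift 2
  local used=" $* "
  local port
  for ((port = start; port <= end; port++)); do
    if [[ "$used" != *" $port "* ]]; then
      echo "$port"
      return 0
    fi
  done
  return 1 # every port in the range is taken
}

# Print a random MAC address in the 52:54:00 prefix (uniqueness not checked).
random_mac() {
  printf '52:54:00:%02x:%02x:%02x\n' \
    $((RANDOM % 256)) $((RANDOM % 256)) $((RANDOM % 256))
}

find_free_port 6230 6250 6230 6231
random_mac
```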
69+
70+
### 4. Complete Lifecycle Management
71+
- Add workers: `add_worker_node.sh`
72+
- Remove workers: `remove_worker_node.sh`
73+
- Auto-approve CSRs: `auto_approve_csrs.sh`
74+
75+
### 5. Safety Checks
76+
- Validates cluster connectivity
77+
- Checks for VM name conflicts
78+
- Checks for BareMetalHost conflicts
79+
- Validates worker name format
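The summary does not say which name format is enforced. As an illustration only, the check below assumes an RFC 1123 DNS label (lowercase alphanumerics and hyphens, at most 63 characters, starting and ending with an alphanumeric), which is a safe subset of what Kubernetes accepts for object names such as a BareMetalHost. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: the accepted format is an assumption (RFC 1123 DNS label).
is_valid_worker_name() {
  local name=$1
  local re='^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
  ((${#name} <= 63)) && [[ $name =~ $re ]]
}

if is_valid_worker_name "my-worker-1"; then
  echo "my-worker-1: ok"
fi
if ! is_valid_worker_name "My_Worker"; then
  echo "My_Worker: rejected"
fi
```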
80+
81+
## Usage Comparison
82+
83+
### Old Method (Pre-Configuration Required)
84+
```bash
85+
# BEFORE initial deployment
86+
export NUM_EXTRA_WORKERS=2
87+
export EXTRA_WORKERS_ONLINE_STATUS=false
88+
make
89+
90+
# AFTER deployment
91+
oc apply -f ocp/ostest/extra_host_manifests.yaml
92+
oc scale machineset ostest-worker-0 --replicas=3 -n openshift-machine-api
93+
```
94+
95+
### New Method (Post-Installation)
96+
```bash
97+
# Deploy cluster normally
98+
make
99+
100+
# LATER: Add worker on demand
101+
./add_worker_node.sh my-worker-1
102+
oc apply -f ocp/ostest/my-worker-1_bmh.yaml
103+
./auto_approve_csrs.sh 30 &
104+
oc scale machineset ostest-worker-0 --replicas=3 -n openshift-machine-api
105+
```
106+
107+
## Architecture
108+
109+
### VM Creation
110+
- Uses libvirt XML to define VM
111+
- Supports x86_64 and aarch64 architectures
112+
- Configures UEFI or BIOS boot
113+
- Connects to baremetal network
114+
- Allocates disk storage in `/var/lib/libvirt/images/`
115+
116+
### BMC Configuration
117+
Two protocols supported:
118+
119+
1. **IPMI** (via vbmc)
120+
- Port range: 6230-6250
121+
- Protocol: `ipmi://<host>:<port>`
122+
- Container: vbmc
123+
124+
2. **Redfish** (via sushy-tools)
125+
- Port: 8000 (HTTP)
126+
- Protocol: `redfish-virtualmedia+http://<host>:8000/<vm-name>`
127+
- Container: sushy-tools
128+
- Default choice
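The two address formats above can be assembled with a small helper. This is a sketch under stated assumptions: the function name `bmc_address` and its argument order are hypothetical, the host value is a placeholder, and the Redfish path follows the form given in this summary rather than anything verified against sushy-tools.

```bash
#!/usr/bin/env bash
# Sketch only: builds a BMC address string in the formats listed above.
# Usage: bmc_address PROTOCOL HOST VM_NAME [IPMI_PORT]
bmc_address() {
  local protocol=$1 host=$2 vm=$3 port=${4:-}
  case "$protocol" in
    ipmi)
      echo "ipmi://${host}:${port}"
      ;;
    redfish)
      echo "redfish-virtualmedia+http://${host}:8000/${vm}"
      ;;
    *)
      echo "unsupported BMC protocol: ${protocol}" >&2
      return 1
      ;;
  esac
}

# Placeholder host address, for illustration only.
bmc_address redfish 192.0.2.1 my-worker-1
bmc_address ipmi 192.0.2.1 my-worker-1 6232
```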
129+
130+
### BareMetalHost Integration
131+
Generated manifest includes:
132+
- Secret with BMC credentials (admin/password)
133+
- BareMetalHost spec with:
134+
- BMC address and credentials
135+
- Boot MAC address
136+
- Online status (true)
137+
- Automated cleaning mode (disabled)
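A manifest with the fields listed above might be rendered from a heredoc along these lines. The function name, the `<name>-bmc-secret` naming, and the use of `stringData` are assumptions made for illustration; the generated `*_bmh.yaml` file may differ. The field names (`online`, `bootMACAddress`, `automatedCleaningMode`, `bmc.address`, `bmc.credentialsName`) are those of the Metal3 BareMetalHost API.

```bash
#!/usr/bin/env bash
# Sketch only: resource naming and layout are assumptions.
# Usage: render_bmh_manifest NAME BOOT_MAC BMC_ADDRESS
render_bmh_manifest() {
  local name=$1 mac=$2 address=$3
  cat <<EOF
---
apiVersion: v1
kind: Secret
metadata:
  name: ${name}-bmc-secret
  namespace: openshift-machine-api
type: Opaque
stringData:
  username: admin
  password: password
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: ${name}
  namespace: openshift-machine-api
spec:
  online: true
  bootMACAddress: ${mac}
  automatedCleaningMode: disabled
  bmc:
    address: ${address}
    credentialsName: ${name}-bmc-secret
EOF
}

render_bmh_manifest my-worker-1 52:54:00:aa:bb:cc \
  "redfish-virtualmedia+http://192.0.2.1:8000/my-worker-1"
```

The output could be redirected to a file and applied with `oc apply -f`, as in the workflow shown in this summary.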
138+
139+
## Testing
140+
141+
All scripts pass bash syntax validation (`bash -n` parses each script without executing it, so this checks syntax only):
142+
```bash
143+
bash -n add_worker_node.sh # ✓ PASS
144+
bash -n remove_worker_node.sh # ✓ PASS
145+
bash -n auto_approve_csrs.sh # ✓ PASS
146+
```
147+
148+
## Files Modified/Created
149+
150+
### New Files
151+
- `add_worker_node.sh` (executable)
152+
- `remove_worker_node.sh` (executable)
153+
- `auto_approve_csrs.sh` (executable)
154+
- `docs/add-worker-post-install.md`
155+
- `WORKER_QUICK_START.md`
156+
- `.add_worker_implementation_summary.md` (this file)
157+
158+
### Modified Files
159+
- `Makefile` - Added `add_worker` and `remove_worker` targets
160+
- `README.md` - Updated "Testing with extra workers" section
161+
162+
## Dependencies
163+
164+
The scripts leverage existing dev-scripts infrastructure:
165+
- `common.sh` - Environment variables and common functions
166+
- `network.sh` - Network configuration
167+
- `utils.sh` - Utility functions
168+
- `ocp_install_env.sh` - OCP environment setup
169+
- `logging.sh` - Logging functions
170+
171+
## Compatibility
172+
173+
- Works with standard installer flow (`make` or `make all`)
174+
- Compatible with existing `NUM_EXTRA_WORKERS` workflow
175+
- Supports both IPMI and Redfish BMC protocols
176+
- Works with UEFI and BIOS boot modes
177+
- Supports x86_64 and aarch64 architectures
178+
179+
## Next Steps for Users
180+
181+
1. **Basic Usage**
182+
```bash
183+
./add_worker_node.sh worker-1
184+
oc apply -f ocp/ostest/worker-1_bmh.yaml
185+
oc scale machineset <name> --replicas=<N+1> -n openshift-machine-api
186+
```
187+
188+
2. **With Custom Resources**
189+
```bash
190+
export EXTRA_WORKER_MEMORY=32768 EXTRA_WORKER_DISK=100 EXTRA_WORKER_VCPU=16
191+
./add_worker_node.sh large-worker
192+
```
193+
194+
3. **Quick Start with Make**
195+
```bash
196+
make add_worker WORKER_NAME=worker-1
197+
```
198+
199+
4. **Auto-approve CSRs**
200+
```bash
201+
./auto_approve_csrs.sh 30 &
202+
```
203+
204+
## Limitations
205+
206+
1. Only works with libvirt-based deployments
207+
2. Requires BMC containers (vbmc/sushy-tools) to be available
208+
3. IPMI (vbmc) BMC ports are limited to free ports in the 6230-6250 range
209+
4. Worker must be on same network as other cluster nodes
210+
5. Resources (memory, disk, CPU) are set at VM creation time
211+
212+
## Future Enhancements
213+
214+
Possible improvements for future versions:
215+
- Support for multiple workers in one command
216+
- Interactive mode with prompts
217+
- Integration with Ansible playbooks
218+
- Support for additional disk attachment
219+
- Network configuration customization
220+
- BMC port range expansion
221+
- Support for non-libvirt platforms
222+
223+
## Support
224+
225+
For issues or questions:
226+
1. Check the documentation: `docs/add-worker-post-install.md`
227+
2. Review quick start: `WORKER_QUICK_START.md`
228+
3. Check script output for detailed instructions
229+
4. Verify cluster connectivity and resources
230+
231+
## Conclusion
232+
233+
This implementation adds worker nodes to dev-scripts clusters after installation, removing the need to set `NUM_EXTRA_WORKERS` in advance and making it easier to scale development and test clusters on demand.
234+

Makefile

Lines changed: 6 additions & 0 deletions
@@ -66,6 +66,12 @@ install_config:
6666
ocp_run:
6767
./06_create_cluster.sh
6868

69+
add_worker:
70+
./add_worker_node.sh $(WORKER_NAME)
71+
72+
remove_worker:
73+
./remove_worker_node.sh $(WORKER_NAME)
74+
6975
gather:
7076
./must_gather.sh
7177

README.md

Lines changed: 25 additions & 1 deletion
@@ -495,7 +495,31 @@ export IGNITION_EXTRA="ignition/file_example.ign"
495495
496496
### Testing with extra workers
497497
498-
It is possible to specify additional workers, which are not used in the initial
498+
#### Option 1: Add Workers Post-Installation (No Pre-Planning Required)
499+
500+
You can add worker nodes **after** your cluster is deployed without requiring pre-configuration:
501+
502+
```bash
503+
# Add a worker node
504+
./add_worker_node.sh my-worker-1
505+
506+
# Apply the generated manifest
507+
oc apply -f ocp/ostest/my-worker-1_bmh.yaml
508+
509+
# Scale the machineset to provision it
510+
oc scale machineset <cluster-name>-worker-0 --replicas=<N+1> -n openshift-machine-api
511+
```
512+
513+
Or using Make:
514+
```bash
515+
make add_worker WORKER_NAME=my-worker-1
516+
```
517+
518+
See [WORKER_QUICK_START.md](WORKER_QUICK_START.md) for a quick guide or [docs/add-worker-post-install.md](docs/add-worker-post-install.md) for complete documentation.
519+
520+
#### Option 2: Pre-Configure Extra Workers (Traditional Method)
521+
522+
You can specify additional workers during initial deployment, which are not used in the initial
499523
deployment, and can then later be used, e.g., to test scale-out. The default online
500524
status of the extra workers is true, but can be changed to false using
501525
EXTRA_WORKERS_ONLINE_STATUS.

WORKER_QUICK_START.md

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
1+
# Quick Start: Adding Workers Post-Installation
2+
3+
## TL;DR
4+
5+
Add a worker node after your cluster is already deployed:
6+
7+
```bash
8+
# Add a worker
9+
./add_worker_node.sh my-worker-1
10+
11+
# Apply the generated manifest
12+
oc apply -f ocp/ostest/my-worker-1_bmh.yaml
13+
14+
# Auto-approve CSRs in background
15+
./auto_approve_csrs.sh 30 &
16+
17+
# Scale up your machineset
18+
oc get machineset -n openshift-machine-api
19+
oc scale machineset <your-cluster>-worker-0 --replicas=<N+1> -n openshift-machine-api
20+
21+
# Watch it join
22+
oc get nodes -w
23+
```
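The `<N+1>` placeholder above can be computed rather than typed by hand. The helper below is a hypothetical sketch, not part of the commit: it reads the current replica count from the MachineSet and scales it up by one.

```bash
#!/usr/bin/env bash
# Sketch only: the helper name is hypothetical.
# Usage: scale_up_machineset <machineset-name>
scale_up_machineset() {
  local ms=$1
  local ns=openshift-machine-api
  local current
  # Read the desired replica count currently set on the MachineSet.
  current=$(oc get machineset "$ms" -n "$ns" -o jsonpath='{.spec.replicas}')
  oc scale machineset "$ms" --replicas=$((current + 1)) -n "$ns"
}

# Example (requires a logged-in oc session):
#   scale_up_machineset ostest-worker-0
```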
24+
25+
## Customizing Resources
26+
27+
```bash
28+
export EXTRA_WORKER_MEMORY=32768 # 32GB RAM
29+
export EXTRA_WORKER_DISK=100 # 100GB disk
30+
export EXTRA_WORKER_VCPU=16 # 16 vCPUs
31+
32+
./add_worker_node.sh my-large-worker
33+
```
34+
35+
## Using Make
36+
37+
```bash
38+
# Add worker
39+
make add_worker WORKER_NAME=worker-1
40+
41+
# Remove worker
42+
make remove_worker WORKER_NAME=worker-1
43+
```
44+
45+
## What Gets Created
46+
47+
- ✅ Libvirt VM with specified resources
48+
- ✅ Virtual BMC (IPMI or Redfish)
49+
- ✅ BareMetalHost manifest
50+
- ✅ Secret for BMC credentials
51+
- ✅ Complete setup instructions
52+
53+
## Removing a Worker
54+
55+
```bash
56+
# Remove from cluster and delete VM
57+
./remove_worker_node.sh my-worker-1
58+
59+
# Don't forget to scale down the machineset
60+
oc scale machineset <your-cluster>-worker-0 --replicas=<N-1> -n openshift-machine-api
61+
```
62+
63+
## Full Documentation
64+
65+
See [docs/add-worker-post-install.md](docs/add-worker-post-install.md) for detailed documentation.
66+
