234 changes: 234 additions & 0 deletions .add_worker_implementation_summary.md
@@ -0,0 +1,234 @@
# Implementation Summary: Post-Installation Worker Node Addition

## What Was Created

### Core Scripts

1. **`add_worker_node.sh`** - Main script to add worker nodes
   - Creates libvirt VM with customizable resources
   - Configures virtual BMC (vbmc or sushy-tools)
   - Generates BareMetalHost and Secret manifests
   - Provides step-by-step instructions for completion
   - Automatically finds available BMC ports and generates unique MAC addresses

2. **`remove_worker_node.sh`** - Cleanup script to remove worker nodes
   - Drains and deletes node from cluster
   - Removes BareMetalHost and Machine resources
   - Destroys VM and cleans up disk/NVRAM
   - Removes BMC configuration

3. **`auto_approve_csrs.sh`** - Helper script for CSR approval
   - Auto-approves pending CSRs for specified duration
   - Useful for development/testing scenarios
   - Default 30-minute duration
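
The timed approval loop described for `auto_approve_csrs.sh` could look roughly like the sketch below. It is not the script's actual contents: the function names `pending_csrs`, `approve_pending_once`, and `approve_csrs_for` and the 10-second polling interval are assumptions made for illustration. Only `oc get csr` and `oc adm certificate approve` are taken from the standard `oc` CLI.

```bash
# Hypothetical sketch; not the contents of auto_approve_csrs.sh.

# Print the names of CSRs that have no status yet (still pending).
pending_csrs() {
  oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}'
}

# Approve every CSR that is pending right now.
approve_pending_once() {
  pending_csrs | while IFS= read -r name; do
    if [ -n "$name" ]; then
      oc adm certificate approve "$name"
    fi
  done
}

# Keep approving for MINUTES (default 30), polling every INTERVAL seconds
# (default 10, an assumed value).
approve_csrs_for() {
  local minutes="${1:-30}" interval="${2:-10}" deadline
  deadline=$(( $(date +%s) + minutes * 60 ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    approve_pending_once
    sleep "$interval"
  done
}

# Usage (assumed): approve_csrs_for 30 &
```

Blanket approval of every pending CSR is only reasonable on throwaway development clusters, which matches the development/testing scope stated above.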

### Documentation

1. **`docs/add-worker-post-install.md`** - Complete documentation
   - Detailed usage instructions
   - Configuration options
   - Complete workflow examples
   - Troubleshooting guide
   - Architecture notes

2. **`WORKER_QUICK_START.md`** - Quick reference guide
   - TL;DR commands
   - Common use cases
   - Quick examples

3. **`README.md`** - Updated main README
   - Added new "Option 1: Add Workers Post-Installation"
   - Kept existing pre-configuration method as "Option 2"
   - Links to new documentation

### Makefile Integration

Added two new targets to `Makefile`:
- `make add_worker WORKER_NAME=<name>` - Add a worker
- `make remove_worker WORKER_NAME=<name>` - Remove a worker

## Key Features

### 1. No Pre-Planning Required
Unlike the existing methods, this solution allows adding workers **after** deployment without requiring `NUM_EXTRA_WORKERS` to be set beforehand.

### 2. Flexible Configuration
Users can customize worker resources via environment variables:
```bash
export EXTRA_WORKER_MEMORY=32768 # 32GB
export EXTRA_WORKER_DISK=100 # 100GB
export EXTRA_WORKER_VCPU=16 # 16 cores
```

### 3. Smart Automation
- Automatically finds available BMC ports
- Generates unique MAC addresses
- Detects and starts BMC containers if needed
- Supports both IPMI and Redfish BMC protocols
- Handles UEFI/BIOS firmware automatically
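
As a rough illustration of the first two points, the sketch below picks the first unused port in a range and derives a MAC address from the worker name. The helper names `first_free_port` and `mac_for_name` are hypothetical, and how `add_worker_node.sh` actually discovers used ports or generates MACs is not shown in this summary. The `52:54:00` prefix is the one conventionally used for QEMU/KVM guests. A name-derived MAC is deterministic but not guaranteed unique, so a real implementation would still need to check for collisions.

```bash
# Hypothetical helpers; not taken from add_worker_node.sh.

# Print the first port in [START, END] that is not in the space-separated
# USED list. Returns 1 when every port in the range is taken.
first_free_port() {
  local start="$1" end="$2" used="$3" port
  port="$start"
  while [ "$port" -le "$end" ]; do
    case " $used " in
      *" $port "*) ;;                    # already taken, try the next one
      *) echo "$port"; return 0 ;;
    esac
    port=$((port + 1))
  done
  return 1
}

# Derive a MAC address from NAME using the low 24 bits of its POSIX cksum CRC.
mac_for_name() {
  local crc
  crc="$(printf '%s' "$1" | cksum | cut -d' ' -f1)"
  printf '52:54:00:%02x:%02x:%02x\n' \
    $(( (crc >> 16) & 255 )) $(( (crc >> 8) & 255 )) $(( crc & 255 ))
}

# Usage (assumed): first_free_port 6230 6250 "6230 6231"   prints 6232
```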

### 4. Complete Lifecycle Management
- Add workers: `add_worker_node.sh`
- Remove workers: `remove_worker_node.sh`
- Auto-approve CSRs: `auto_approve_csrs.sh`
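
For the removal half of the lifecycle, the cluster and VM cleanup could be sketched as below. This is an assumption-laden outline, not the contents of `remove_worker_node.sh`: it assumes the node, BareMetalHost, and libvirt domain all share the worker name, it omits Machine deletion and BMC cleanup, and the `run` and `remove_worker` helpers are hypothetical. Setting `DRY_RUN=true` prints the commands instead of running them.

```bash
# Hypothetical sketch; not the contents of remove_worker_node.sh.

# Run a command, or only print it when DRY_RUN=true.
run() {
  if [ "${DRY_RUN:-false}" = "true" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

remove_worker() {
  local name="$1" namespace="openshift-machine-api"
  # Evict workloads, then remove the node object from the cluster.
  run oc adm drain "$name" --ignore-daemonsets --delete-emptydir-data --force
  run oc delete node "$name"
  # Remove the BareMetalHost that represents the VM.
  run oc delete baremetalhost "$name" -n "$namespace"
  # Stop the VM (ignore the error if it is already off), then drop its
  # definition together with its NVRAM file and attached storage.
  run virsh destroy "$name" || true
  run virsh undefine "$name" --nvram --remove-all-storage
}

# Usage (assumed): DRY_RUN=true remove_worker my-worker-1
```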

### 5. Safety Checks
- Validates cluster connectivity
- Checks for VM name conflicts
- Checks for BareMetalHost conflicts
- Validates worker name format
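
A minimal sketch of the name and conflict checks follows. The exact rules the script enforces are not stated here, so the RFC 1123 DNS label rule (1 to 63 lowercase alphanumerics or `-`, starting and ending with an alphanumeric) is an assumption, as are the helper names `is_valid_worker_name` and `name_is_free`.

```bash
# Hypothetical helpers; the real validation rules may differ.

# Succeed only if NAME is a valid RFC 1123 DNS label.
is_valid_worker_name() {
  local name="$1" LC_ALL=C
  case "$name" in
    ''|-*|*-|*[!a-z0-9-]*) return 1 ;;   # empty, edge hyphen, or bad character
  esac
  [ "${#name}" -le 63 ]
}

# Succeed only if no libvirt domain and no BareMetalHost already use NAME.
name_is_free() {
  local name="$1"
  if virsh dominfo "$name" >/dev/null 2>&1; then
    echo "VM '$name' already exists" >&2
    return 1
  fi
  if oc get baremetalhost "$name" -n openshift-machine-api >/dev/null 2>&1; then
    echo "BareMetalHost '$name' already exists" >&2
    return 1
  fi
}

# Usage (assumed): is_valid_worker_name "$1" && name_is_free "$1"
```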

## Usage Comparison

### Old Method (Pre-Configuration Required)
```bash
# BEFORE initial deployment
export NUM_EXTRA_WORKERS=2
export EXTRA_WORKERS_ONLINE_STATUS=false
make

# AFTER deployment
oc apply -f ocp/ostest/extra_host_manifests.yaml
oc scale machineset ostest-worker-0 --replicas=3 -n openshift-machine-api
```

### New Method (Post-Installation)
```bash
# Deploy cluster normally
make

# LATER: Add worker on demand
./add_worker_node.sh my-worker-1
oc apply -f ocp/ostest/my-worker-1_bmh.yaml
./auto_approve_csrs.sh 30 &
oc scale machineset ostest-worker-0 --replicas=3 -n openshift-machine-api
```
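
Both examples hard-code `--replicas=3`. When the current replica count is not known in advance, it can be read from the MachineSet first. The sketch below assumes a hypothetical helper named `scale_machineset_by`; `oc get -o jsonpath` and `oc scale` are standard `oc` commands.

```bash
# Hypothetical helper: change a MachineSet's replica count by DELTA (default +1).
scale_machineset_by() {
  local machineset="$1" delta="${2:-1}" namespace="openshift-machine-api"
  local current
  current="$(oc get machineset "$machineset" -n "$namespace" \
    -o jsonpath='{.spec.replicas}')" || return 1
  oc scale machineset "$machineset" -n "$namespace" \
    --replicas="$(( current + delta ))"
}

# Usage (assumed): scale_machineset_by ostest-worker-0       # add one worker
#                  scale_machineset_by ostest-worker-0 -1    # remove one worker
```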

## Architecture

### VM Creation
- Uses libvirt XML to define VM
- Supports x86_64 and aarch64 architectures
- Configures UEFI or BIOS boot
- Connects to baremetal network
- Allocates disk storage in `/var/lib/libvirt/images/`

### BMC Configuration
Two protocols supported:

1. **IPMI** (via vbmc)
   - Port range: 6230-6250
   - Protocol: `ipmi://<host>:<port>`
   - Container: vbmc

2. **Redfish** (via sushy-tools)
   - Port: 8000 (HTTP)
   - Protocol: `redfish-virtualmedia+http://<host>:8000/<vm-name>`
   - Container: sushy-tools
   - Default choice
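
The two address formats above can be produced by a small helper such as the one sketched here. The function name `bmc_address` and its argument order are assumptions, and the URL shapes simply mirror the formats listed in this section.

```bash
# Hypothetical helper: build the BMC address for a worker VM.
#   bmc_address ipmi    <host> <vm-name> <port>
#   bmc_address redfish <host> <vm-name>
bmc_address() {
  local protocol="$1" host="$2" vm_name="$3" port="${4:-}"
  case "$protocol" in
    ipmi)    echo "ipmi://${host}:${port}" ;;
    redfish) echo "redfish-virtualmedia+http://${host}:8000/${vm_name}" ;;
    *)       echo "unsupported BMC protocol: ${protocol}" >&2; return 1 ;;
  esac
}

# Usage (assumed; the host IP is only an example):
#   bmc_address redfish 192.168.111.1 my-worker-1
```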

### BareMetalHost Integration
Generated manifest includes:
- Secret with BMC credentials (admin/password)
- BareMetalHost spec with:
  - BMC address and credentials
  - Boot MAC address
  - Online status (true)
  - Automated cleaning mode (disabled)
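
A manifest with those pieces could be rendered along the lines of the sketch below. The function name `render_bmh_manifest`, the `<name>-bmc-secret` naming, and the use of `stringData` are assumptions. The field names (`online`, `bootMACAddress`, `automatedCleaningMode`, `bmc.address`, `bmc.credentialsName`) come from the Metal3 BareMetalHost API, and the generated file may differ in detail.

```bash
# Hypothetical sketch of the generated manifest; the real output may differ.
#   render_bmh_manifest <name> <bmc-address> <boot-mac> [namespace]
render_bmh_manifest() {
  local name="$1" bmc_address="$2" boot_mac="$3"
  local namespace="${4:-openshift-machine-api}"
  cat <<EOF
---
apiVersion: v1
kind: Secret
metadata:
  name: ${name}-bmc-secret
  namespace: ${namespace}
type: Opaque
stringData:
  username: admin
  password: password
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  online: true
  bootMACAddress: "${boot_mac}"
  automatedCleaningMode: disabled
  bmc:
    address: "${bmc_address}"
    credentialsName: ${name}-bmc-secret
EOF
}

# Usage (assumed):
#   render_bmh_manifest my-worker-1 ipmi://192.168.111.1:6231 52:54:00:aa:bb:cc > bmh.yaml
```

The MAC and BMC address are quoted so that a YAML parser always reads them as strings.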

## Testing

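All scripts pass bash syntax validation (`bash -n` parses a script without executing it, so this confirms syntax only, not runtime behavior):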
```bash
bash -n add_worker_node.sh # ✓ PASS
bash -n remove_worker_node.sh # ✓ PASS
bash -n auto_approve_csrs.sh # ✓ PASS
```

## Files Modified/Created

### New Files
- `add_worker_node.sh` (executable)
- `remove_worker_node.sh` (executable)
- `auto_approve_csrs.sh` (executable)
- `docs/add-worker-post-install.md`
- `WORKER_QUICK_START.md`
- `.add_worker_implementation_summary.md` (this file)

### Modified Files
- `Makefile` - Added `add_worker` and `remove_worker` targets
- `README.md` - Updated "Testing with extra workers" section

## Dependencies

The scripts leverage existing dev-scripts infrastructure:
- `common.sh` - Environment variables and common functions
- `network.sh` - Network configuration
- `utils.sh` - Utility functions
- `ocp_install_env.sh` - OCP environment setup
- `logging.sh` - Logging functions

## Compatibility

- Works with standard installer flow (`make` or `make all`)
- Compatible with existing `NUM_EXTRA_WORKERS` workflow
- Supports both IPMI and Redfish BMC protocols
- Works with UEFI and BIOS boot modes
- Supports x86_64 and aarch64 architectures

## Next Steps for Users

1. **Basic Usage**
   ```bash
   ./add_worker_node.sh worker-1
   oc apply -f ocp/ostest/worker-1_bmh.yaml
   oc scale machineset <name> --replicas=<N+1> -n openshift-machine-api
   ```

2. **With Custom Resources**
   ```bash
   export EXTRA_WORKER_MEMORY=32768 EXTRA_WORKER_DISK=100 EXTRA_WORKER_VCPU=16
   ./add_worker_node.sh large-worker
   ```

3. **Quick Start with Make**
   ```bash
   make add_worker WORKER_NAME=worker-1
   ```

4. **Auto-approve CSRs**
   ```bash
   ./auto_approve_csrs.sh 30 &
   ```

## Limitations

1. Only works with libvirt-based deployments
2. Requires BMC containers (vbmc/sushy-tools) to be available
3. With IPMI (vbmc), BMC ports are limited to the free ports in the 6230-6250 range
4. Worker must be on same network as other cluster nodes
5. Resources (memory, disk, CPU) are set at VM creation time

## Future Enhancements

Possible improvements for future versions:
- Support for multiple workers in one command
- Interactive mode with prompts
- Integration with Ansible playbooks
- Support for additional disk attachment
- Network configuration customization
- BMC port range expansion
- Support for non-libvirt platforms

## Support

For issues or questions:
1. Check the documentation: `docs/add-worker-post-install.md`
2. Review quick start: `WORKER_QUICK_START.md`
3. Check script output for detailed instructions
4. Verify cluster connectivity and resources

## Conclusion

This implementation lets users add worker nodes to dev-scripts clusters after installation. `NUM_EXTRA_WORKERS` no longer has to be set in advance, so clusters can be scaled on demand for testing and development.

6 changes: 6 additions & 0 deletions Makefile
@@ -66,6 +66,12 @@ install_config:
ocp_run:
	./06_create_cluster.sh

add_worker:
	./add_worker_node.sh $(WORKER_NAME)

remove_worker:
	./remove_worker_node.sh $(WORKER_NAME)

gather:
	./must_gather.sh

26 changes: 25 additions & 1 deletion README.md
@@ -495,7 +495,31 @@ export IGNITION_EXTRA="ignition/file_example.ign"

### Testing with extra workers

#### Option 1: Add Workers Post-Installation (No Pre-Planning Required)

You can add worker nodes **after** your cluster is deployed without requiring pre-configuration:

```bash
# Add a worker node
./add_worker_node.sh my-worker-1

# Apply the generated manifest
oc apply -f ocp/ostest/my-worker-1_bmh.yaml

# Scale the machineset to provision it
oc scale machineset <cluster-name>-worker-0 --replicas=<N+1> -n openshift-machine-api
```

Or using Make:
```bash
make add_worker WORKER_NAME=my-worker-1
```

See [WORKER_QUICK_START.md](WORKER_QUICK_START.md) for a quick guide or [docs/add-worker-post-install.md](docs/add-worker-post-install.md) for complete documentation.

#### Option 2: Pre-Configure Extra Workers (Traditional Method)

You can define additional workers at initial deployment time. They are not
used in the initial deployment and can be used later, e.g. to test scale-out.
The default online status of the extra workers is true, but it can be changed
to false using `EXTRA_WORKERS_ONLINE_STATUS`.
66 changes: 66 additions & 0 deletions WORKER_QUICK_START.md
@@ -0,0 +1,66 @@
# Quick Start: Adding Workers Post-Installation

## TL;DR

Add a worker node after your cluster is already deployed:

```bash
# Add a worker
./add_worker_node.sh my-worker-1

# Apply the generated manifest
oc apply -f ocp/ostest/my-worker-1_bmh.yaml

# Auto-approve CSRs in background
./auto_approve_csrs.sh 30 &

# Scale up your machineset
oc get machineset -n openshift-machine-api
oc scale machineset <your-cluster>-worker-0 --replicas=<N+1> -n openshift-machine-api

# Watch it join
oc get nodes -w
```

## Customizing Resources

```bash
export EXTRA_WORKER_MEMORY=32768 # 32GB RAM
export EXTRA_WORKER_DISK=100 # 100GB disk
export EXTRA_WORKER_VCPU=16 # 16 vCPUs

./add_worker_node.sh my-large-worker
```

## Using Make

```bash
# Add worker
make add_worker WORKER_NAME=worker-1

# Remove worker
make remove_worker WORKER_NAME=worker-1
```

## What Gets Created

- ✅ Libvirt VM with specified resources
- ✅ Virtual BMC (IPMI or Redfish)
- ✅ BareMetalHost manifest
- ✅ Secret for BMC credentials
- ✅ Complete setup instructions

## Removing a Worker

```bash
# Remove from cluster and delete VM
./remove_worker_node.sh my-worker-1

# Don't forget to scale down the machineset
oc scale machineset <your-cluster>-worker-0 --replicas=<N-1> -n openshift-machine-api
```

## Full Documentation

See [docs/add-worker-post-install.md](docs/add-worker-post-install.md) for detailed documentation.
