6 changes: 3 additions & 3 deletions .github/workflows/ci-full.yaml
@@ -46,8 +46,8 @@ jobs:
- name: Mount CVMFS
run: |
kubectl create namespace cvmfs-csi
helm install -n cvmfs-csi cvmfs-csi oci://registry.cern.ch/kubernetes/charts/cvmfs-csi --values ci/values-cvmfs-csi.yaml
kubectl apply -f ci/cvmfs-storageclass.yaml -n cvmfs-csi
helm install -n cvmfs-csi cvmfs-csi oci://registry.cern.ch/kubernetes/charts/cvmfs-csi --values cvmfs/values-cvmfs-csi.yaml
kubectl apply -f cvmfs/cvmfs-storageclass.yaml -n cvmfs-csi

- name: Deploy Helm chart
run: |
@@ -98,7 +98,7 @@ jobs:

- name: Run Perf Analyzer Job
run: |
kubectl apply -f ci/perf-analyzer-job.yaml
kubectl apply -f tests/perf-analyzer-job-ci.yaml
kubectl wait --for=condition=complete job/perf-analyzer-job -n cms --timeout=300s || \
(echo "Perf-analyzer job did not complete in time or failed." && exit 1)

6 changes: 3 additions & 3 deletions .github/workflows/ci-local.sh
@@ -36,8 +36,8 @@ helm install keda kedacore/keda --namespace keda
echo "Mounting CVMFS..."
kubectl create namespace cvmfs-csi
helm install -n cvmfs-csi cvmfs-csi oci://registry.cern.ch/kubernetes/charts/cvmfs-csi \
--values ci/values-cvmfs-csi.yaml
kubectl apply -f ci/cvmfs-storageclass.yaml -n cvmfs-csi
--values cvmfs/values-cvmfs-csi.yaml
kubectl apply -f cvmfs/cvmfs-storageclass.yaml -n cvmfs-csi

# 7. Deploy the Helm chart for supersonic
echo "Deploying Helm chart for supersonic..."
@@ -82,7 +82,7 @@ kubectl get all -n cms

# 10. Run Perf Analyzer Job
echo "Running Perf Analyzer Job..."
kubectl apply -f ci/perf-analyzer-job.yaml
kubectl apply -f tests/perf-analyzer-job-ci.yaml
kubectl wait --for=condition=complete job/perf-analyzer-job -n cms --timeout=180s || {
echo "Perf-analyzer job did not complete in time or failed."
exit 1
2 changes: 1 addition & 1 deletion .github/workflows/helm-lint.yaml
@@ -35,7 +35,7 @@ jobs:

- name: Generate JSON schema
run: |
python ci/yaml-to-schema.py helm/supersonic/values.yaml helm/supersonic/values.schema.json
python .github/workflows/yaml-to-schema.py helm/supersonic/values.yaml helm/supersonic/values.schema.json

- name: Commit and push changes
env:
File renamed without changes.
139 changes: 129 additions & 10 deletions README.md
@@ -27,24 +27,145 @@ Currently, SuperSONIC supports the following functionality:

## Installation

**Pre-requisites:**
- a Kubernetes cluster with access to GPUs
- a Prometheus instance installed on the cluster, or Prometheus CRDs to deploy your own instance
- KEDA CRDs installed on the cluster (only if using autoscaling)
### Pre-requisites

<details>
<summary><strong>Kubernetes cluster</strong></summary>

Ideally with access to GPUs, but CPUs are enough for a minimal deployment.
</details>

<details>
<summary><strong>Helm</strong></summary>

Helm is a package manager for Kubernetes.
To install Helm on your machine, follow the official instructions at [https://helm.sh/docs/intro/install/](https://helm.sh/docs/intro/install/).
</details>

<details>
<summary><strong>Custom Resource Definitions (CRDs) – not needed for minimal deployment</strong></summary>

- [Prometheus](https://prometheus.io) CRDs

If you are using an established Kubernetes cluster (e.g. at an HPC center), there is a good chance that these CRDs are already installed. Otherwise, a cluster admin can install them with the following commands:
<details>
<summary><strong>How to install Prometheus CRDs</strong></summary>

```
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
kubectl create namespace monitoring
helm install prometheus-operator prometheus-community/kube-prometheus-stack --namespace monitoring --set prometheusOperator.createCustomResource=false --set defaultRules.create=false --set alertmanager.enabled=false --set prometheus.enabled=false --set grafana.enabled=false
```
</details>
- [KEDA](https://keda.sh) CRDs (only if using autoscaling)

<details>
<summary><strong>How to install KEDA CRDs</strong></summary>

```
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
kubectl create namespace keda
helm install keda kedacore/keda --namespace keda
```
</details>
</details>
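
Before installing anything, it can help to check whether the cluster already has these CRDs. The sketch below is one possible way to do that, not part of SuperSONIC: the function name `check_crd_groups` is hypothetical, and it assumes that Prometheus Operator CRDs live in the `monitoring.coreos.com` API group and KEDA CRDs in groups ending in `keda.sh`.

```shell
# Hypothetical helper: reads CRD names (one per line, e.g. the output of
# `kubectl get crd -o name`) from stdin and reports whether any belong to
# the Prometheus Operator or KEDA API groups.
check_crd_groups() {
  prometheus=missing
  keda=missing
  while IFS= read -r name; do
    case "$name" in
      *.monitoring.coreos.com) prometheus=found ;;
      *.keda.sh) keda=found ;;
    esac
  done
  echo "prometheus-operator CRDs: $prometheus"
  echo "keda CRDs: $keda"
}
```

A typical invocation would be `kubectl get crd -o name | check_crd_groups`; a `missing` line suggests asking a cluster admin to run the corresponding install commands.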

---

### Standard deployment

If you are installing SuperSONIC for the first time, proceed to the [Minimal deployment](#minimal-deployment) section below.

If you already have a functional `values.yaml` and/or have installed SuperSONIC previously, use the following installation commands:

```
helm repo add fastml https://fastmachinelearning.org/SuperSONIC
helm repo update
helm install <release-name> fastml/supersonic -n <namespace> -f <values.yaml>
```

To construct the `values.yaml` file for your application, follow the [Configuration guide](http://fastmachinelearning.org/SuperSONIC/configuration-guide.html "Configuration guide").

The full list of configuration parameters is available in the [Configuration reference](http://fastmachinelearning.org/SuperSONIC/configuration-reference.html "Configuration reference").

---

### Minimal deployment

<details>
<summary><strong>1. Install cvmfs-csi plugin to load models from CVMFS</strong></summary>

For an example installation, we will use CMS models loaded from [CVMFS](https://cvmfs.readthedocs.io/en/stable/). SuperSONIC also supports other types of model repositories, including
an arbitrary Persistent Volume, an NFS volume, or S3 storage.

The [cvmfs-csi](https://github.com/cvmfs-contrib/cvmfs-csi) plugin makes it easy to mount CVMFS
into a Kubernetes cluster by creating a new storage class. A Persistent Volume created with this
storage class will have CVMFS contents visible inside.

A cluster admin can use the following commands to install `cvmfs-csi`:
```
kubectl create namespace cvmfs-csi
helm install -n cvmfs-csi cvmfs-csi oci://registry.cern.ch/kubernetes/charts/cvmfs-csi --values cvmfs/values-cvmfs-csi.yaml
kubectl apply -f cvmfs/cvmfs-storageclass.yaml -n cvmfs-csi
```
</details>

<details>
<summary><strong>Install the latest released version from the Helm repository</strong></summary>
<summary><strong>2. Install SuperSONIC with minimal configuration</strong></summary>

The minimal deployment will install only a single CPU-based Triton server and an Envoy Proxy.
We will use [`values/values-minimal.yaml`](values/values-minimal.yaml) as our minimal
configuration file.

```
helm repo add fastml https://fastmachinelearning.org/SuperSONIC
helm repo update
helm install <release-name> fastml/supersonic -n <namespace> -f <your-values.yaml>
helm install <release-name> fastml/supersonic -n <namespace> -f values/values-minimal.yaml
```
</details>

<details>
<summary><strong>3. Deploy a test job to run inferences</strong></summary>

To test your SuperSONIC installation, we will create a small [NVIDIA Performance Analyzer](https://docs.nvidia.com/deeplearning/triton-inference-server/archives/triton-inference-server-2280/user-guide/docs/user_guide/perf_analyzer.html) job,
which will send a single inference request with random input data to the Envoy Proxy endpoint.

1. In `tests/perf-analyzer-job.yaml`, edit the following parameters to match your deployment:

```
metadata:
namespace: <namespace>
```

In the `perf_analyzer` command:

```
-u <release-name>.<namespace>.svc.cluster.local:8001
```

2. Submit the job to your Kubernetes cluster:

```
kubectl apply -n <namespace> -f tests/perf-analyzer-job.yaml
```

3. Track the job status and inspect its logs:

```
kubectl get pods -l job-name=perf-analyzer-job -n <namespace>
kubectl logs <pod-name> -n <namespace>
```
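
The steps above can also be scripted. The following is a minimal sketch, assuming the job is named `perf-analyzer-job` (as in the label selector above) and the endpoint follows the `<release-name>.<namespace>.svc.cluster.local:8001` pattern from step 1. The function names and the default timeout are hypothetical, chosen for illustration only.

```shell
# Hypothetical helper: build the endpoint passed to `perf_analyzer -u`,
# following the <release-name>.<namespace>.svc.cluster.local:8001 pattern.
triton_endpoint() {
  release=$1
  namespace=$2
  port=${3:-8001}
  printf '%s.%s.svc.cluster.local:%s\n' "$release" "$namespace" "$port"
}

# Hypothetical helper: wait for the job to complete, then print its logs.
# The 300s default timeout is an assumption; adjust it for your cluster.
wait_for_perf_job() {
  namespace=$1
  timeout=${2:-300s}
  kubectl wait --for=condition=complete job/perf-analyzer-job \
    -n "$namespace" --timeout="$timeout" || {
    echo "Perf-analyzer job did not complete in time or failed." >&2
    return 1
  }
  kubectl logs job/perf-analyzer-job -n "$namespace"
}
```

For example, `triton_endpoint <release-name> <namespace>` prints the value to put after `-u`, and `wait_for_perf_job <namespace>` blocks until the job finishes or the timeout expires.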

</details>

---

### Installing from a GitHub branch/tag/commit

<details>
<summary><strong>Install directly from a GitHub branch/tag/commit</strong></summary>
<summary><strong>This option may be useful for testing unreleased features.</strong></summary>

```
git clone https://github.com/fastmachinelearning/SuperSONIC.git
@@ -56,9 +177,6 @@ helm install <release-name> helm/supersonic -n <namespace> -f <your-values.yaml>

</details>

To construct the `values.yaml` file for your application, follow [Configuration guide](http://fastmachinelearning.org/SuperSONIC/configuration-guide.html "Configuration guide").

The full list of configuration parameters is available in the [Configuration reference](http://fastmachinelearning.org/SuperSONIC/configuration-reference.html "Configuration reference").

## Server diagram

@@ -76,6 +194,7 @@ The full list of configuration parameters is available in the [Configuration ref
| **[Purdue Anvil](https://www.rcac.purdue.edu/compute/anvil)** | ✅ | - | - |
| **[NRP Nautilus](https://docs.nationalresearchplatform.org)** | ✅ | ✅ | ✅ |
| **[UChicago](https://af.uchicago.edu/)** | - | ✅ | - |
| **[UW–Madison](https://www.hep.wisc.edu/cms/comp/)** | ⏳ | - | - |

## Publications

File renamed without changes.
File renamed without changes.
139 changes: 129 additions & 10 deletions helm/supersonic/README.md
@@ -27,24 +27,145 @@ Currently, SuperSONIC supports the following functionality:

## Installation

**Pre-requisites:**
- a Kubernetes cluster with access to GPUs
- a Prometheus instance installed on the cluster, or Prometheus CRDs to deploy your own instance
- KEDA CRDs installed on the cluster (only if using autoscaling)
### Pre-requisites

<details>
<summary><strong>Kubernetes cluster</strong></summary>

Ideally with access to GPUs, but CPUs are enough for a minimal deployment.
</details>

<details>
<summary><strong>Helm</strong></summary>

Helm is a package manager for Kubernetes.
To install Helm on your machine, follow the official instructions at [https://helm.sh/docs/intro/install/](https://helm.sh/docs/intro/install/).
</details>

<details>
<summary><strong>Custom Resource Definitions (CRDs) – not needed for minimal deployment</strong></summary>

- [Prometheus](https://prometheus.io) CRDs

If you are using an established Kubernetes cluster (e.g. at an HPC center), there is a good chance that these CRDs are already installed. Otherwise, a cluster admin can install them with the following commands:
<details>
<summary><strong>How to install Prometheus CRDs</strong></summary>

```
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
kubectl create namespace monitoring
helm install prometheus-operator prometheus-community/kube-prometheus-stack --namespace monitoring --set prometheusOperator.createCustomResource=false --set defaultRules.create=false --set alertmanager.enabled=false --set prometheus.enabled=false --set grafana.enabled=false
```
</details>
- [KEDA](https://keda.sh) CRDs (only if using autoscaling)

<details>
<summary><strong>How to install KEDA CRDs</strong></summary>

```
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
kubectl create namespace keda
helm install keda kedacore/keda --namespace keda
```
</details>
</details>

---

### Standard deployment

If you are installing SuperSONIC for the first time, proceed to the [Minimal deployment](#minimal-deployment) section below.

If you already have a functional `values.yaml` and/or have installed SuperSONIC previously, use the following installation commands:

```
helm repo add fastml https://fastmachinelearning.org/SuperSONIC
helm repo update
helm install <release-name> fastml/supersonic -n <namespace> -f <values.yaml>
```

To construct the `values.yaml` file for your application, follow the [Configuration guide](http://fastmachinelearning.org/SuperSONIC/configuration-guide.html "Configuration guide").

The full list of configuration parameters is available in the [Configuration reference](http://fastmachinelearning.org/SuperSONIC/configuration-reference.html "Configuration reference").

---

### Minimal deployment

<details>
<summary><strong>1. Install cvmfs-csi plugin to load models from CVMFS</strong></summary>

For an example installation, we will use CMS models loaded from [CVMFS](https://cvmfs.readthedocs.io/en/stable/). SuperSONIC also supports other types of model repositories, including
an arbitrary Persistent Volume, an NFS volume, or S3 storage.

The [cvmfs-csi](https://github.com/cvmfs-contrib/cvmfs-csi) plugin makes it easy to mount CVMFS
into a Kubernetes cluster by creating a new storage class. A Persistent Volume created with this
storage class will have CVMFS contents visible inside.

A cluster admin can use the following commands to install `cvmfs-csi`:
```
kubectl create namespace cvmfs-csi
helm install -n cvmfs-csi cvmfs-csi oci://registry.cern.ch/kubernetes/charts/cvmfs-csi --values cvmfs/values-cvmfs-csi.yaml
kubectl apply -f cvmfs/cvmfs-storageclass.yaml -n cvmfs-csi
```
</details>

<details>
<summary><strong>Install the latest released version from the Helm repository</strong></summary>
<summary><strong>2. Install SuperSONIC with minimal configuration</strong></summary>

The minimal deployment will install only a single CPU-based Triton server and an Envoy Proxy.
We will use [`values/values-minimal.yaml`](values/values-minimal.yaml) as our minimal
configuration file.

```
helm repo add fastml https://fastmachinelearning.org/SuperSONIC
helm repo update
helm install <release-name> fastml/supersonic -n <namespace> -f <your-values.yaml>
helm install <release-name> fastml/supersonic -n <namespace> -f values/values-minimal.yaml
```
</details>

<details>
<summary><strong>3. Deploy a test job to run inferences</strong></summary>

To test your SuperSONIC installation, we will create a small [NVIDIA Performance Analyzer](https://docs.nvidia.com/deeplearning/triton-inference-server/archives/triton-inference-server-2280/user-guide/docs/user_guide/perf_analyzer.html) job,
which will send a single inference request with random input data to the Envoy Proxy endpoint.

1. In `tests/perf-analyzer-job.yaml`, edit the following parameters to match your deployment:

```
metadata:
namespace: <namespace>
```

In the `perf_analyzer` command:

```
-u <release-name>.<namespace>.svc.cluster.local:8001
```

2. Submit the job to your Kubernetes cluster:

```
kubectl apply -n <namespace> -f tests/perf-analyzer-job.yaml
```

3. Track the job status and inspect its logs:

```
kubectl get pods -l job-name=perf-analyzer-job -n <namespace>
kubectl logs <pod-name> -n <namespace>
```

</details>

---

### Installing from a GitHub branch/tag/commit

<details>
<summary><strong>Install directly from a GitHub branch/tag/commit</strong></summary>
<summary><strong>This option may be useful for testing unreleased features.</strong></summary>

```
git clone https://github.com/fastmachinelearning/SuperSONIC.git
@@ -56,9 +177,6 @@ helm install <release-name> helm/supersonic -n <namespace> -f <your-values.yaml>

</details>

To construct the `values.yaml` file for your application, follow [Configuration guide](http://fastmachinelearning.org/SuperSONIC/configuration-guide.html "Configuration guide").

The full list of configuration parameters is available in the [Configuration reference](http://fastmachinelearning.org/SuperSONIC/configuration-reference.html "Configuration reference").

## Server diagram

@@ -76,6 +194,7 @@ The full list of configuration parameters is available in the [Configuration ref
| **[Purdue Anvil](https://www.rcac.purdue.edu/compute/anvil)** | ✅ | - | - |
| **[NRP Nautilus](https://docs.nationalresearchplatform.org)** | ✅ | ✅ | ✅ |
| **[UChicago](https://af.uchicago.edu/)** | - | ✅ | - |
| **[UW–Madison](https://www.hep.wisc.edu/cms/comp/)** | ⏳ | - | - |

## Publications

File renamed without changes.