diff --git a/docs/clearml_agent/clearml_agent_nvcr.md b/docs/clearml_agent/clearml_agent_nvcr.md
new file mode 100644
index 00000000..3bdafbdb
--- /dev/null
+++ b/docs/clearml_agent/clearml_agent_nvcr.md
@@ -0,0 +1,43 @@
+---
+title: NVCR Access
+---
+
+To allow ClearML Agents to access NVIDIA's container registry (`nvcr.io`), the machine’s Docker infrastructure must first be configured with valid NGC credentials.
+This enables Agents to pull NVIDIA-provided containers, such as those used by the [NVIDIA NIM app](../webapp/applications/apps_nvidia_nim.md). The setup is
+required once per worker node, not every time you run an app.
+
+Configure Docker access to the `nvcr` repository on [bare-metal/VM](#on-bare-metal--vm) or [Kubernetes](#on-kubernetes).
+
+## On Bare Metal / VM
+
+Execute the following command where the agent that will execute the app instance will be running (replace the password with a valid NGC API key):
+
+```
+docker login nvcr.io --username '$oauthtoken' --password 'nvapi-**'
+```
+Password is provided with your `nvcr` account.
+
+## On Kubernetes
+
+To make `nvcr` available to agents running on Kubernetes:
+* Create an `nvcr-registry` secret in the same namespace where the agent is running. Replace:
+  * `` with the namespace where your ClearML Agent is deployed
+  * `` with your NVIDIA registry username
+  * `` with your valid NGC API key
+
+  ```
+  kubectl create secret docker-registry nvcr-registry -n \
+    --docker-server=nvcr.io \
+    --docker-username= \
+    --docker-password= \
+    --docker-email=""
+  ```
+
+* Configure image pull secrets for the NVIDIA registry.
+  In your Agent Helm values override, add:
+
+  ```
+  imageCredentials:
+    extraImagePullSecrets:
+      - name: nvcr-registry
+  ```
diff --git a/docs/webapp/applications/apps_nvidia_nim.md b/docs/webapp/applications/apps_nvidia_nim.md
index 8fb0e928..796f198d 100644
--- a/docs/webapp/applications/apps_nvidia_nim.md
+++ b/docs/webapp/applications/apps_nvidia_nim.md
@@ -11,10 +11,9 @@ serves your model on a machine of your choice. Once an app instance is running,
 publicly accessible network endpoint. The app monitors endpoint activity and shuts down if the model remains inactive
 for a specified maximum idle time.
 
-Note that the `NGC_API_KEY` environment variable needs to be set to a valid NGC API key. You can set the variable in one
-of the following ways:
-* The NIM app deployment form’s `Environment Variables` field
-* [Configuration vault](../settings/webapp_settings_profile.md#configuration-vault)
+* The `NGC_API_KEY` environment variable needs to be set to a valid NGC API key. You can set the variable in one of the following ways:
+  * The NIM app deployment form’s `Environment Variables` field
+  * [Configuration vault](../settings/webapp_settings_profile.md#configuration-vault)
 
 :::info AI Application Gateway
 The NIM app makes use of the App Gateway Router which implements a secure, authenticated network endpoint for the model.
@@ -81,7 +80,8 @@ values from the file, which can be modified before launching the app instance
 * **Application instance project**: The ClearML project where the app instance is created. Access is determined by project-level permissions (i.e. users with read access can use the app).
 * **NIM Container Image**: Select the containerized application image to use.
   Note the different tags / versions of each image
-* **Compute Resource (Queue)**: The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the NIM app instance task will be enqueued (make sure an agent is assigned to it)
+* **Compute Resource (Queue)**: The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the NIM app instance task will be enqueued. Make sure an agent
+  is assigned to this queue and has access to NVIDIA's container registry (`nvcr.io`). See [NVCR Access](../../clearml_agent/clearml_agent_nvcr.md) for more information.
 * **AI Gateway Route**: Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created.
 * **Idle Time Limit** (Hours): Maximum idle time after which the app instance will shut down
 * **Environment Variables**: Additional environment variable to set inside the container before launching the application
diff --git a/sidebars.js b/sidebars.js
index a26f9f67..723430fd 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -668,6 +668,7 @@ module.exports = {
                 ]
             },
             'clearml_agent/multi_node_training',
+            'clearml_agent/clearml_agent_nvcr',
             {
                 type: 'category',
                 collapsible: true,