Merged
43 changes: 43 additions & 0 deletions docs/clearml_agent/clearml_agent_nvcr.md
@@ -0,0 +1,43 @@
---
title: NVCR Access
---

To allow ClearML Agents to access NVIDIA's container registry (`nvcr.io`), the machine’s Docker infrastructure must first be configured with valid NGC credentials.
This enables Agents to pull NVIDIA-provided containers, such as those used by the [NVIDIA NIM app](../webapp/applications/apps_nvidia_nim.md). The setup is
required once per worker node, not every time you run an app.

Configure Docker access to the `nvcr` repository on [bare-metal/VM](#on-bare-metal--vm) or [Kubernetes](#on-kubernetes).

## On Bare Metal / VM

Run the following command on the machine where the agent that executes the app instance will run (replace the password with a valid NGC API key):

```
docker login nvcr.io --username '$oauthtoken' --password 'nvapi-**'
```
The password is the NGC API key provided with your `nvcr` account.
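
Docker warns that passing a secret with `--password` on the command line is insecure, since it can end up in the shell history. As a minimal sketch, assuming the key has already been exported in an `NGC_API_KEY` environment variable (the `nvcr_login` helper name is hypothetical and not part of ClearML or Docker), the key can be piped through Docker's `--password-stdin` option instead:

```shell
# Hypothetical helper: log in to nvcr.io without placing the key on the command line.
# Assumes NGC_API_KEY holds a valid NGC API key.
nvcr_login() {
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "NGC_API_KEY is not set" >&2
    return 1
  fi
  # '$oauthtoken' is single-quoted so the shell passes it literally.
  printf '%s' "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
}
```

Calling `nvcr_login` once on each worker node should have the same effect as the command above.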

## On Kubernetes

To make `nvcr` available to agents running on Kubernetes:
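* Create an `nvcr-registry` secret in the namespace where the agent is running. Replace:
  * `<NAMESPACE>` with the namespace where your ClearML Agent is deployed
  * `<USERNAME>` with your NVIDIA registry username (when authenticating with an NGC API key, this is the literal string `$oauthtoken`, as in the `docker login` command above; single-quote it so the shell does not expand it)
  * `<PASSWORD>` with your valid NGC API key <br/><br/>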

```
kubectl create secret docker-registry nvcr-registry -n <NAMESPACE> \
  --docker-server=nvcr.io \
  --docker-username=<USERNAME> \
  --docker-password=<PASSWORD> \
  --docker-email=""
```

* Configure image pull secrets for the NVIDIA registry.
In your Agent Helm values override, add:

```
imageCredentials:
  extraImagePullSecrets:
    - name: nvcr-registry
```
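
Before relying on the secret in the Helm values, it may help to confirm that it exists in the agent's namespace. The following is a sketch, not part of the ClearML tooling: the `check_nvcr_secret` helper name is hypothetical, and it assumes `kubectl` is already configured for the target cluster. Secrets created with `kubectl create secret docker-registry` have the type `kubernetes.io/dockerconfigjson`, which is what the function tests for:

```shell
# Hypothetical helper: succeed only if the nvcr-registry secret exists
# in the given namespace and is a Docker registry secret.
check_nvcr_secret() {
  namespace="$1"
  # Read the secret's type; fail if kubectl cannot find the secret.
  secret_type=$(kubectl get secret nvcr-registry -n "$namespace" -o jsonpath='{.type}') || return 1
  [ "$secret_type" = "kubernetes.io/dockerconfigjson" ]
}
```

For example, `check_nvcr_secret clearml && echo "secret found"`, where `clearml` stands in for your own namespace.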
10 changes: 5 additions & 5 deletions docs/webapp/applications/apps_nvidia_nim.md
@@ -11,10 +11,9 @@ serves your model on a machine of your choice. Once an app instance is running,
publicly accessible network endpoint. The app monitors endpoint activity and shuts down if the model remains inactive
for a specified maximum idle time.

-Note that the `NGC_API_KEY` environment variable needs to be set to a valid NGC API key. You can set the variable in one
-of the following ways:
-* The NIM app deployment form’s `Environment Variables` field
-* [Configuration vault](../settings/webapp_settings_profile.md#configuration-vault)
+* The `NGC_API_KEY` environment variable needs to be set to a valid NGC API key. You can set the variable in one of the following ways:
+  * The NIM app deployment form’s `Environment Variables` field
+  * [Configuration vault](../settings/webapp_settings_profile.md#configuration-vault)

:::info AI Application Gateway
The NIM app makes use of the App Gateway Router which implements a secure, authenticated network endpoint for the model.
@@ -81,7 +80,8 @@ values from the file, which can be modified before launching the app instance
* **Application instance project**: The ClearML project where the app instance is created. Access is determined by
project-level permissions (i.e. users with read access can use the app).
* **NIM Container Image**: Select the containerized application image to use. Note the different tags / versions of each image
-* **Compute Resource (Queue)**: The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the NIM app instance task will be enqueued (make sure an agent is assigned to it)
+* **Compute Resource (Queue)**: The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which the NIM app instance task will be enqueued. Make sure an agent
+  is assigned to this queue and has access to NVIDIA's container registry (`nvcr.io`). See [NVCR Access](../../clearml_agent/clearml_agent_nvcr.md) for more information.
* **AI Gateway Route**: Select an available, admin-preconfigured route to use as the service endpoint. If none is selected, an ephemeral endpoint will be created.
* **Idle Time Limit** (Hours): Maximum idle time after which the app instance will shut down
* **Environment Variables**: Additional environment variables to set inside the container before launching the application
1 change: 1 addition & 0 deletions sidebars.js
@@ -668,6 +668,7 @@ module.exports = {
]
},
'clearml_agent/multi_node_training',
+'clearml_agent/clearml_agent_nvcr',
{
type: 'category',
collapsible: true,