@leddits commented Apr 18, 2025

Fix: Avoid CDI device injection error by reverting to legacy GPU device specification

Description: This change fixes a runtime error caused by unresolvable CDI device names (nvidia.com/gpu=all) at container launch on hosts where CDI support is not configured or enabled.

Reason for change: Some environments, especially Jetson-based platforms or hosts with older versions of the NVIDIA Container Toolkit, do not support the CDI device naming convention (nvidia.com/gpu=all). This leads to a fatal error during container startup:

OCI runtime create failed: failed to inject CDI devices: unresolvable CDI devices nvidia.com/gpu=all: unknown
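To make the failure easier to spot in launch logs, a wrapper script could check the container runtime's stderr for this message. The following is a minimal Python sketch that assumes the message wording quoted above; the function name `is_cdi_injection_error` is hypothetical and not part of this PR.

```python
# Sample message, copied from the error quoted in this PR description.
SAMPLE_ERROR = (
    "OCI runtime create failed: failed to inject CDI devices: "
    "unresolvable CDI devices nvidia.com/gpu=all: unknown"
)


def is_cdi_injection_error(stderr: str) -> bool:
    """Return True if stderr looks like the CDI injection failure above.

    Matches on two substrings of the quoted message; other runtime
    versions may word the error differently (assumption).
    """
    return (
        "failed to inject CDI devices" in stderr
        and "unresolvable CDI devices" in stderr
    )


if __name__ == "__main__":
    print(is_cdi_injection_error(SAMPLE_ERROR))
```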

Changes made:

Replaced NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all,nvidia.com/pva=all with the more broadly compatible NVIDIA_VISIBLE_DEVICES=all on ARM64 (Jetson).

This ensures compatibility with both CDI-enabled and legacy NVIDIA container runtimes.
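The replacement can be illustrated with a small Python sketch that builds the environment argument passed to `docker run`. This is not the code changed in this PR: the helper names and the `use_cdi` toggle are hypothetical and only contrast the old and new values. The `-e` flag is the standard `docker run` option for setting an environment variable.

```python
# Values taken from this PR description.
CDI_DEVICES = "nvidia.com/gpu=all,nvidia.com/pva=all"  # old value (CDI device names)
LEGACY_DEVICES = "all"                                  # new value (legacy form)


def nvidia_visible_devices(use_cdi: bool = False) -> str:
    """Pick the NVIDIA_VISIBLE_DEVICES value.

    `use_cdi` is a hypothetical toggle for illustration; the PR itself
    simply uses the legacy value on ARM64 (Jetson).
    """
    return CDI_DEVICES if use_cdi else LEGACY_DEVICES


def docker_env_args(use_cdi: bool = False) -> list:
    """Return the `docker run` arguments that set NVIDIA_VISIBLE_DEVICES."""
    return ["-e", f"NVIDIA_VISIBLE_DEVICES={nvidia_visible_devices(use_cdi)}"]


if __name__ == "__main__":
    print(" ".join(docker_env_args()))
```

Defaulting `use_cdi` to `False` mirrors the intent of the change: the legacy value is used unless CDI is known to be configured on the host.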

Impact: Improves container launch stability on Jetson hosts that lack a working CDI configuration by using the standard GPU device exposure mechanism instead of CDI device names.
