Implement OCI behavior #204

jgchn · 2025-05-21T17:59:59Z

Fixes #175 by allowing users to specify an OCI URI and creating a read-only image volume.

The ImageVolume feature gate is required for testing the OCI MSVC; instructions for enabling it are included in this PR.

Signed-off-by: Jing Chen <[email protected]>

mrunalp · 2025-05-21T23:11:34Z

@haircommander fyi

internal/controller/utils.go

danielezonca · 2025-05-22T12:27:36Z

There are different "flavors" of OCI so the simple oci:// scheme might be misleading:

OCI Artifact: the model is saved as files, but adoption is limited and it is hard to predict how fast (or if) it will be adopted. For example, the K8s VolumeSource feature seems to be limited to an opinionated form of OCI in which each layer must be compressed, while the OCI Artifact spec has no such constraint.
OCI Image "from scratch": the model is copied into an OCI image that has no base image; this is supported as a VolumeSource.
OCI KServe ModelCar: the model is copied into a base image that can be executed, and it is loaded as a sidecar container alongside the runtime. In this case a symlink makes the model available to the runtime. It requires no special support from K8s, so it works with every K8s version.

In KServe, OCI support was implemented with the "ModelCar" approach before the K8s OCI VolumeSource was developed, which means that oci:// in KServe implies a ModelCar (FYI @rhuss); that is not ideal.
There is a ticket proposing a change to make the "type" of OCI explicit:

Harmonizing OCI Image model support kserve/kserve#4083

You might want to consider the same idea in the design/implementation of OCI support in llm-d.

Another note: the model-spec project aims to standardize the layout of an OCI Artifact that stores AI models. As far as I can see, it should be fairly transparent to the runtime implementation, but the spec config contains a lot of useful information that can be used for validation and auto-configuration (FYI @tarilabs).

jgchn · 2025-05-22T14:33:59Z

Hi @danielezonca, thanks for the feedback and clarification! From the first two bullet points, it sounds like the user needs to understand how the OCI image is built with the model files. If the model-spec project from CNAI is the CNCF-recommended way of composing AI model artifacts, then we should probably align our oci:// scheme with their format. From this diagram, it seems like the /path/to/model substring in oci://<image-with-tag>::/path/to/model can be eliminated in the current PR's implementation, as the format seems to be what vLLM expects. Do you know if ImageVolume is supported for artifacts composed with the model-spec format? Their docs say "Once the model artifact is stored in an OCI registry, the container runtime (e.g., containerd, CRI-O) can pull it from the OCI registry and mount it as a read-only volume during the model serving process, if required," which makes me think that it is in fact supported with ImageVolume.

Regarding the third bullet point, I think K8s support for ImageVolume is exactly why we did not have to consider KServe's ModelCar approach. I think we'd want to study and compare KServe's ModelCar approach with the native K8s ImageVolume. Does the symlink implementation imply that loading the models will be faster?

cc @sriumcp
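For reference, the URI format discussed in this comment, oci://<image-with-tag>::/path/to/model, could be split roughly as in the minimal Go sketch below. The helper name parseOCIURI and the example registry are hypothetical, and treating a missing image or path as an error is an assumption; this is not the code from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// parseOCIURI splits a URI of the form
//   oci://<image-with-tag>::/path/to/model
// into the image reference and the model path inside the image.
// Hypothetical helper for illustration only; not the PR's implementation.
func parseOCIURI(uri string) (string, string, error) {
	const prefix = "oci://"
	if !strings.HasPrefix(uri, prefix) {
		return "", "", fmt.Errorf("expected %q prefix in %q", prefix, uri)
	}
	rest := strings.TrimPrefix(uri, prefix)

	// "::" separates the image reference from the path inside the image.
	image, modelPath, found := strings.Cut(rest, "::")
	if !found || image == "" || modelPath == "" {
		// Assumption: both parts are required in this sketch.
		return "", "", fmt.Errorf("expected <image>::<path> in %q", uri)
	}
	return image, modelPath, nil
}

func main() {
	// registry.example.com is a placeholder, not a real registry.
	image, modelPath, err := parseOCIURI("oci://registry.example.com/models/demo:v1::/models/demo")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("image:", image)
	fmt.Println("model path:", modelPath)
}
```

If the path component were dropped, as suggested above, the "::" handling would simply go away and the whole remainder would be the image reference.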

samples/test/README.md

samples/test/kind-config.yaml

internal/controller/utils.go

asm582

address review

Signed-off-by: Jing Chen <[email protected]>

asm582 · 2025-05-22T15:18:01Z

Thanks for all the pointers, let us evaluate them and come back

danielezonca · 2025-05-22T15:46:31Z

If the model-spec project from CNAI is the CNCF recommended way of composing AI model artifacts, then we should probably adhere our oci:// scheme to their format.

It is not yet a CNCF project; a vote on making it a CNCF sandbox project is ongoing, but it is already part of CNAI:

[Sandbox] ModelPack cncf/sandbox#358

Do you know if ImageVolume is supported for artifacts composed with model-spec format? Their docs say "Once the model artifact is stored in an OCI registry, the container runtime (e.g., containerd, CRI-O) can pull it from the OCI registry and mount it as a read-only volume during the model serving process, if required," which makes me think that it is in fact supported with ImageVolume.

I would let @tarilabs comment on this. In general I expect so: this format is more about having a common layout that makes the artifact easier to introspect, but it remains a valid OCI container.
The main limitation that I'm aware of from the runtime PoV (containerd/CRI-O) is that an OCI Image/Artifact is supported only if the layer is compressed (tar), while the OCI Artifact spec doesn't require files to be compressed in order to store them.
Said differently, not every valid OCI Artifact might be supported by the runtime.

Regarding the third bullet point, I think K8s support for ImageVolume is exactly why we eliminated having to consider Kserve's ModelCar approach. I think we'd want to study and compare Kserve's ModelCar approach vs. the native K8s ImageVolume. Does the symlink implementation imply that loading the models will be faster?

The image is pulled by K8s and made available through a symlink, so the model is never copied. I personally don't expect this to be faster than mounting the OCI image as a volume, but I would like @rhuss to comment, since he might have more insight.
The main benefit of KServe ModelCar is that it supports all K8s versions without the need to enable an experimental feature. In the long run I expect ImageVolume to be the preferred (or only) option, but I don't know how long it will take for OCI VolumeSource support to graduate.

tarilabs · 2025-05-22T15:57:22Z

The main limitation that I'm aware of from the runtime PoV (containerd/CRI-O) is the support of OCI Image/Artifact only if the layer is compressed (tar) while OCI Artifact spec doesn't require to compress files to store them.
Said differently not every valid OCI Artifact might be supported by the runtime.

☝️ that is indeed the gotcha: the spec at the K8s level is wide, but as far as I can see, for container runtimes the image volume is limited to artifacts/images that have tar/targz layers. Said differently, not raw blobs.
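To make the tar-versus-raw-blob distinction concrete, here is a minimal Go sketch that flags layers whose media type is not one of the tar-based layer types defined by the OCI image spec. The function name nonTarLayers and the sample layer list are hypothetical, Docker-specific media types are left out, and whether containerd or CRI-O perform their check in this way is an assumption; the sketch only illustrates the idea.

```go
package main

import "fmt"

// tarLayerMediaTypes lists the tar-based layer media types defined by the
// OCI image spec (plain tar, gzip-compressed tar, zstd-compressed tar).
var tarLayerMediaTypes = map[string]bool{
	"application/vnd.oci.image.layer.v1.tar":      true,
	"application/vnd.oci.image.layer.v1.tar+gzip": true,
	"application/vnd.oci.image.layer.v1.tar+zstd": true,
}

// nonTarLayers returns the media types in the given list that are not
// tar-based. Hypothetical helper: a non-empty result suggests the artifact
// carries raw blobs and may not be mountable as an image volume.
func nonTarLayers(layerMediaTypes []string) []string {
	var rejected []string
	for _, mt := range layerMediaTypes {
		if !tarLayerMediaTypes[mt] {
			rejected = append(rejected, mt)
		}
	}
	return rejected
}

func main() {
	// Sample manifest layers; the second entry stands in for a raw blob.
	layers := []string{
		"application/vnd.oci.image.layer.v1.tar+gzip",
		"application/octet-stream",
	}
	if bad := nonTarLayers(layers); len(bad) > 0 {
		fmt.Println("layers that are not tar archives:", bad)
	} else {
		fmt.Println("all layers are tar-based")
	}
}
```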

jgchn · 2025-05-22T18:59:32Z

Thanks for all your insights, @danielezonca and @tarilabs. I definitely don't want to lose them once this PR is merged, so I created this issue to track them; could you all post future discussions regarding OCI there instead?

sriumcp · 2025-05-23T00:03:13Z

OCI Image "from scratch": where the model is copied in a OCI Image that has no base image, this is supported as VolumeSource

@danielezonca would renaming the oci:// implementation in this PR to ocivol:// make more sense? I believe this is essentially the use-case targeted by this PR (@jgchn correct me if I got this wrong).

Re: ModelCar, AFAIK, it is already supported today in ModelService, since we enable sidecars along with ephemeral volume mounting (which I guess can be used for symlinking). Of course, this (and every other ModelService feature) needs to be documented (WIP).

danielezonca · 2025-05-26T12:50:40Z

OCI Image "from scratch": where the model is copied in a OCI Image that has no base image, this is supported as VolumeSource

@danielezonca would renaming the oci:// implementation in this PR to ocivol:// make more sense? I believe this is essentially the use-case targeted by this PR (@jgchn correct me if I got this wrong).

In general, my main comment is that, since there are multiple approaches to adopting OCI for AI models, the API should make the chosen approach clear to avoid misleading behavior. That is the mistake made in KServe, where OCI support in reality means (at least for now) the "ModelCar" format.
As far as I understand, the goal of this PR is to support the K8s VolumeSource approach, so I would try to make this explicit.

About the name of the protocol to use, personally I like the proposal described in the ticket I linked before

Harmonizing OCI Image model support kserve/kserve#4083

which uses + to specify the "type"; in this case it could be oci+volume (or oci+native, as suggested in the linked ticket) to represent the K8s volume behavior.

Re: ModelCar, AFAIK, it is already supported today in ModelService, since we enable sidecars along with ephemeral volume mounting (which I guess can be used for symlinking). Of course, this (and every other ModelService feature) needs to be documented (WIP).

Yes, if you can add a sidecar and configure the ModelCar as that sidecar, it might work. The only additional logic implemented in KServe is a prepull step, run as an initContainer, to avoid a race condition between the vLLM container and the ModelCar (i.e., the runtime starts while the image with the model is still being pulled).

rhuss · 2025-06-05T13:13:34Z

Sorry for being late to the game.

Regarding the third bullet point, I think K8s support for ImageVolume is exactly why we eliminated having to consider Kserve's ModelCar approach. I think we'd want to study and compare Kserve's ModelCar approach vs. the native K8s ImageVolume. Does the symlink implementation imply that loading the models will be faster?

The symlink approach is not faster; it just lets the runtime container find the container with the model more easily, i.e. under a fixed /model path. In both approaches the model data is served directly from the container runtime without the need for a resource-intensive copy from one directory to another.

I would really love to see kserve/kserve#4083 come alive, with this schema:

Then for the storageUri, the proposal would be to extend the oci:// schema with some mode added with + (like in webdav+ssl or github+ssh):

oci+modelcar://.... - Use modelcar to fetch the model

oci+native://... - Leverage native Kubernetes support for mounting the OCI image

oci+fetch://.... - Download the image programmatically and store it in an emptyDir volume locally

oci:// could then be an alias for any of those, as configured in the system configuration (such as in the KServe config). That way, the user can rely on the system default with oci:// but is also able to specialize with the more specific URL schemas.

(oci+native could also be oci+volume, which I probably prefer now)
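As a rough illustration of the proposed schema, the Go sketch below resolves a storage URI to one of the three modes listed above and falls back to a configurable default for plain oci://. The function name resolveOCIMode, the defaultMode parameter, and the example registry are hypothetical; this is a sketch of the proposal under those assumptions, not code from KServe or llm-d.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveOCIMode returns the mode (modelcar, native, or fetch) and the image
// reference for a storage URI such as oci+native://<ref>. A plain oci://
// URI resolves to defaultMode, standing in for the system-level default
// described in the proposal. Hypothetical helper for illustration only.
func resolveOCIMode(uri, defaultMode string) (string, string, error) {
	scheme, ref, found := strings.Cut(uri, "://")
	if !found {
		return "", "", fmt.Errorf("missing scheme in %q", uri)
	}

	// Split "oci+native" into the base scheme and the optional mode.
	base, mode, hasMode := strings.Cut(scheme, "+")
	if base != "oci" {
		return "", "", fmt.Errorf("unsupported scheme %q", scheme)
	}
	if !hasMode {
		mode = defaultMode
	}

	switch mode {
	case "modelcar", "native", "fetch":
		return mode, ref, nil
	default:
		return "", "", fmt.Errorf("unknown OCI mode %q", mode)
	}
}

func main() {
	// registry.example.com is a placeholder, not a real registry.
	uris := []string{
		"oci://registry.example.com/models/demo:v1",
		"oci+modelcar://registry.example.com/models/demo:v1",
		"oci+ftp://registry.example.com/models/demo:v1",
	}
	for _, uri := range uris {
		mode, ref, err := resolveOCIMode(uri, "native")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("mode=%s ref=%s\n", mode, ref)
	}
}
```

If oci+volume were adopted in place of oci+native, only the accepted mode names in the switch would change.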

jgchn added 3 commits May 21, 2025 13:57

Implement OCI behavior

3089a22

Signed-off-by: Jing Chen <[email protected]>

Add instructions for feature gate

d0a2a98

Signed-off-by: Jing Chen <[email protected]>

Enable feature gate in OCI test

f978e44

Signed-off-by: Jing Chen <[email protected]>

jgchn requested review from asm582, kalantar and sriumcp May 21, 2025 19:52

rphillips reviewed May 22, 2025

internal/controller/utils.go (outdated, resolved)

asm582 reviewed May 22, 2025

samples/test/README.md (outdated, resolved)

asm582 reviewed May 22, 2025

samples/test/kind-config.yaml (resolved)

asm582 reviewed May 22, 2025

internal/controller/utils.go (outdated, resolved)

asm582 suggested changes May 22, 2025

Address comments

d9efa8f

Signed-off-by: Jing Chen <[email protected]>

jgchn requested a review from asm582 May 22, 2025 15:14

Reverse err check order

add4914

Signed-off-by: Jing Chen <[email protected]>

jgchn mentioned this pull request May 22, 2025

Valid OCI behavior #211

Open

jgchn added 2 commits May 22, 2025 16:24

Merge remote-tracking branch 'upstream/main' into oci-new

2aa2b22

Merge conflicts

1950ea9

Signed-off-by: Jing Chen <[email protected]>

jgchn added 3 commits May 23, 2025 12:06

Merge remote-tracking branch 'upstream/main' into oci-new

ad97e17

Add docs for oci

69ddcc1

Signed-off-by: Jing Chen <[email protected]>

Update api references

012c474

Signed-off-by: Jing Chen <[email protected]>

Make oci url explicit

2478e22

Signed-off-by: Jing Chen <[email protected]>
