This repository was archived by the owner on Jul 24, 2025. It is now read-only.

Conversation

@jgchn
Collaborator

@jgchn jgchn commented May 21, 2025

Fixes #175 by allowing users to specify an OCI URI and creating a read-only image volume.

The ImageVolume feature gate is required for testing the OCI MSVC. Included instructions here.
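For readers unfamiliar with image volumes, here is a minimal sketch of the kind of Pod manifest this approach produces, built as a plain Python dict. The `image` volume source with `reference` and `pullPolicy` follows the Kubernetes image volume API; the function name, pod name, container image, and mount path are hypothetical placeholders, not taken from this PR. The cluster is assumed to have the ImageVolume feature gate enabled.

```python
import json


def pod_with_image_volume(image_ref: str, mount_path: str = "/model") -> dict:
    """Build a Pod manifest that mounts an OCI image as a read-only volume.

    All names here (pod, container, runtime image) are illustrative only.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "oci-model-demo"},  # hypothetical pod name
        "spec": {
            "containers": [
                {
                    "name": "server",
                    # placeholder serving-runtime image
                    "image": "registry.example.com/serving-runtime:latest",
                    "volumeMounts": [
                        {"name": "model", "mountPath": mount_path, "readOnly": True}
                    ],
                }
            ],
            "volumes": [
                {
                    "name": "model",
                    # image volume source: the kubelet pulls the image and
                    # the runtime mounts its contents into the container
                    "image": {"reference": image_ref, "pullPolicy": "IfNotPresent"},
                }
            ],
        },
    }


manifest = pod_with_image_volume("registry.example.com/models/demo:v1")
print(json.dumps(manifest, indent=2))
```

The JSON output can be piped to `kubectl apply -f -`, since kubectl accepts JSON as well as YAML.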

@jgchn jgchn requested review from asm582, kalantar and sriumcp May 21, 2025 19:52
@mrunalp

mrunalp commented May 21, 2025

@haircommander fyi

@danielezonca

There are different "flavors" of OCI, so the plain oci:// scheme might be misleading:

  • OCI Artifact: the model is stored as plain files, but this has limited adoption and it is hard to predict how fast (or whether) it will be adopted. For example, the K8s VolumeSource feature seems to be limited to an opinionated form of OCI where each layer must be compressed, while the OCI Artifact spec has no such constraint.
  • OCI Image "from scratch": the model is copied into an OCI image that has no base image; this is supported as a VolumeSource.
  • OCI KServe ModelCar: the model is copied into a base image that can be executed, and it is loaded as a sidecar container alongside the runtime. In this case a symlink makes the model available to the runtime. It doesn't require any special support from K8s, so it works with every K8s version.

In KServe, OCI support was implemented with the "ModelCar" approach before the K8s OCI VolumeSource was developed, which means that oci:// in KServe implies a ModelCar (FYI @rhuss), and that is not ideal.
There is a ticket proposing a change to make the "type" of OCI explicit.

You might want to consider the same idea in the design/implementation of OCI support in llm-d.

Another note: the model-spec project aims to standardize the layout of an OCI Artifact that stores AI models. As far as I can see it should be fairly transparent to the runtime implementation, but the spec config contains a lot of useful information that can be used for validation / auto-configuration (FYI @tarilabs).

@jgchn
Collaborator Author

jgchn commented May 22, 2025

Hi @danielezonca, thanks for the feedback and clarification! From the first two bullet points, it sounds like the user needs to understand how the OCI image is built with the model files. If the model-spec project from CNAI is the CNCF recommended way of composing AI model artifacts, then we should probably adhere our oci:// scheme to their format. From this diagram, it seems like the /path/to/model substring in oci://<image-with-tag>::/path/to/model can be eliminated in the current PR's implementation, as the format seems to be what vLLM expects. Do you know if ImageVolume is supported for artifacts composed with model-spec format? Their docs say "Once the model artifact is stored in an OCI registry, the container runtime (e.g., containerd, CRI-O) can pull it from the OCI registry and mount it as a read-only volume during the model serving process, if required," which makes me think that it is in fact supported with ImageVolume.
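To make the URI shape under discussion concrete, here is a minimal sketch of splitting oci://<image-with-tag>::/path/to/model into its image reference and optional in-image path. The names `OciModelUri` and `parse_oci_uri` are hypothetical; this is not the PR's actual parser, and treating the `::/path` part as optional is an assumption based on the suggestion above that it could be dropped.

```python
from typing import NamedTuple, Optional


class OciModelUri(NamedTuple):
    image: str            # image reference including tag or digest
    path: Optional[str]   # path to the model inside the image, if given


def parse_oci_uri(uri: str) -> OciModelUri:
    """Split 'oci://<image-with-tag>::/path/to/model' into its two parts."""
    prefix = "oci://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an oci:// URI: {uri!r}")
    rest = uri[len(prefix):]
    # '::' separates the image reference from the in-image path; a single
    # ':' (tag or registry port) never matches this separator.
    image, sep, path = rest.partition("::")
    if not image:
        raise ValueError(f"missing image reference in {uri!r}")
    if sep and not path:
        raise ValueError(f"empty model path in {uri!r}")
    return OciModelUri(image=image, path=path if sep else None)
```

For example, `parse_oci_uri("oci://quay.io/org/model:v1::/models/demo")` yields the image `quay.io/org/model:v1` and the path `/models/demo`, while a URI without `::` yields a path of `None`.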

Regarding the third bullet point, I think K8s support for ImageVolume is exactly why we eliminated having to consider Kserve's ModelCar approach. I think we'd want to study and compare Kserve's ModelCar approach vs. the native K8s ImageVolume. Does the symlink implementation imply that loading the models will be faster?

cc @sriumcp

Contributor

@asm582 asm582 left a comment

address review

Signed-off-by: Jing Chen <[email protected]>
@jgchn jgchn requested a review from asm582 May 22, 2025 15:14
@asm582
Contributor

asm582 commented May 22, 2025

There are different "flavors" of OCI, so the plain oci:// scheme might be misleading:

  • OCI Artifact: the model is stored as plain files, but this has limited adoption and it is hard to predict how fast (or whether) it will be adopted. For example, the K8s VolumeSource feature seems to be limited to an opinionated form of OCI where each layer must be compressed, while the OCI Artifact spec has no such constraint.
  • OCI Image "from scratch": the model is copied into an OCI image that has no base image; this is supported as a VolumeSource.
  • OCI KServe ModelCar: the model is copied into a base image that can be executed, and it is loaded as a sidecar container alongside the runtime. In this case a symlink makes the model available to the runtime. It doesn't require any special support from K8s, so it works with every K8s version.

In KServe, OCI support was implemented with the "ModelCar" approach before the K8s OCI VolumeSource was developed, which means that oci:// in KServe implies a ModelCar (FYI @rhuss), and that is not ideal. There is a ticket proposing a change to make the "type" of OCI explicit.

You might want to consider the same idea in the design/implementation of OCI support in llm-d.

Another note: the model-spec project aims to standardize the layout of an OCI Artifact that stores AI models. As far as I can see it should be fairly transparent to the runtime implementation, but the spec config contains a lot of useful information that can be used for validation / auto-configuration (FYI @tarilabs).

Thanks for all the pointers, let us evaluate them and come back

Signed-off-by: Jing Chen <[email protected]>
@danielezonca

If the model-spec project from CNAI is the CNCF recommended way of composing AI model artifacts, then we should probably adhere our oci:// scheme to their format.

It is not yet a CNCF project; there is an ongoing vote on it becoming a CNCF sandbox project, but it is already part of CNAI.

Do you know if ImageVolume is supported for artifacts composed with model-spec format? Their docs say "Once the model artifact is stored in an OCI registry, the container runtime (e.g., containerd, CRI-O) can pull it from the OCI registry and mount it as a read-only volume during the model serving process, if required," which makes me think that it is in fact supported with ImageVolume.

I would let @tarilabs comment on this. In general I expect so: this format is more about having a common layout that makes the artifact easier to introspect, but it remains a valid OCI container.
The main limitation that I'm aware of from the runtime PoV (containerd/CRI-O) is that an OCI Image/Artifact is supported only if the layer is compressed (tar), while the OCI Artifact spec doesn't require files to be compressed in order to store them.
Said differently, not every valid OCI Artifact might be supported by the runtime.

Regarding the third bullet point, I think K8s support for ImageVolume is exactly why we eliminated having to consider Kserve's ModelCar approach. I think we'd want to study and compare Kserve's ModelCar approach vs. the native K8s ImageVolume. Does the symlink implementation imply that loading the models will be faster?

The image is pulled by K8s and exposed through a symlink, so the model is never copied. I personally don't expect this to be faster than mounting the OCI image as a Volume, but I would like @rhuss to comment, given that he might have more insight.
The main benefit of KServe ModelCar is that it supports all K8s versions without the need to enable an experimental feature. In the long run I do expect ImageVolume to be the preferred/only option, but I don't know how long it will take for OCI VolumeSource support to graduate.

@tarilabs

The main limitation that I'm aware of from the runtime PoV (containerd/CRI-O) is that an OCI Image/Artifact is supported only if the layer is compressed (tar), while the OCI Artifact spec doesn't require files to be compressed in order to store them.
Said differently, not every valid OCI Artifact might be supported by the runtime.

☝️ That is indeed the gotcha: the spec at the K8s level is broad, but as far as I can see, for container runtimes the image volume is limited to artifacts/images that have tar/tar.gz layers. Said differently, not raw blobs.
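As an illustration of that gotcha, here is a rough sketch of a pre-flight check that flags manifest layers which are not tar-based. The media type strings are the standard OCI and Docker layer types. The function name `non_tar_layers` is hypothetical, and the assumption that a given runtime accepts exactly this set is mine, not something confirmed in this thread.

```python
# Standard tar-based layer media types (OCI image spec plus the Docker
# schema 2 gzip layer). Whether a particular containerd/CRI-O version
# accepts all of these for image volumes is an assumption.
TAR_LAYER_MEDIA_TYPES = frozenset(
    {
        "application/vnd.oci.image.layer.v1.tar",
        "application/vnd.oci.image.layer.v1.tar+gzip",
        "application/vnd.oci.image.layer.v1.tar+zstd",
        "application/vnd.docker.image.rootfs.diff.tar.gzip",
    }
)


def non_tar_layers(manifest: dict) -> list:
    """Return the media types of layers that are not tar archives."""
    return [
        layer.get("mediaType", "")
        for layer in manifest.get("layers", [])
        if layer.get("mediaType") not in TAR_LAYER_MEDIA_TYPES
    ]


# A hypothetical artifact manifest with one tar layer and one raw blob.
example_manifest = {
    "schemaVersion": 2,
    "layers": [
        {"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip"},
        {"mediaType": "application/octet-stream"},  # raw blob, not a tar
    ],
}
print(non_tar_layers(example_manifest))
```

An empty result suggests the artifact is at least shaped like something a runtime could mount; a non-empty result lists the raw-blob layers that would likely be rejected.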

@jgchn jgchn mentioned this pull request May 22, 2025
@jgchn
Collaborator Author

jgchn commented May 22, 2025

Thanks for all your insights, @danielezonca and @tarilabs. I definitely don't want to lose them once this PR is merged. I created this issue for tracking; could you all post future discussions regarding OCI there instead?

@sriumcp
Collaborator

sriumcp commented May 23, 2025

OCI Image "from scratch": the model is copied into an OCI image that has no base image; this is supported as a VolumeSource.

@danielezonca would renaming the oci:// implementation in this PR to ocivol:// make more sense? I believe this is essentially the use-case targeted by this PR (@jgchn correct me if I got this wrong).

Re: ModelCar, AFAIK, it is already supported today in ModelService, since we enable sidecars along with ephemeral volume mounting (which I guess can be used for symlinking). Of course, this (and every other ModelService feature) needs to be documented (WIP).

@danielezonca

OCI Image "from scratch": the model is copied into an OCI image that has no base image; this is supported as a VolumeSource.

@danielezonca would renaming the oci:// implementation in this PR to ocivol:// make more sense? I believe this is essentially the use-case targeted by this PR (@jgchn correct me if I got this wrong).

In general, my main comment is that, given there are multiple approaches to adopting OCI for AI models, the API should make the chosen approach explicit. That avoids repeating the mistake made in KServe, where OCI support in reality means (at least for now) the "ModelCar" format, which is misleading.
As far as I understand, the goal of this PR is to support the K8s VolumeSource idea, so I would try to make this explicit.

About the name of the protocol to use: personally I like the proposal described in the ticket I linked before, which uses + to specify the "type". In this case it could be oci+volume (or oci+native, as suggested in the linked ticket) to represent the K8s volume behavior.

Re: ModelCar, AFAIK, it is already supported today in ModelService, since we enable sidecars along with ephemeral volume mounting (which I guess can be used for symlinking). Of course, this (and every other ModelService feature) needs to be documented (WIP).

Yes, if you can add a sidecar and configure the ModelCar as that sidecar, it might work. The only additional logic implemented in KServe is a prepull step as an initContainer, to avoid a race condition between the vLLM container and the ModelCar (i.e., the runtime starts while the image with the model is still being pulled).

Signed-off-by: Jing Chen <[email protected]>
@rhuss

rhuss commented Jun 5, 2025

Sorry for being late to the game.

Regarding the third bullet point, I think K8s support for ImageVolume is exactly why we eliminated having to consider Kserve's ModelCar approach. I think we'd want to study and compare Kserve's ModelCar approach vs. the native K8s ImageVolume. Does the symlink implementation imply that loading the models will be faster?

The symlink approach is not faster; it just lets the runtime container find the container with the model more easily, i.e. under a fixed /model path. In both approaches the model data is served directly from the container runtime, without the need for a resource-intensive copy from one directory into another.


I would really love for kserve/kserve#4083 to come alive, with this scheme:

Then for the storageUri, the proposal would be to extend the oci:// schema with some mode added with + (like in webdav+ssl or github+ssh):

  • oci+modelcar://.... - Use modelcar to fetch the model
  • oci+native://... - Leverage native Kubernetes support for mounting the OCI image
  • oci+fetch://.... - Download the image programmatically and store it in an emptyDir volume locally

oci:// could then be an alias for any of those, as configured in the system configuration (such as in the KServe config). That way, the user can rely on the system default with oci:// but is also able to specialize with the more specific URL schemas.

(oci+native could also be oci+volume, which I probably prefer now)
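A minimal sketch of how the oci+<mode>:// idea could be resolved, with plain oci:// falling back to a system-configured default. The function name `resolve_oci_mode`, the `default_mode` parameter, and the error handling are hypothetical; only the three mode names come from the proposal above (oci+volume was floated as an alternative spelling of oci+native and is not handled here).

```python
OCI_MODES = ("modelcar", "native", "fetch")  # modes named in the proposal


def resolve_oci_mode(uri: str, default_mode: str = "modelcar") -> tuple:
    """Return (mode, image_reference) for an oci:// or oci+<mode>:// URI.

    Plain 'oci://' acts as an alias for `default_mode`, standing in for a
    system-level default such as one read from a config file.
    """
    scheme, sep, reference = uri.partition("://")
    if not sep or not reference:
        raise ValueError(f"malformed storage URI: {uri!r}")
    base, plus, mode = scheme.partition("+")
    if base != "oci":
        raise ValueError(f"unsupported scheme {scheme!r}")
    if not plus:
        mode = default_mode  # bare oci:// uses the configured default
    if mode not in OCI_MODES:
        raise ValueError(f"unknown OCI mode {mode!r}")
    return mode, reference
```

With this shape, `oci+native://registry.example.com/m:v1` always resolves to the native mode, while `oci://registry.example.com/m:v1` resolves to whatever the operator configured, so users can rely on the default and still override it explicitly.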

Development

Successfully merging this pull request may close these issues.

Implement behaviors for OCI URI

8 participants