Implement OCI behavior #204
base: main
Conversation
Signed-off-by: Jing Chen <[email protected]>
@haircommander fyi
There are different "flavors" of
In KServe, the support was implemented before the K8s OCI VolumeSource development, using the "ModelCar" approach, but that means that You might want to consider the same idea in the design/implementation of OCI support in llm-d. Another note: there is the model-spec project to standardize the layout of an OCI artifact that stores AI models. It should be quite transparent to the runtime implementation as far as I can see, but there is a lot of useful information in the spec config that can be used for validation / auto-configuration (FYI @tarilabs)
Hi @danielezonca, thanks for the feedback and clarification! It sounds like, from the first two bullet points, the user needs to understand how the OCI image is built with the model files. If the model-spec project from CNAI is the CNCF-recommended way of composing AI model artifacts, then we should probably adhere to it. Regarding the third bullet point, I think K8s support for cc @sriumcp
Thanks for all the pointers; let us evaluate them and come back.
It is not yet a CNCF project and there is an ongoing vote for it to become a CNCF sandbox project, but it is already part of CNAI.
I would let @tarilabs comment on this; in general I expect so. This format is more about having a common layout that can be used to introspect the artifact more easily, but it remains a valid OCI container.
The image is pulled by K8s and mounted as a symlink, so the model is never copied. I personally don't expect this to be faster than mounting the OCI artifact as a volume, but I would like @rhuss to comment, given that he might have more insights.
☝️ that is indeed the gotcha: the spec at the K8s level is wide, but as far as I can see, for container runtimes the imageVolume is limited to artifacts/images having tar/tar.gz layers. Said differently, not raw blobs.
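For reference, a minimal sketch of what mounting an OCI artifact via the K8s `image` volume source looks like (the pod name, runtime image, and model reference below are placeholders, not from this PR):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-pod                      # hypothetical name
spec:
  containers:
    - name: runtime
      image: example.com/vllm:latest   # placeholder runtime image
      volumeMounts:
        - name: model
          mountPath: /models
          readOnly: true
  volumes:
    - name: model
      image:                           # OCI VolumeSource; needs the ImageVolume feature gate
        reference: example.com/models/llama:1.0   # placeholder model artifact
        pullPolicy: IfNotPresent
```

Note the tar/tar.gz layer limitation above: the runtime's container runtime must be able to unpack the artifact's layers for this mount to work.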
Thanks for all your insights @danielezonca and @tarilabs. I definitely don't want to lose them once this PR is merged. I created this issue to track them; could you all post future discussions regarding OCI there instead?
@danielezonca would renaming the Re: ModelCar, AFAIK, it is already supported today in
In general my main comment is, given that there are multiple approaches to adopting OCI for AI models, to make this clear in the API and avoid misleading behavior, repeating the same mistake made in KServe, where "OCI support" in reality means (at least for now) the "ModelCar" format. About the name of the protocol to use, personally I like the proposal described in the ticket I linked before and would use
Yes, if you can add a sidecar and configure the ModelCar as a sidecar it might work. The only additional logic that has been implemented in KServe is a prepull step as an initContainer, to avoid a race condition between the vLLM container and the ModelCar (i.e. the runtime starts but the image with the model is still being pulled).
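For context, a hedged sketch of that prepull idea (the image names and no-op command are placeholders, not KServe's actual implementation): an initContainer that references the model image forces the kubelet to pull it onto the node before the runtime and sidecar start.

```yaml
spec:
  initContainers:
    - name: modelcar-prepull          # forces the model image onto the node first
      image: example.com/models/llama-modelcar:1.0   # placeholder ModelCar image
      command: ["sh", "-c", "true"]   # no-op; the image pull itself is the point
  containers:
    - name: vllm
      image: example.com/vllm:latest  # placeholder runtime image
    - name: modelcar                  # sidecar exposing the model files
      image: example.com/models/llama-modelcar:1.0
```

Because the initContainer must complete before the main containers start, the ModelCar image is guaranteed to be cached locally when the runtime comes up.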
Sorry for being late to the game.
The symlink approach is not faster; it just allows the runtime container to find the container with the model more easily, i.e. under a fixed I really would love for kserve/kserve#4083 to come alive, with this schema:
(
Fixes #175 by allowing users to specify an OCI URI and creating a read-only image volume.
The `ImageVolume` feature gate is required for testing the OCI MSVC. Instructions are included here.
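For anyone testing locally, one way to enable the gate is via a kind cluster config (a sketch; the single-node layout is just an example):

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  ImageVolume: true
nodes:
  - role: control-plane
```

Create the cluster with `kind create cluster --config kind-config.yaml`; the feature gate is then set on both the API server and the kubelets.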