ModelMesh Serving is the controller for managing ModelMesh, a general-purpose model serving management/routing layer.
To quickly get started with ModelMesh Serving, check out the Quick Start Guide.
For help, please open an issue in this repository.
ModelMesh Serving currently comprises components spread over a number of repositories. The supported versions for the latest release are documented here.
Issues across all components are tracked centrally in this repo.
- https://github.com/kserve/modelmesh-serving (this repo) - the model serving controller
- https://github.com/kserve/modelmesh - the ModelMesh containers used for orchestrating model placement and routing
- modelmesh-runtime-adapter - the containers that run in each model serving pod and act as an intermediary between ModelMesh and third-party model-server containers. Its build produces a single "multi-purpose" image that can be used as an adapter for each of the out-of-the-box supported model servers. It also incorporates the "puller" logic, which retrieves models from storage before handing them over to the respective adapter logic to load (and deletes them after unloading). This image is also used for a container in the load/unload path of custom ServingRuntime Pods, as a "standalone" puller.
ModelMesh Serving provides out-of-the-box integration with the following model servers.
- triton-inference-server - NVIDIA's Triton Inference Server
- seldon-mlserver - Seldon's Python MLServer
- openVINO-model-server - OpenVINO Model Server
- torchserve - TorchServe
ServingRuntime custom resources can be used to add support for other existing or custom-built model servers; see the docs on implementing a custom Serving Runtime.
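A minimal sketch of such a resource follows, printed from a shell function so it can be piped into `kubectl apply`. The `serving.kserve.io/v1alpha1` API version and the `supportedModelFormats`, `multiModel`, `grpcEndpoint`, and `grpcDataEndpoint` fields are assumed from the ServingRuntime CRD; the runtime name, model format, image, and port numbers are hypothetical placeholders, and the custom Serving Runtime docs remain the authoritative reference.

```shell
#!/usr/bin/env bash
# Sketch: print a minimal ServingRuntime manifest for a custom model server.
# Assumption: the runtime name, model format, image, and ports are hypothetical
# placeholders; check the custom Serving Runtime docs for your version's fields.
print_custom_runtime() {
  local name="$1" image="$2"
  cat <<EOF
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: ${name}
spec:
  supportedModelFormats:
    - name: my-model-format
      version: "1"
      autoSelect: true
  multiModel: true
  # Endpoint of the model-management gRPC API that the runtime implements
  grpcEndpoint: "port:8085"
  # Endpoint that inference requests are forwarded to
  grpcDataEndpoint: "port:8090"
  containers:
    - name: model-server
      image: ${image}
EOF
}

# Example (requires a cluster with ModelMesh Serving installed):
# print_custom_runtime my-runtime example.com/my-model-server:latest | kubectl apply -f -
```

Generating the manifest from a function keeps the placeholders in one place; a plain YAML file applied with `kubectl apply -f` works just as well.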
- KServe V2 REST Proxy - a reverse-proxy server that translates a RESTful HTTP API into gRPC. This allows inference requests that use the KServe V2 REST Predict Protocol to be sent to ModelMesh models, which currently support only the V2 gRPC Predict Protocol.
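As a rough illustration of the V2 REST Predict Protocol, the following sketch builds a request for the REST proxy and sends it with `curl`. It assumes the proxy is enabled and reachable at `localhost:8008` (for example through `kubectl port-forward` to the `modelmesh-serving` Service); the model name, input name, shape, and data are hypothetical placeholders, and the helper functions are illustrative, not part of ModelMesh Serving.

```shell
#!/usr/bin/env bash
# Sketch: build and send a KServe V2 REST Predict request via the REST proxy.
# Assumptions: the proxy listens on localhost:8008, and all model/input names,
# shapes, and data values used below are hypothetical placeholders.

# Print the V2 REST inference URL for a model: /v2/models/<name>/infer
v2_infer_url() {
  local host="$1" model="$2"
  printf 'http://%s/v2/models/%s/infer\n' "$host" "$model"
}

# Print a V2 request body containing a single input tensor.
# shape and data are comma-separated lists, e.g. "1,4" and "0.1,0.2,0.3,0.4".
v2_infer_body() {
  local name="$1" datatype="$2" shape="$3" data="$4"
  printf '{"inputs":[{"name":"%s","shape":[%s],"datatype":"%s","data":[%s]}]}\n' \
    "$name" "$shape" "$datatype" "$data"
}

# POST the JSON request; the proxy translates it into a V2 gRPC call.
v2_infer() {
  local host="$1" model="$2"
  shift 2
  curl -s -X POST "$(v2_infer_url "$host" "$model")" \
    -H 'Content-Type: application/json' \
    -d "$(v2_infer_body "$@")"
}

# Example (requires a running cluster and a deployed model):
# v2_infer localhost:8008 example-model predict FP32 "1,4" "0.1,0.2,0.3,0.4"
```

Splitting URL and body construction into separate functions makes it easy to inspect the exact request before sending it.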
These are helper Java libraries used by the ModelMesh component.
- kv-utils - Useful KV store recipes abstracted over etcd and ZooKeeper
- litelinks-core - RPC/service discovery library based on Apache Thrift, used only for communications internal to ModelMesh.
Please read our contributing guide for details on contributing.
To build the images:

```shell
# Build the develop image
make build.develop

# After building the develop image, build the runtime image
make build
```