contenox/runtime is an open-source runtime for orchestrating generative AI workflows. It treats AI workflows as state machines, enabling:
✅ Declarative workflow definition ✅ Built-in state management ✅ Vendor-agnostic execution ✅ Multi-backend orchestration ✅ First-class observability ✅ Written in Go for high-load workloads ✅ Agentic capabilities via hooks ✅ Drop-in replacement for the OpenAI chat-completions API
The following commands start all necessary services, configure the backend, and download the initial models.
- Docker and Docker Compose
- curl
- jq
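A quick sanity check before proceeding (a small convenience sketch; the tool names match the prerequisites above):

```shell
# Verify that the required tools are available on PATH.
for tool in docker curl jq; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: ok"
  else
    echo "$tool: MISSING"
  fi
done
# Docker Compose v2 ships as a docker subcommand.
docker compose version >/dev/null 2>&1 && echo "docker compose: ok" || echo "docker compose: MISSING"
```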
# Clone the repository
git clone https://github.com/contenox/runtime.git
cd runtime
# Configure the system's fallback models
export EMBED_MODEL=nomic-embed-text:latest
export EMBED_PROVIDER=ollama
export EMBED_MODEL_CONTEXT_LENGTH=2048
export TASK_MODEL=phi3:3.8b
export TASK_MODEL_CONTEXT_LENGTH=2048
export TASK_PROVIDER=ollama
export CHAT_MODEL=phi3:3.8b
export CHAT_MODEL_CONTEXT_LENGTH=2048
export CHAT_PROVIDER=ollama
export OLLAMA_BACKEND_URL="http://ollama:11434"
# or point to a host instance, e.g.: export OLLAMA_BACKEND_URL="http://host.docker.internal:11434"
# When using host.docker.internal, make sure Ollama listens on the Docker bridge:
# sudo systemctl edit ollama.service -> Environment="OLLAMA_HOST=172.17.0.1" (or 0.0.0.0)
# Start the container services
echo "Starting services with 'docker compose up -d'..."
docker compose up -d
echo "Services are starting up."
# Configure the runtime with your model preferences
# the bootstrapping script works only with Ollama models/backends
# to use other providers, refer to the API spec
./scripts/bootstrap.sh $EMBED_MODEL $TASK_MODEL $CHAT_MODEL
# setup a demo OpenAI chat-completion and model endpoint
./scripts/openai-demo.sh $CHAT_MODEL demo
# this will setup the following endpoints:
# - http://localhost:8081/openai/demo/v1/chat/completions
# - http://localhost:8081/openai/demo/v1/models
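With the demo endpoint in place, any OpenAI-compatible client can talk to it. A minimal curl sketch (assumes the stack above is running; the model name must match the CHAT_MODEL registered during bootstrap, and the request body follows the standard OpenAI chat-completions schema):

```shell
# Write a standard OpenAI-style chat-completions request body.
cat > /tmp/openai-req.json <<'EOF'
{
  "model": "phi3:3.8b",
  "messages": [
    {"role": "user", "content": "Say hello in one short sentence."}
  ]
}
EOF
# Send it to the demo endpoint created by openai-demo.sh.
curl -s -X POST http://localhost:8081/openai/demo/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @/tmp/openai-req.json || echo "request failed (is the stack running?)"
```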
Once the script finishes, the environment is fully configured and ready to use.
After the bootstrap is complete, test the setup by executing a simple prompt:
curl -X POST http://localhost:8081/execute \
-H "Content-Type: application/json" \
-d '{"prompt": "Explain quantum computing in simple terms"}'
Save the following as qa.json:
{
"input": "What's the best way to optimize database queries?",
"inputType": "string",
"chain": {
"id": "smart-query-assistant",
"description": "Handles technical questions",
"tasks": [
{
"id": "generate_response",
"description": "Generate final answer",
"handler": "raw_string",
"systemInstruction": "You're a senior engineer. Provide concise, professional answers to technical questions.",
"transition": {
"branches": [
{ "operator": "default", "goto": "end" }
]
}
}
]
}
}
Execute the workflow:
curl -X POST http://localhost:8081/tasks \
-H "Content-Type: application/json" \
-d @qa.json
All runtime activity is captured in structured logs:
docker logs contenox-runtime-kernel
- Conditional Branching: Route execution based on LLM outputs
- Built-in Handlers:
  - condition_key: Validate and route responses
  - parse_number: Extract numerical values
  - parse_range: Handle score ranges
  - raw_string: Standard text generation
  - embedding: Embedding generation
  - model_execution: Model execution on a chat history
  - hook: Calls a user-defined hook pointing to an external service
- Context Preservation: Automatic input/output passing between steps
- Multi-Model Support: Define preferred models for each task chain
- Retry and Timeout: Configure task-level retries and timeouts for robust workflows
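Conditional branching can be sketched by combining a condition_key task with non-default branches. The sketch below reuses the chain schema from qa.json; the equals operator and its when field are assumptions for illustration — check the API spec for the exact branch operators:

```json
{
  "input": "Is this message spam? Answer yes or no: 'You won a free prize!'",
  "inputType": "string",
  "chain": {
    "id": "spam-filter",
    "description": "Classify, then branch on the answer",
    "tasks": [
      {
        "id": "classify",
        "description": "Yes/no classification",
        "handler": "condition_key",
        "systemInstruction": "Answer strictly with yes or no.",
        "transition": {
          "branches": [
            { "operator": "equals", "when": "yes", "goto": "flag_spam" },
            { "operator": "default", "goto": "end" }
          ]
        }
      },
      {
        "id": "flag_spam",
        "description": "Produce a short explanation",
        "handler": "raw_string",
        "systemInstruction": "Explain in one sentence why the message looks like spam.",
        "transition": {
          "branches": [
            { "operator": "default", "goto": "end" }
          ]
        }
      }
    ]
  }
}
```

Such a chain is executed the same way as qa.json, via a POST to the /tasks endpoint.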
Define preferred model provider and backend resolution policy directly within task chains. This allows for seamless, dynamic orchestration across various LLM providers.
graph TD
subgraph "User Space"
U[User / Client Application]
end
subgraph "contenox/runtime"
API[API Layer]
OE["Orchestration Engine <br/> Task Execution <br/> & State Management"]
CONN["Connectors <br/> Model Resolver <br/> & Hook Client"]
end
subgraph "External Services"
LLM[LLM Backends <br/> Ollama, OpenAI, vLLM, etc.]
HOOK[External Tools and APIs <br/> Custom Hooks]
end
%% --- Data Flow ---
U -- API Requests --> API
API -- Triggers Task Chain --> OE
OE -- Executes via --> CONN
CONN -- Routes to LLMs --> LLM
CONN -- Calls External Hooks --> HOOK
LLM -- LLM Responses --> CONN
HOOK -- Hook Responses --> CONN
CONN -- Results --> OE
OE -- Returns Final Output --> API
API -- API Responses --> U
- Unified Interface: Consistent API across providers
- Automatic Sync: Models stay consistent across backends
- Affinity Group Management: Map models to backends for performance tiering and routing strategies
- Backend Resolver: Distribute requests to backends based on resolution policies
Hooks are external servers that can be called from within task chains once registered. They allow task chains to interact with systems and data outside the runtime. 🔗 See Hook Documentation
The full API surface is documented in OpenAPI format, making it easy to integrate with other tools. You can find more details here:
The API tests are available for additional context.