Support for vLLM sleep and wakeup endpoints in simulation mode #218

@aavarghese

What would you like to be added:
Support for the vLLM sleep and wake-up endpoints (https://docs.vllm.ai/en/latest/features/sleep_mode.html#usage) in simulation mode. In vLLM, these endpoints are stable today and are enabled in development mode by setting VLLM_SERVER_DEV_MODE=1.
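
For reference, here is a minimal sketch of the client calls the simulator would need to accept. It assumes the endpoint paths described in the linked vLLM sleep mode docs (`POST /sleep?level=N`, `POST /wake_up`, `GET /is_sleeping`) and a JSON body of the form `{"is_sleeping": <bool>}`. The base URL and the helper names are hypothetical, chosen only for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a vLLM server (or the simulator) is listening here.
BASE_URL = "http://localhost:8000"


def sleep_request(base_url: str, level: int = 1) -> urllib.request.Request:
    """Build a POST /sleep request with the sleep level as a query parameter."""
    query = urllib.parse.urlencode({"level": level})
    return urllib.request.Request(f"{base_url}/sleep?{query}", method="POST")


def wake_up_request(base_url: str) -> urllib.request.Request:
    """Build a POST /wake_up request."""
    return urllib.request.Request(f"{base_url}/wake_up", method="POST")


def is_sleeping(base_url: str, timeout: float = 10.0) -> bool:
    """Query GET /is_sleeping.

    Assumption: the response is JSON shaped like {"is_sleeping": true}.
    """
    with urllib.request.urlopen(f"{base_url}/is_sleeping", timeout=timeout) as resp:
        return bool(json.load(resp)["is_sleeping"])


if __name__ == "__main__":
    # Put the server to sleep, check its state, then wake it up again.
    urllib.request.urlopen(sleep_request(BASE_URL, level=1), timeout=10.0).close()
    print("sleeping:", is_sleeping(BASE_URL))
    urllib.request.urlopen(wake_up_request(BASE_URL), timeout=10.0).close()
    print("sleeping:", is_sleeping(BASE_URL))
```

With a real vLLM server, this would presumably require starting it with sleep mode enabled and VLLM_SERVER_DEV_MODE=1. In simulation mode the same three routes could simply toggle an in-memory flag.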

Why is this needed:
We run inference servers as part of our solution and want to put them to sleep when they are inactive and have no incoming requests. Waking up a sleeping vLLM instance is much quicker than starting a new one, which avoids a cold start.
