Open
Labels
bug (Something isn't working)
Description
Version
latest
Operating System
Linux
Python Version
3.12
What happened?
Running a local vLLM server, configuring synthetic-data-kit to use it via the config file, and setting the generation.batch_size key should cause requests to be sent to the local vLLM server in batches.
However, the vLLM server logs show that requests are sent sequentially, with only one request in flight at a time.
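For reference, the relevant config looks roughly like this. Only the generation.batch_size and curate.batch_size keys are named in this report; the surrounding layout and values are a sketch, not the full schema:

```yaml
# Sketch of the batching-related keys (layout assumed, keys from the report)
generation:
  batch_size: 32   # expected: up to 32 generation requests in flight
curate:
  batch_size: 32   # expected: up to 32 curation requests in flight
```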
Relevant log output
(APIServer pid=1) INFO 09-05 08:52:54 [loggers.py:123] Engine 000: Avg prompt throughput: 1.4 tokens/s, Avg generation throughput: 20.3 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.2%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 09-05 08:53:04 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.5 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.3%, Prefix cache hit rate: 0.0%
Steps to reproduce
- Run a local vLLM server
- Set the generation.batch_size or curate.batch_size key to 16 or 32
- Run synthetic-data-kit generation, curation, etc.
- Check the vLLM server logs for the number of requests in flight: it should be 16 or 32, but it is 1
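For context, the expected behavior of a batch_size setting is to dispatch up to batch_size requests concurrently instead of awaiting each one before sending the next. A minimal sketch of that difference, with a hypothetical stand-in for the HTTP call to the vLLM OpenAI-compatible endpoint (function names and timings are illustrative, not synthetic-data-kit's actual code):

```python
import asyncio
import time

async def fake_completion_request(prompt: str) -> str:
    # Stand-in for an HTTP request to the local vLLM server.
    await asyncio.sleep(0.1)
    return f"completion for {prompt!r}"

async def run_with_batch_size(prompts, batch_size):
    # Dispatch up to batch_size requests concurrently per chunk,
    # which is what a batch_size key is expected to control.
    results = []
    for i in range(0, len(prompts), batch_size):
        chunk = prompts[i:i + batch_size]
        results.extend(
            await asyncio.gather(*(fake_completion_request(p) for p in chunk))
        )
    return results

prompts = [f"prompt {i}" for i in range(16)]

start = time.perf_counter()
batched = asyncio.run(run_with_batch_size(prompts, batch_size=16))
batched_time = time.perf_counter() - start

start = time.perf_counter()
sequential = asyncio.run(run_with_batch_size(prompts, batch_size=1))
sequential_time = time.perf_counter() - start

print(f"batch_size=16: {batched_time:.2f}s for {len(batched)} prompts")
print(f"batch_size=1:  {sequential_time:.2f}s for {len(sequential)} prompts")
```

With batching, the vLLM log line would be expected to show "Running: 16 reqs" rather than "Running: 1 reqs" as in the output above.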