
[Bug]: requests are not truly concurrent with LLMClient._vllm_batch_completion #67

@HarshVaragiya

Description

Version

latest

Operating System

Linux

Python Version

3.12

What happened?

With a local vLLM server running, synthetic-data-kit configured to use it via the config file, and the generation.batch_size key set, requests should be sent to the local vLLM server in concurrent batches.

But the vLLM server logs show that requests are sent sequentially.
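For context, here is a hypothetical sketch of the difference between what the logs show and what a batch size implies. This is not the synthetic-data-kit code; the URL, model name, and helper names are placeholders, and it targets vLLM's OpenAI-compatible /v1/chat/completions endpoint. A sequential loop keeps exactly one request running on the server, while a pool of batch_size workers keeps the whole batch in flight:

```python
# Hypothetical illustration, not the project's _vllm_batch_completion.
from concurrent.futures import ThreadPoolExecutor

import requests

API_URL = "http://localhost:8000/v1/chat/completions"  # assumed local vLLM server
MODEL = "meta-llama/Llama-3.1-8B-Instruct"             # placeholder model name


def complete(prompt: str) -> str:
    """Send one chat completion request to the vLLM server."""
    resp = requests.post(
        API_URL,
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


def sequential_batch(prompts):
    # One request at a time: the vLLM logs show "Running: 1 reqs" throughout,
    # which matches the log output below.
    return [complete(p) for p in prompts]


def concurrent_batch(prompts, batch_size=16):
    # batch_size requests in flight at once: the vLLM logs should show
    # "Running: 16 reqs" (or close to it) while the batch is processed.
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        return list(pool.map(complete, prompts))
```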

Relevant log output

(APIServer pid=1) INFO 09-05 08:52:54 [loggers.py:123] Engine 000: Avg prompt throughput: 1.4 tokens/s, Avg generation throughput: 20.3 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.2%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 09-05 08:53:04 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 16.5 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.3%, Prefix cache hit rate: 0.0%

Steps to reproduce

  1. Run a local vLLM server.
  2. Set the generation.batch_size or curate.batch_size key to 16 or 32.
  3. Run synthetic-data-kit generation, curation, etc.
  4. Check the vLLM server logs for the number of requests in flight: it should be 16 or 32, but it stays at 1 (see the sketch after this list for a quick way to check).
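For step 4, a small hypothetical helper can scan a saved vLLM server log for the `Running: N reqs` counter (the format shown in the log output above) and report the peak. With batch_size set to 16 the peak should approach 16; with the current behaviour it stays at 1:

```python
# Hypothetical helper: report the peak number of in-flight requests
# recorded in a saved vLLM server log.
import re
import sys

RUNNING_RE = re.compile(r"Running: (\d+) reqs")


def max_in_flight(log_path: str) -> int:
    peak = 0
    with open(log_path) as fh:
        for line in fh:
            match = RUNNING_RE.search(line)
            if match:
                peak = max(peak, int(match.group(1)))
    return peak


if __name__ == "__main__":
    print(f"peak in-flight requests: {max_in_flight(sys.argv[1])}")
```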
