[fix] batch size in openai compatible endpoint #835
base: main
Conversation
Walkthrough

Implements batched, concurrent request processing with per-batch retries, caching, and metrics in the OpenAI-compatible chat/simple models. Adds new initialization parameters and helpers in the simple model. Updates protocol video handling with an optional dependency import and a guarded error when the dependency is unavailable.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Caller
    participant Model as OpenAICompatible.generate_until
    participant Batcher
    participant Exec as ThreadPoolExecutor
    participant API as OpenAI API
    participant Cache
    Caller->>Model: generate_until(requests)
    Model->>Batcher: group into batches
    loop per batch
        Model->>Cache: check resume cache
        alt cache hit
            Cache-->>Model: cached responses
        else cache miss
            Model->>Exec: submit per-payload tasks
            par workers
                Exec->>API: send payload
                API-->>Exec: response/usage
            end
            Exec-->>Model: collected results
            Model->>Cache: persist results (if continual)
        end
        Model->>Model: accumulate tokens/latency
    end
    Model-->>Caller: all responses + metrics
```
```mermaid
sequenceDiagram
    autonumber
    participant ChatMsg as ChatMessages
    participant Proto as protocol.py
    participant Utils as qwen_vl_utils.fetch_video
    ChatMsg->>Proto: to_openai_messages(video content)
    alt fetch_video available
        Proto->>Utils: load frames
        Utils-->>Proto: frames (PNG data URLs)
        Proto-->>ChatMsg: OpenAI-formatted messages
    else missing dependency
        Proto-->>ChatMsg: raise ImportError
    end
```
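For orientation, here is a minimal sketch of the per-batch flow the first diagram describes: resolve cache hits, fan the remaining payloads out to a thread pool, then write results back. The `doc_uuid`/`payload` attributes and the `process_single_request`/`response_cache` names mirror the files under review, but the wiring here is illustrative rather than the PR's exact code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_batch(model, batch_requests: List, max_workers: int = 8) -> List:
    """Illustrative per-batch flow: cache check, concurrent fan-out, write-back."""
    responses: List = [None] * len(batch_requests)

    # Resolve cache hits first, keeping slots aligned with batch positions.
    pending = []
    for i, req in enumerate(batch_requests):
        cached = model.response_cache.get(req.doc_uuid)  # doc_uuid is assumed here
        if cached:
            responses[i] = cached
        else:
            pending.append((i, req.payload))  # payload is assumed here

    # Fan the remaining payloads out to worker threads.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(model.process_single_request, p): i for i, p in pending}
        for future, i in futures.items():
            responses[i] = future.result()

    # Persist fresh results so a resumed run can skip them next time.
    for i, req in enumerate(batch_requests):
        model.response_cache[req.doc_uuid] = responses[i]
    return responses
```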
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lmms_eval/models/simple/openai_compatible.py (1)
164-186: Guard video encoding when decord is unavailable

If decord isn't installed, `VideoReader` is undefined and will crash with NameError. Raise a clear ImportError.

```diff
     def encode_video(self, video_path, for_get_frames_num):
+        if "VideoReader" not in globals():
+            raise ImportError("decord is required for video processing. Install with: pip install decord")
         vr = VideoReader(video_path, ctx=cpu(0))
```
🧹 Nitpick comments (12)
lmms_eval/protocol.py (2)
100-100: Shorten exception message to satisfy Ruff TRY003 and the 88-char limit

Keep the message concise; installation guidance belongs in docs/logs.

Apply this diff:

```diff
-            raise ImportError("qwen_vl_utils is required for video processing. Please install it with: pip install qwen-vl-utils")
+            raise ImportError(
+                "qwen_vl_utils is required for video processing; install qwen-vl-utils"
+            )
```
10-14: Guard decord import in lmms_eval/protocol.py and mark fetch_video as Optional
- decord is imported at lmms_eval/protocol.py:7 — make it optional (try/except) or lazy-import inside the functions that use VideoReader to avoid breaking installs without video deps.
- Annotate fetch_video as Optional (e.g., from typing import Optional, Callable; fetch_video: Optional[Callable[..., Any]] = None) so Pyright knows it may be None.
- Repo scan shows many other files import decord at module import time — consider applying the same guard across video-related modules.
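A sketch of the guarded-import shape these bullets describe; the `_fetch_video` alias and the fallback-to-None handling are illustrative choices:

```python
from typing import Any, Callable, Optional

# Optional video dependencies: leave the names as None when the packages are
# absent so importing this module never fails on a text-only install.
try:
    from qwen_vl_utils import fetch_video as _fetch_video
except ImportError:
    _fetch_video = None

fetch_video: Optional[Callable[..., Any]] = _fetch_video

try:
    from decord import VideoReader, cpu
except ImportError:
    VideoReader = None  # checked before any video decoding is attempted
    cpu = None
```

Call sites then gate on `fetch_video is None` / `VideoReader is None` and raise a short ImportError, as the comments above suggest.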
lmms_eval/models/simple/openai_compatible.py (6)
36-38: Use Optional typing for nullable params and import Optional

Type hints should not use implicit Optional. Please annotate `base_url` and `api_key` as `Optional[str]` and import `Optional`.

```diff
-from typing import List, Tuple, Union
+from typing import List, Tuple, Union, Optional
 ...
-        base_url: str = None,
-        api_key: str = None,
+        base_url: Optional[str] = None,
+        api_key: Optional[str] = None,
```

Also applies to: 7-7
121-124: Annotate property return type

Return type hints are required by the guidelines.

```diff
-    def batch_size(self):
+    def batch_size(self) -> int:
         return self.batch_size_per_gpu
```
131-138: Annotate property return types

Add explicit return types for properties per guidelines.

```diff
-    def eot_token_id(self):
+    def eot_token_id(self) -> int:
         return 0
 ...
-    def rank(self):
+    def rank(self) -> int:
         return self._rank
```
229-246: File type checks: prefer robust suffix-based detection

Use `Path(visual).suffix.lower()` over substring checks to avoid false matches and handle uppercase extensions.

```diff
+from pathlib import Path
 ...
-            if isinstance(visual, str) and (".mp4" in visual or ".avi" in visual or ".mov" in visual or ".flv" in visual or ".wmv" in visual):
+            if isinstance(visual, str) and Path(visual).suffix.lower() in {".mp4", ".avi", ".mov", ".flv", ".wmv"}:
@@
-            elif isinstance(visual, str) and (".jpg" in visual or ".jpeg" in visual or ".png" in visual or ".gif" in visual or ".bmp" in visual or ".tiff" in visual or ".webp" in visual):
+            elif isinstance(visual, str) and Path(visual).suffix.lower() in {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".webp"}:
```
289-299: Avoid blind `except Exception`; consider narrower exceptions or log the type

Catching all exceptions hides actionable errors. At minimum, log the exception type; preferably catch OpenAI-specific exceptions.

```diff
-        except Exception as e:
-            error_msg = str(e)
+        except Exception as e:
+            error_msg = f"{e.__class__.__name__}: {e}"
```
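As an illustration of narrower handling, a sketch using the OpenAI SDK's exception classes (class names assume openai>=1.x; the logger and retry policy are placeholders):

```python
import logging

import openai

logger = logging.getLogger(__name__)


def call_with_narrow_handling(client: openai.OpenAI, payload: dict):
    """Sketch: surface the exception class instead of swallowing everything."""
    try:
        return client.chat.completions.create(**payload)
    except (openai.APITimeoutError, openai.RateLimitError) as e:
        # Transient failures that a batch-level retry could reasonably handle.
        logger.warning(f"retryable {e.__class__.__name__}: {e}")
        raise
    except openai.APIError as e:
        # Other API errors: keep the class name so triage is possible.
        logger.error(f"{e.__class__.__name__}: {e}")
        raise
```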
50-55: Docstrings for public API

Add class/method docstrings for this public model and `generate_until` per guidelines.

lmms_eval/models/chat/openai_compatible.py (4)
47-48: Remove unused variable `ctx`

`ctx` is unpacked but unused. Prefix it with `_` or drop it.

```diff
-            ctx, doc_to_messages, gen_kwargs, doc_id, task, split = req.args
+            _, doc_to_messages, gen_kwargs, doc_id, task, split = req.args
```
109-118: Avoid blind `except Exception`; at least log the exception type

Prefer catching OpenAI-specific exceptions or log the exception class name to aid triage.

```diff
-        except Exception as e:
-            error_msg = str(e)
+        except Exception as e:
+            error_msg = f"{e.__class__.__name__}: {e}"
```
34-37: Prefer using the `self.batch_size` accessor

Use the accessor added in the simple model for consistency.

```diff
-        batch_size = getattr(self, "batch_size_per_gpu", 1)
+        batch_size = getattr(self, "batch_size", getattr(self, "batch_size_per_gpu", 1))
```
28-33: Docstring for public API method

Add a brief docstring for `generate_until` explaining batched behavior and return-order guarantees.
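One possible shape for such a docstring, assuming the batching and resume-cache behavior described in the walkthrough (wording is illustrative, not the project's final text):

```python
from typing import List


def generate_until(self, requests: List) -> List[str]:
    """Generate a completion for each request via the OpenAI-compatible endpoint.

    Requests are grouped into batches of ``batch_size`` and dispatched
    concurrently; when ``continual_mode`` is enabled, cached responses are
    reused and fresh ones are persisted for resume.

    Returns:
        Response strings in the same order as ``requests``.
    """
    ...
```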
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- lmms_eval/models/chat/openai_compatible.py (2 hunks)
- lmms_eval/models/simple/openai_compatible.py (7 hunks)
- lmms_eval/protocol.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Type hints are required for all Python code
Public APIs must have docstrings
Maximum line length is 88 characters
Use PEP 8 naming: snake_case for functions/variables
Class names must use PascalCase
Constants should be in UPPER_SNAKE_CASE
Use f-strings for string formatting
Use early returns to avoid nested conditions
Use descriptive names; prefix handler functions with 'handle'
Prefer constants over functions where possible
Prefer functional, immutable approaches when not verbose
Define composing (higher-level) functions before their components
Mark issues in existing code with TODO: prefix in comments
Use functional and stateless approaches where they improve clarity
Use Ruff to enforce: import sorting (I001) and no unused imports
For long strings, wrap using parentheses rather than backslashes
Format long function calls over multiple lines with proper indentation
Split long import lists across multiple lines
Use Pyright type checking: add explicit None checks for Optional values
Use Pyright type narrowing for strings where applicable
Use Ruff (via pre-commit) to format and lint Python files
Document public APIs and test thoroughly
Files:
lmms_eval/protocol.py
lmms_eval/models/chat/openai_compatible.py
lmms_eval/models/simple/openai_compatible.py
🧬 Code graph analysis (2)
lmms_eval/models/chat/openai_compatible.py (5)
- lmms_eval/api/registry.py (1): register_model (11-24)
- lmms_eval/models/simple/openai_compatible.py (3): batch_size (122-123), rank (136-137), process_single_request (279-299)
- lmms_eval/api/task.py (1): doc_to_messages (1648-1680)
- lmms_eval/api/instance.py (1): args (25-29)
- lmms_eval/protocol.py (2): ChatMessages (45-122), to_openai_messages (89-109)
lmms_eval/models/simple/openai_compatible.py (3)
- lmms_eval/utils.py (3): Collator (865-1020), chunks (136-171), get_batched (891-913)
- lmms_eval/api/instance.py (1): args (25-29)
- lmms_eval/models/chat/openai_compatible.py (1): process_single_request
🪛 Ruff (0.13.1)
lmms_eval/protocol.py
100-100: Avoid specifying long messages outside the exception class
(TRY003)
lmms_eval/models/chat/openai_compatible.py
47-47: Unpacked variable `ctx` is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
89-89: Function definition does not bind loop variable batch_responses
(B023)
90-90: Function definition does not bind loop variable batch_responses
(B023)
107-107: Consider moving this statement to an `else` block
(TRY300)
109-109: Do not catch blind exception: Exception
(BLE001)
lmms_eval/models/simple/openai_compatible.py
36-36: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
37-37: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
128-128: Unused method argument: tokens
(ARG002)
280-280: Function definition does not bind loop variable batch_responses
(B023)
281-281: Function definition does not bind loop variable batch_responses
(B023)
287-287: Consider moving this statement to an `else` block
(TRY300)
289-289: Do not catch blind exception: Exception
(BLE001)
```python
for batch_requests in batched_requests:
    batch_payloads = []
    batch_doc_uuids = []
    batch_responses = []

    for req in batch_requests:
        ctx, doc_to_messages, gen_kwargs, doc_id, task, split = req.args
        doc_uuid = f"{task}___{split}___{doc_id}"
        self.response_cache[doc_uuid] = response_text
        batch_doc_uuids.append(doc_uuid)

        if self.continual_mode is True and self.cache_mode == "resume":
            if doc_uuid in self.response_cache:
                response_text = self.response_cache[doc_uuid]
                if response_text:
                    batch_responses.append(response_text)
                    continue

        chat_messages_raw = doc_to_messages(self.task_dict[task][split][doc_id])
        chat_messages: ChatMessages = ChatMessages(**{"messages": chat_messages_raw})

        payload = {"messages": chat_messages.to_openai_messages()}
        payload["model"] = self.model_version

        if "max_new_tokens" not in gen_kwargs:
            gen_kwargs["max_new_tokens"] = 1024
        if gen_kwargs["max_new_tokens"] > 4096:
            gen_kwargs["max_new_tokens"] = 4096
        if "temperature" not in gen_kwargs:
            gen_kwargs["temperature"] = 0
        if "top_p" not in gen_kwargs:
            gen_kwargs["top_p"] = None
        if "num_beams" not in gen_kwargs:
            gen_kwargs["num_beams"] = 1

        payload["max_tokens"] = gen_kwargs["max_new_tokens"]
        payload["temperature"] = gen_kwargs["temperature"]

        if "o1" in self.model_version or "o3" in self.model_version or "o4" in self.model_version:
            del payload["temperature"]
            payload.pop("max_tokens")
            payload["reasoning_effort"] = "medium"
            payload["response_format"] = {"type": "text"}
            payload["max_completion_tokens"] = gen_kwargs["max_new_tokens"]

        batch_payloads.append(payload)
        batch_responses.append(None)
```
Indexing mismatch corrupts response alignment within batch
Same misalignment as the simple variant: cached responses are appended while payloads skip them, then indices from `enumerate(batch_payloads)` are used to update `batch_responses`. This leads to wrong pairings.
```diff
-        batch_payloads = []
-        batch_doc_uuids = []
-        batch_responses = []
+        batch_len = len(batch_requests)
+        batch_payloads = [None] * batch_len
+        batch_doc_uuids = [None] * batch_len
+        batch_responses = [None] * batch_len
@@
-        for req in batch_requests:
-            ctx, doc_to_messages, gen_kwargs, doc_id, task, split = req.args
+        for i, req in enumerate(batch_requests):
+            _, doc_to_messages, gen_kwargs, doc_id, task, split = req.args
             doc_uuid = f"{task}___{split}___{doc_id}"
-            batch_doc_uuids.append(doc_uuid)
+            batch_doc_uuids[i] = doc_uuid
@@
-            if doc_uuid in self.response_cache:
-                response_text = self.response_cache[doc_uuid]
-                if response_text:
-                    batch_responses.append(response_text)
-                    continue
+            if doc_uuid in self.response_cache:
+                response_text = self.response_cache[doc_uuid]
+                if response_text:
+                    batch_responses[i] = response_text
+                    continue
@@
-            batch_payloads.append(payload)
-            batch_responses.append(None)
+            batch_payloads[i] = payload
+            # keep batch_responses[i] as None to be filled
@@
-        tasks_to_run = [(payload, i) for i, payload in enumerate(batch_payloads) if batch_responses[i] is None]
+        tasks_to_run = [
+            (payload, i)
+            for i, payload in enumerate(batch_payloads)
+            if payload is not None and batch_responses[i] is None
+        ]
```
Also applies to: 121-133
🧰 Tools
🪛 Ruff (0.13.1)
47-47: Unpacked variable `ctx` is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
In lmms_eval/models/chat/openai_compatible.py around lines 41-87 (and similarly
at 121-133), cached responses are appended to batch_responses while
corresponding payloads are skipped, breaking alignment when later iterating over
enumerate(batch_payloads). Fix by keeping arrays aligned: when you short-circuit
with a cached response, append a placeholder (e.g., None) to batch_payloads at
the same index (or otherwise ensure one-to-one positions for batch_payloads and
batch_responses), and when sending requests skip None payloads or filter/zip
payloads with their response indices so responses are written back into the
correct positions. Ensure the rest of the code that sends requests and writes
back uses these preserved indices to update batch_responses and the cache
correctly.
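Reduced to a standalone pattern, the suggested fix amounts to preallocating aligned slots and filling them by index; the helper names below are illustrative, not the module's actual API:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def fill_batch(
    keys: List[str],
    cache: Dict[str, str],
    build_payload: Callable[[str], dict],
    send: Callable[[dict], str],
) -> List[str]:
    """Keep payloads and responses index-aligned so cache hits never shift positions."""
    n = len(keys)
    payloads: List[Optional[dict]] = [None] * n
    responses: List[Optional[str]] = [None] * n

    for i, key in enumerate(keys):
        if cache.get(key):
            responses[i] = cache[key]          # cache hit: slot filled, no payload
        else:
            payloads[i] = build_payload(key)   # slot stays None until the API answers

    to_run = [(i, p) for i, p in enumerate(payloads) if p is not None]
    with ThreadPoolExecutor() as ex:
        for (i, _), result in zip(to_run, ex.map(send, [p for _, p in to_run])):
            responses[i] = result
            cache[keys[i]] = result

    return [r if r is not None else "" for r in responses]
```

Because every slot is written through its original index, cached and fresh responses can never swap positions.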
```python
            response = self.client.chat.completions.create(**payload)
            end_time = time.time()

            response_text = response.choices[0].message.content
            latency = end_time - start_time
```
🛠️ Refactor suggestion
Add per-call timeout to OpenAI request
Prevent indefinite hangs by passing `timeout=self.timeout` to the API call.
```diff
-            response = self.client.chat.completions.create(**payload)
+            response = self.client.chat.completions.create(**payload, timeout=self.timeout)
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
            response = self.client.chat.completions.create(**payload, timeout=self.timeout)
            end_time = time.time()

            response_text = response.choices[0].message.content
            latency = end_time - start_time
```
🤖 Prompt for AI Agents
In lmms_eval/models/chat/openai_compatible.py around lines 95 to 99, the OpenAI
API call can hang because it doesn't pass a per-call timeout; update the chat
completion invocation to include timeout=self.timeout (i.e., call
self.client.chat.completions.create(..., timeout=self.timeout)), ensuring the
instance has a numeric self.timeout attribute initialized earlier; keep the rest
of the response handling unchanged.
```diff
         self.client = (
-            OpenAI(api_key=os.getenv("OPENAI_API_KEY"), base_url=os.getenv("OPENAI_API_BASE"), http_client=http_client)
+            OpenAI(api_key=api_key, base_url=base_url, http_client=http_client)
             if not azure_openai
             else AzureOpenAI(api_key=os.getenv("AZURE_OPENAI_API_KEY"), azure_endpoint=os.getenv("AZURE_OPENAI_API_BASE"), api_version=os.getenv("AZURE_OPENAI_API_VERSION"), http_client=http_client)
         )
```
🛠️ Refactor suggestion
Add client-level request timeout (avoid indefinite hangs)
External calls lack an explicit timeout. Pass `timeout=self.timeout` to the OpenAI/AzureOpenAI clients (or per call) to prevent indefinite blocking.
```diff
-            OpenAI(api_key=api_key, base_url=base_url, http_client=http_client)
+            OpenAI(api_key=api_key, base_url=base_url, http_client=http_client, timeout=self.timeout)
             if not azure_openai
             else AzureOpenAI(
-                api_key=os.getenv("AZURE_OPENAI_API_KEY"),
-                azure_endpoint=os.getenv("AZURE_OPENAI_API_BASE"),
-                api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
-                http_client=http_client
+                api_key=os.getenv("AZURE_OPENAI_API_KEY"),
+                azure_endpoint=os.getenv("AZURE_OPENAI_API_BASE"),
+                api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
+                http_client=http_client,
+                timeout=self.timeout,
             )
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
        self.client = (
            OpenAI(api_key=api_key, base_url=base_url, http_client=http_client, timeout=self.timeout)
            if not azure_openai
            else AzureOpenAI(
                api_key=os.getenv("AZURE_OPENAI_API_KEY"),
                azure_endpoint=os.getenv("AZURE_OPENAI_API_BASE"),
                api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
                http_client=http_client,
                timeout=self.timeout,
            )
        )
```
🤖 Prompt for AI Agents
In lmms_eval/models/simple/openai_compatible.py around lines 98 to 102, the
OpenAI/AzureOpenAI client is created without a request timeout which can cause
indefinite hangs; update the client instantiation to pass timeout=self.timeout
(or pass timeout into individual call methods if the client does not accept a
constructor timeout), i.e., include timeout=self.timeout when constructing
OpenAI and AzureOpenAI (or wire self.timeout into the per-request call options)
so all external requests honor the configured timeout.
```python
    def tok_encode(self, string: str):
        return list(string.encode("utf-8"))

    def tok_decode(self, tokens):
        return ""
```
Fix tok_decode to actually decode; remove unused-arg warning
`tok_decode` ignores its input and always returns an empty string; `tokens` is flagged as unused. Implement the inverse of `tok_encode` with proper types.
```diff
-    def tok_encode(self, string: str):
-        return list(string.encode("utf-8"))
+    def tok_encode(self, string: str) -> List[int]:
+        return list(string.encode("utf-8"))

-    def tok_decode(self, tokens):
-        return ""
+    def tok_decode(self, tokens: List[int]) -> str:
+        return bytes(tokens).decode("utf-8", errors="ignore")
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    def tok_encode(self, string: str) -> List[int]:
        return list(string.encode("utf-8"))

    def tok_decode(self, tokens: List[int]) -> str:
        return bytes(tokens).decode("utf-8", errors="ignore")
```
🧰 Tools
🪛 Ruff (0.13.1)
128-128: Unused method argument: tokens
(ARG002)
🤖 Prompt for AI Agents
In lmms_eval/models/simple/openai_compatible.py around lines 125 to 130,
tok_decode currently ignores its input and returns an empty string; implement it
as the inverse of tok_encode by accepting tokens (list/iterable of ints or
bytes/bytearray), converting them to a bytes object and decoding with utf-8 (use
errors="replace" or "strict" per project policy) and return the resulting str so
the tokens argument is used and properly typed.
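A quick round-trip check of the suggested implementation (standalone copy of the two methods, outside the class for brevity):

```python
from typing import List


def tok_encode(string: str) -> List[int]:
    return list(string.encode("utf-8"))


def tok_decode(tokens: List[int]) -> str:
    return bytes(tokens).decode("utf-8", errors="ignore")


# Decoding is the inverse of encoding for valid UTF-8 input.
assert tok_decode(tok_encode("batch size 8")) == "batch size 8"
```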
```python
        def _collate(x):
            toks = self.tok_encode(x[0])
            return -len(toks), x[0]

        from lmms_eval import utils

        re_ords = utils.Collator([reg.args for reg in requests], _collate, grouping=True)
        chunks = re_ords.get_batched(n=self.batch_size, batch_fn=None)
        num_iters = len(requests) // self.batch_size if len(requests) % self.batch_size == 0 else len(requests) // self.batch_size + 1
        pbar = tqdm(total=num_iters, disable=(self.rank != 0), desc="Model Responding")
```
Ordering bug: results not restored to original order
You reorder with `Collator` but return `res` in reordered batch order. Restore the original order before returning.
```diff
-        pbar.close()
-        return res
+        pbar.close()
+        return re_ords.get_original(res)
```
Also applies to: 319-324
🤖 Prompt for AI Agents
In lmms_eval/models/simple/openai_compatible.py around lines 197-206 (and
similarly at 319-324), the code uses utils.Collator to reorder requests for
batching but returns results in the collated (reordered) sequence; restore the
original request order before returning. Capture the original indices from
Collator (or keep a mapping of request -> original position) when creating
re_ords/get_batched, then after processing batches collect outputs into a result
array sized to len(requests) and place each batch's responses at their original
indices; finally return that array in original order (do the same for both code
locations).
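The underlying pattern, independent of the `Collator` API: remember each item's original index when sorting into batches, and write every result back through that index. The `upper()` call is a stand-in for the real API request:

```python
from typing import Callable, List, Tuple


def sorted_batches_with_restore(
    items: List[str],
    sort_key: Callable[[str], Tuple],
    batch_size: int,
) -> List[str]:
    """Process in length-sorted batches, then return results in input order."""
    order = sorted(range(len(items)), key=lambda i: sort_key(items[i]))
    results: List[str] = [""] * len(items)

    for start in range(0, len(order), batch_size):
        batch_idx = order[start : start + batch_size]
        batch_out = [items[i].upper() for i in batch_idx]  # stand-in for the API call
        for i, out in zip(batch_idx, batch_out):
            results[i] = out  # write back to the original position

    return results


# Outputs line up with inputs even though processing order differed.
print(sorted_batches_with_restore(["bb", "a", "cccc"], lambda s: (-len(s),), 2))
```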
```python
            batch_payloads = []
            batch_doc_uuids = []
            batch_responses = []

            for i, (context, doc_id_single) in enumerate(zip(contexts, doc_id)):
                doc_uuid = f"{task}___{split}___{doc_id_single}"
                batch_doc_uuids.append(doc_uuid)
```
Indexing mismatch corrupts mapping between payloads and responses
`batch_responses` appends cached entries while `batch_payloads` skips them, then `enumerate(batch_payloads)` is used as an index into `batch_responses`. This misalignment will overwrite cached entries and mix up responses.

Apply indexed preallocation and keep indices aligned:
```diff
-            batch_payloads = []
-            batch_doc_uuids = []
-            batch_responses = []
+            batch_len = len(contexts)
+            batch_payloads: List[Optional[dict]] = [None] * batch_len
+            batch_doc_uuids: List[Optional[str]] = [None] * batch_len
+            batch_responses: List[Optional[str]] = [None] * batch_len
-            for i, (context, doc_id_single) in enumerate(zip(contexts, doc_id)):
+            for i, (context, doc_id_single) in enumerate(zip(contexts, doc_id)):
                 doc_uuid = f"{task}___{split}___{doc_id_single}"
-                batch_doc_uuids.append(doc_uuid)
+                batch_doc_uuids[i] = doc_uuid
                 if self.continual_mode is True and self.cache_mode == "resume":
                     if doc_uuid in self.response_cache:
                         response_text = self.response_cache[doc_uuid]
                         if response_text:
-                            batch_responses.append(response_text)
-                            continue
+                            batch_responses[i] = response_text
+                            continue
@@
-            batch_payloads.append(payload)
-            batch_responses.append(None)
+            batch_payloads[i] = payload
+            # leave batch_responses[i] as None to be filled later
@@
-            tasks_to_run = [(payload, i) for i, payload in enumerate(batch_payloads) if batch_responses[i] is None]
+            tasks_to_run = [
+                (payload, i)
+                for i, payload in enumerate(batch_payloads)
+                if payload is not None and batch_responses[i] is None
+            ]
```
Also applies to: 222-228, 276-300, 301-312
🤖 Prompt for AI Agents
In lmms_eval/models/simple/openai_compatible.py around lines 214-221 (and also
apply same fix to 222-228, 276-300, 301-312), the loop builds batch_payloads and
batch_responses out of sync because cached responses are appended to
batch_responses while payloads skip cached entries, then
enumerate(batch_payloads) is used to index into batch_responses causing
misalignment; fix by preallocating placeholders (e.g., initialize
batch_payloads, batch_doc_uuids, batch_responses to the same length with None)
or by maintaining an explicit index/position mapping so that for every input
position you either set payloads[pos]=payload and responses[pos]=None (to be
filled later) or set responses[pos]=cached_value and payloads[pos]=None,
ensuring you never change relative ordering — then when creating request batches
skip None payloads but preserve original indices for mapping responses back, and
when filling responses place them into batch_responses using the original
positions so cached and fresh responses remain aligned.
```python
            if fetch_video is None:
                raise ImportError("qwen_vl_utils is required for video processing. Please install it with: pip install qwen-vl-utils")
            video_input = fetch_video({"type": "video", "video": content.url, **video_kwargs})
```
Fix None-unpacking bug in video kwargs
If `video_kwargs` is None (the default), `**video_kwargs` raises at runtime. Use a safe default.
Apply this diff:
```diff
-            video_input = fetch_video({"type": "video", "video": content.url, **video_kwargs})
+            video_input = fetch_video({
+                "type": "video",
+                "video": content.url,
+                **(video_kwargs or {}),
+            })
```
Additionally, consider updating the signature to reflect Optional and non‑string values (recommended for Pyright):
- Change to: `def to_openai_messages(self, video_kwargs: Optional[Dict[str, Any]] = None):`
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
            if fetch_video is None:
                raise ImportError("qwen_vl_utils is required for video processing. Please install it with: pip install qwen-vl-utils")
            video_input = fetch_video({
                "type": "video",
                "video": content.url,
                **(video_kwargs or {}),
            })
```
🧰 Tools
🪛 Ruff (0.13.1)
100-100: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In lmms_eval/protocol.py around lines 99 to 101, the code unpacks video_kwargs
without guarding against None which raises a TypeError at runtime; change the
function signature to use typing Optional and Any (e.g., def
to_openai_messages(self, video_kwargs: Optional[Dict[str, Any]] = None):) and
inside the function replace **video_kwargs with **(video_kwargs or {}) so a safe
empty dict is used when None is passed; also add the necessary typing imports if
not present.
Hello, thanks for this great library. We are regularly using it for the CI in SGLang. To increase the speed of CI, we want to increase the batch size. However, the batch size in the `openai_compatible` model was not working, since requests were still processed one at a time in a loop. Previously the run took around 5 minutes; now it takes around 2 min 45 s.
Summary by CodeRabbit
- New Features
- Refactor
- Bug Fixes