feat: Multi-turn tool calling on BFCLv3 dataset #1207
base: main
Conversation
Signed-off-by: slikhite-1 <[email protected]>
📝 Walkthrough

Adds BFCL v3 multi-turn tool-calling capability: a new environment, tool APIs, datasets, processors, configs, and training and eval scripts. Extends rollout/eval control flow for multi-turn episodes, adds logging of environment metadata, and updates the distributed actor registry. Includes documentation and a dependency addition.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant User
    participant RunScript as run_grpo_bfclv3.py
    participant Ray as Ray Cluster
    participant Policy as GRPO Policy
    participant Env as MultiTurnToolEnvironment
    participant Tools as ToolManager/APIs
    User->>RunScript: Launch with grpo_bfclv3.yaml
    RunScript->>Ray: init()
    RunScript->>Policy: setup(model, tokenizer, gen_cfg)
    RunScript->>Env: create actors (bfcl_multiturn, max_turns)
    loop Training Steps
        Policy->>Env: rollout request (batch)
        Env->>Tools: load scenario (model/GT instances)
        loop per turn (<= max_turns)
            Policy->>Env: assistant response
            Env->>Env: parse <tool> call, validate
            Env->>Tools: execute model and GT calls
            Env-->>Policy: observation, reward fragment, metadata
            Policy->>Policy: GRPO update (rewards, baseline)
        end
        Policy->>RunScript: log rewards + metadata (JSON)
    end
```

```mermaid
sequenceDiagram
    autonumber
    participant User
    participant EvalScript as run_eval.py
    participant Loader as load_response_dataset
    participant Dataloader
    participant VLLM as vLLM Generation
    participant Env as MultiTurnToolEnvironment
    User->>EvalScript: Run with bfcl_eval.yaml
    EvalScript->>Loader: load_response_dataset("bfcl_multiturn")
    Loader-->>EvalScript: dataset (train/val)
    EvalScript->>Env: create if bfcl_multiturn.enable
    EvalScript->>Dataloader: build with tokenizer
    alt Multi-turn enabled
        loop Batches
            EvalScript->>VLLM: generate turns (rollout)
            VLLM-->>Env: assistant messages
            Env-->>EvalScript: rewards, logs, extra_env_info
        end
    else Single-turn
        EvalScript->>VLLM: generate once
    end
    EvalScript-->>User: prompts, responses, rewards
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120+ minutes

Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Actionable comments posted: 42
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
nemo_rl/evals/eval.py (1)
443-452: Fix cons@k for multi-turn or gate it.

env_return is undefined in the multi-turn branch; this crashes when metric == "cons@k".
```diff
-        if metric == "pass@k":
+        if metric == "pass@k":
             score += eval_pass_k(rewards, num_tests_per_prompt, k_value)
         elif metric == "cons@k":
-            extracted_answers = env_return.answers
-            score += eval_cons_k(
-                rewards, num_tests_per_prompt, k_value, extracted_answers
-            )
+            if master_config.get("env", {}).get("bfcl_multiturn", {}).get("enable", False):
+                raise NotImplementedError("cons@k is not supported for multi-turn evaluation.")
+            extracted_answers = env_return.answers
+            score += eval_cons_k(rewards, num_tests_per_prompt, k_value, extracted_answers)
         else:
             raise ValueError(f"Invalid metric: {metric}")
```
🧹 Nitpick comments (47)
examples/run_eval.py (1)
54-55: Stop returning tokenizer from setup_data to avoid redundancy.

setup_data receives the tokenizer and returns it unchanged, leading to unnecessary plumbing and potential confusion.
Apply these diffs:
```diff
-def setup_data(tokenizer: AutoTokenizer, data_config, env_configs):
+def setup_data(tokenizer: AutoTokenizer, data_config, env_configs):
@@
-    dataset = AllTaskProcessedDataset(
+    dataset = AllTaskProcessedDataset(
         dataset=rekeyed_ds,
         tokenizer=tokenizer,
         default_task_data_spec=base_dataset.task_spec,
         task_data_processors=base_dataset.processor,
         max_seq_length=data_config["max_input_seq_length"],
     )
-
-    return dataset, env, tokenizer
+    return dataset, env
```

And update the call site:

```diff
-    (
-        dataset,
-        env,
-        tokenizer,
-    ) = setup_data(tokenizer, config["data"], config["env"])
+    dataset, env = setup_data(tokenizer, config["data"], config["env"])
```

Also applies to: 79-86, 87-88
nemo_rl/algorithms/grpo.py (1)
1710-1715: Add metadata logging in the async path for parity.

The async training path omits metadata logging, leading to inconsistent logs across modes.
Apply this diff:
```diff
     log_data = {"content": flat_messages["content"]}
     log_data["rewards"] = rewards.tolist()
     log_data["generation_logprobs"] = train_data["generation_logprobs"].tolist()
     log_data["prev_logprobs"] = train_data["prev_logprobs"].tolist()
     log_data["input_lengths"] = input_lengths.tolist()
+    # Include environment metadata (normalized to rewards length)
+    try:
+        extra_env_info = repeated_batch["extra_env_info"]
+    except KeyError:
+        extra_env_info = [None] * len(rewards)
+    if not isinstance(extra_env_info, (list, tuple)):
+        extra_env_info = [extra_env_info] * len(rewards)
+    else:
+        extra_env_info = list(extra_env_info)
+    if len(extra_env_info) < len(rewards):
+        extra_env_info.extend([None] * (len(rewards) - len(extra_env_info)))
+    elif len(extra_env_info) > len(rewards):
+        extra_env_info = extra_env_info[: len(rewards)]
+    log_data["metadata"] = [
+        json.dumps(m, default=str) if m is not None else "" for m in extra_env_info
+    ]
     logger.log_batched_dict_as_jsonl(log_data, f"train_data_step{step}.jsonl")
```
3-15: Add brief inline docs for the new keys.

Add one-line comments for grpo.max_rollout_turns and env.bfcl_multiturn.num_workers to guide users.
Also applies to: 58-61
nemo_rl/data/datasets/eval_datasets/bfclv3.py (1)
25-37: Use logging instead of prints for library code.

Switch prints to a logger to avoid noisy stdout in library usage.
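For instance, a minimal sketch of the swap, assuming a jsonlines-backed loader (the function name, path handling, and messages are illustrative, not this dataset's actual API):

```python
import logging

import jsonlines

logger = logging.getLogger(__name__)


def load_examples(path: str) -> list[dict]:
    """Load BFCL examples, reporting progress via logging instead of print."""
    with jsonlines.open(path) as reader:
        rows = list(reader)
    # logger.info replaces print: silent by default in library use,
    # but visible once the application configures logging.
    logger.info("Loaded %d BFCL examples from %s", len(rows), path)
    return rows
```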
nemo_rl/data/datasets/response_datasets/bfcl_multiturn.py (3)
49-53: Fix docstring: remove the nonexistent split arg.

The docstring mentions split, which isn't in the signature.
Apply this diff:
- """Initialize the BFCL Multiturn dataset. - - Args: - split: Split to load ("train" or "validation") - seed: Random seed for reproducibility - prompt_file: Optional prompt file path - """ + """Initialize the BFCL Multiturn dataset. + + Args: + seed: Random seed for reproducibility. + prompt_file: Optional prompt file path for prompt injection. + """
16-20: Catch specific exceptions instead of broad Exception.

A broad except masks real issues. Narrow to huggingface_hub and I/O errors, and import the exception types.
Apply these diffs:
```diff
-from huggingface_hub import hf_hub_download
+from huggingface_hub import (
+    hf_hub_download,
+    HfHubHTTPError,
+    EntryNotFoundError,
+    RepositoryNotFoundError,
+)
```

```diff
-        except Exception as e:
+        except (HfHubHTTPError, EntryNotFoundError, RepositoryNotFoundError, OSError) as e:
             print(f"❌ Error loading training data: {e}")
             raise
```

```diff
-        except Exception as e:
+        except (HfHubHTTPError, EntryNotFoundError, RepositoryNotFoundError, OSError) as e:
             print(f"Could not load validation data: {e}")
             print("Continuing with training data only...")
```

[Based on learnings from static analysis hints]
Also applies to: 85-87, 106-109
64-67: Prefer logging to prints for dataset loaders.

Replace prints with a logger to keep the library quiet under normal use.
nemo_rl/data/processors.py (3)
291-297: Avoid adding non-declared keys to DatumSpec; use extra for dataset.

DatumSpec does not include dataset. Store it under extra to stay within the TypedDict contract.
```diff
-    # Add task_name and dataset if present
+    # Add task_name and dataset if present
     if "task_name" in datum_dict:
         output["task_name"] = datum_dict["task_name"]
-    if "dataset" in datum_dict:
-        output["dataset"] = datum_dict["dataset"]
+    if "dataset" in datum_dict:
+        output["__extra__"] = {**output.get("__extra__", {}), "dataset": datum_dict["dataset"]}
```
354-363: Store dataset under extra rather than as a top-level DatumSpec key.

Keep the DatumSpec schema consistent.
```diff
     output: DatumSpec = {
         "message_log": message_log,
         "length": length,
         "extra_env_info": extra_env_info,
         "loss_multiplier": loss_multiplier,
         "idx": idx,
         "task_name": datum_dict["task_name"],
-        "dataset": datum_dict["dataset"],
     }
+    # Keep additional fields under __extra__
+    output["__extra__"] = {**output.get("__extra__", {}), "dataset": datum_dict.get("dataset")}
```
232-238: Silence ARG001 or purposefully use the parameters.

Ruff flags task_data_spec and/or max_seq_length as unused. max_seq_length is now used; if task_data_spec remains unused, either underscore-prefix it or add a brief comment indicating it's part of the processor interface.
nemo_rl/environments/tools/ticket_api.py (2)
35-47: Use or store long_context to satisfy the interface and silence lints.

Store the long_context argument like the other tools do.

```diff
 def _load_scenario(self, scenario: dict, long_context=False) -> None:
 @@
-    DEFAULT_STATE_COPY = deepcopy(DEFAULT_STATE)
+    DEFAULT_STATE_COPY = deepcopy(DEFAULT_STATE)
+    self.long_context = long_context
```
141-171: Consider validating editable fields more strictly (status domain).

Optionally restrict status to a known set (e.g., {"Open", "Closed", "Resolved"}) and priority to 1..5, as in create_ticket; a sketch follows.
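A minimal sketch of the stricter validation; the allowed status set and the helper's signature are assumptions drawn from this comment, not the ticket API's confirmed contract:

```python
VALID_STATUSES = {"Open", "Closed", "Resolved"}  # assumed domain


def validate_ticket_fields(status: str | None = None, priority: int | None = None) -> dict | None:
    """Return an error dict for invalid input, or None when the edit is acceptable."""
    if status is not None and status not in VALID_STATUSES:
        return {"error": f"Invalid status {status!r}; expected one of {sorted(VALID_STATUSES)}."}
    if priority is not None and not 1 <= priority <= 5:
        return {"error": "Priority must be an integer between 1 and 5."}
    return None
```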
docs/guides/multi_turn_bfcl.md (4)
5-5: Use descriptive link text.

Replace "this" with a descriptive label.

```diff
-We use tool APIs from [this](https://github.com/bespokelabsai/verifiers/tree/main/verifiers/tools/bfcl_tools) repository.
+We use tool APIs from the [Bespoke Labs Verifiers BFCL tools](https://github.com/bespokelabsai/verifiers/tree/main/verifiers/tools/bfcl_tools) repository.
```
10-11: Minor grammar fix for the wandb sentence.

```diff
-You will have to login to wandb/ set you wandb API_KEY to get the logs in wandb.
+You must log in to Weights & Biases and set your WANDB_API_KEY to enable logging.
```
135-139: Specify a language for the fenced code block.

Add a language identifier (`text`) to satisfy MD040:

```text
Turn Reward = 0.5 × state_score + 0.5 × call_score
Total Episode Reward = (Turn 1 + Turn 2 + ... + Turn N) / max_turns
```

160-166: Specify a language for the fenced code block.

Add a language identifier (`bash`) to satisfy MD040:

```bash
# Example for a GRPO checkpoint at step 170
uv run python examples/converters/convert_dcp_to_hf.py \
    --config results/grpo/step_170/config.yaml \
    --dcp-ckpt-path results/grpo/step_170/policy/weights/ \
    --hf-ckpt-path results/grpo/hf
```

nemo_rl/environments/tools/twitter_api.py (2)

29-51: Use or store long_context to satisfy the interface and silence lints.

```diff
 def _load_scenario(self, scenario: dict, long_context=False) -> None:
 @@
-    DEFAULT_STATE_COPY = deepcopy(DEFAULT_STATE)
+    DEFAULT_STATE_COPY = deepcopy(DEFAULT_STATE)
+    self.long_context = long_context
```
166-172: Require authentication for mention, to be consistent with other actions.

```diff
 def mention(self, tweet_id: int, mentioned_usernames: List[str]) -> Dict[str, str]:
 @@
-    if tweet_id not in self.tweets:
+    if not self.authenticated:
+        return {"error": "User not authenticated. Please authenticate before mentioning users."}
+    if tweet_id not in self.tweets:
         return {"error": f"Tweet with ID {tweet_id} not found."}
```

nemo_rl/environments/tools/message_api.py (5)
69-88: Use or store long_context; seed usage is fine.

Store the long_context parameter to align with the other tools and silence lint.

```diff
 def _load_scenario(self, scenario: dict, long_context=False) -> None:
 @@
-    DEFAULT_STATE_COPY = deepcopy(DEFAULT_STATE)
+    DEFAULT_STATE_COPY = deepcopy(DEFAULT_STATE)
+    self.long_context = long_context
```
224-244: Iterate dicts idiomatically in view_messages_sent.

Minor cleanup; use next(iter(...)).

```diff
-    for message in self.inbox:
-        receiver, message_content = list(message.items())[0]
+    for message in self.inbox:
+        receiver, message_content = next(iter(message.items()))
```
245-266: Typos and minor robustness in add_contact.

The redundant "User ID exists" check is harmless. Optional nit: clarify the error messages.
268-296: Iterate dicts idiomatically in search_messages.

Minor cleanup as well.

```diff
-    for message_data in self.inbox:
-        receiver_id, message_content = list(message_data.items())[0]
+    for message_data in self.inbox:
+        receiver_id, message_content = next(iter(message_data.items()))
         if keyword_lower in message_content.lower():
```
142-160: Minor docstring fixes.

Typos ("messeage") and clarity; optional.
nemo_rl/environments/tools/math_api.py (1)
49-65: Unify return type annotations across methods that can return errors.

Methods like mean, standard_deviation, divide, round_number, percentage, min_value, max_value, and sum_values may return {"error": str} but are annotated as Dict[str, float]. Update the signatures to Dict[str, Union[float, str]] (or define a TypedDict) to match behavior and prevent type confusion downstream; a sketch follows the notes below.
As per coding guidelines
Also applies to: 66-84, 170-238, 298-317, 318-335, 336-352, 353-369, 370-385
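A minimal sketch of the unified annotation, assuming a {"result": value} success shape (the exact success key is an assumption, not the current API):

```python
from typing import Dict, List, Union

# Either {"result": <float>} on success or {"error": <message>} on failure.
MathResult = Dict[str, Union[float, str]]


class MathAPI:
    def mean(self, numbers: List[float]) -> MathResult:
        """Return the mean, or an error dict for empty input."""
        if not numbers:
            return {"error": "Cannot compute the mean of an empty list."}
        return {"result": sum(numbers) / len(numbers)}
```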
nemo_rl/environments/tools/gorilla_file_system.py (4)
195-199: Simplify root key extraction.

Use next(iter(...)) to avoid repeated key extraction and improve clarity.

```diff
-        if "root" in scenario:
-            root_dir = Directory(list(scenario["root"].keys())[0], None)
-            self.root = self._load_directory(
-                scenario["root"][list(scenario["root"].keys())[0]]["contents"], root_dir
-            )
+        if "root" in scenario:
+            root_key = next(iter(scenario["root"]))
+            root_dir = Directory(root_key, None)
+            self.root = self._load_directory(
+                scenario["root"][root_key]["contents"], root_dir
+            )
```
586-593: diff(): add strict=False to zip for clarity and future safety.

An explicit strict parameter aids readability and avoids silent truncation if the inputs diverge.

```diff
-        diff_lines = [
-            f"- {line1}\n+ {line2}"
-            for line1, line2 in zip(content1, content2)
-            if line1 != line2
-        ]
+        diff_lines = [
+            f"- {line1}\n+ {line2}"
+            for line1, line2 in zip(content1, content2, strict=False)
+            if line1 != line2
+        ]
```
615-621: Remove f-strings without placeholders.

Minor style fix; keeps Ruff happy.

```diff
-        return {
-            "error": f"mv: no path allowed in destination. Only file name and folder name is supported for this operation."
-        }
+        return {
+            "error": "mv: no path allowed in destination. Only file name and folder name is supported for this operation."
+        }
@@
-        return {
-            "error": f"cp: don't allow path in destination. Only file name and folder name is supported for this operation."
-        }
+        return {
+            "error": "cp: don't allow path in destination. Only file name and folder name is supported for this operation."
+        }
```

Also applies to: 727-733
395-425: find(): the 'path' arg is ignored for traversal; it is only used as a prefix.

The start directory should honor 'path' (relative or absolute) via _navigate_to_directory(). The current behavior is misleading.
Proposed change (outline; a sketch follows the list):
- Resolve target_dir = _navigate_to_directory(path)
- If error dict, return it
- Use that as the root for recursive_search and base_path based on resolved path
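A method-level sketch of that outline; _navigate_to_directory, the Directory.contents mapping, and the error-dict convention are assumptions from the surrounding review, not verified implementation details:

```python
def find(self, path: str = ".", name: str | None = None) -> dict:
    """Search from the directory that 'path' resolves to, not from cwd."""
    target_dir = self._navigate_to_directory(path)
    if isinstance(target_dir, dict):  # navigation returned an error dict
        return target_dir

    matches: list[str] = []

    def recursive_search(directory, base_path: str) -> None:
        for entry_name, entry in directory.contents.items():
            entry_path = f"{base_path}/{entry_name}"
            if name is None or name in entry_name:
                matches.append(entry_path)
            if hasattr(entry, "contents"):  # descend into sub-directories
                recursive_search(entry, entry_path)

    recursive_search(target_dir, path.rstrip("/"))
    return {"matches": matches}
```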
nemo_rl/environments/tools/trading_bot.py (1)
698-712: add_to_watchlist(): response key and docstring mismatch.

The docstring says it returns the symbol added; the implementation returns the whole watchlist under the key "symbol". Prefer returning {"watchlist": self.watch_list} or {"symbol": stock} consistently.
Proposed change:
- return {"symbol": self.watch_list} + return {"watchlist": self.watch_list}nemo_rl/environments/tools/vehicle_control.py (2)
464-498: setCruiseControl(): unit mismatch in the docstring.

The doc mentions m/h while the return says km/h. Clarify and use one consistent unit in the docstring and response key to avoid confusion.
Proposed docstring edit: replace "m/h" with "km/h" (or vice versa), and ensure returned currentSpeed matches. Based on learnings
50-175: Method naming style deviates from snake_case.

Public methods use camelCase (e.g., startEngine). Repo guidelines prefer snake_case, but renaming may break datasets/tool calls. Consider leaving as-is for compatibility and documenting the exception.
As per coding guidelines
Also applies to: 215-700
nemo_rl/data/datasets/response_datasets/__init__.py (1)
19-21: Pass seed (and optionally prompt_file) into BFCLMultiturnDataset for reproducibility.

load_response_dataset consistently forwards seed to datasets. Align this branch and avoid relying on in-code defaults inside the dataset class.
Apply:
```diff
-    elif dataset_name == "bfcl_multiturn":
-        base_dataset: Any = BFCLMultiturnDataset()
+    elif dataset_name == "bfcl_multiturn":
+        base_dataset: Any = BFCLMultiturnDataset(
+            seed=seed,
+            prompt_file=data_config.get("prompt_file"),
+        )
```

Also applies to: 91-94
examples/run_grpo_bfclv3.py (3)
78-84: Treat optional config keys as optional (no in-code defaults).

DataConfig marks prompt_file and system_prompt_file as optional; access them via .get to avoid KeyError while still passing None when missing.
As per coding guidelines
```diff
-    task_spec = TaskDataSpec(
-        task_name="bfcl_multiturn",
-        prompt_file=data_config["prompt_file"],
-        system_prompt_file=data_config["system_prompt_file"],
-    )
+    task_spec = TaskDataSpec(
+        task_name="bfcl_multiturn",
+        prompt_file=data_config.get("prompt_file"),
+        system_prompt_file=data_config.get("system_prompt_file"),
+    )
```
121-139: Don't proceed with RL without an environment; fail fast.

Mapping to None will crash later. Make the requirement explicit if the env is disabled or missing.
```diff
-    if not task_to_env:
-        # You can set up a default environment here if needed
-        task_to_env = defaultdict(lambda: None)
-        task_to_env[task_name] = None
+    if not task_to_env:
+        raise ValueError(
+            "bfcl_multiturn environment is not enabled/configured. "
+            "Please set env.bfcl_multiturn.enable=true and provide its config."
+        )
```
189-201: Silence the unused 'cluster' binding.

cluster is unpacked but unused. Prefix it with an underscore to satisfy linters.
Based on static analysis hints
```diff
-    (
-        policy,
-        policy_generation,
-        cluster,
+    (
+        policy,
+        policy_generation,
+        _cluster,
         dataloader,
         val_dataloader,
         loss_fn,
         logger,
         checkpointer,
         grpo_state,
         master_config,
     ) = setup(config, tokenizer, dataset, val_dataset)
```

nemo_rl/evals/eval.py (1)
353-374: Prefix unused rollout_metrics to satisfy linters.

The variable is not used; avoid linter noise.
Based on static analysis hints
```diff
-    batch, rollout_metrics = run_async_multi_turn_rollout(
+    batch, _rollout_metrics = run_async_multi_turn_rollout(
@@
-    batch, rollout_metrics = run_multi_turn_rollout(
+    batch, _rollout_metrics = run_multi_turn_rollout(
```

nemo_rl/environments/multi_turn_tool_environment.py (10)
53-61: Logging and typing polish in ToolManager
- Annotate TOOL_CLASS_MAPPING as ClassVar to satisfy ruff RUF012.
- Replace print with logger.warning/exception.
- Define a module logger once.
Apply this diff:
```diff
-from typing import Any, Dict, List, Optional, Tuple, TypedDict, Set
+from typing import Any, Dict, List, Optional, Tuple, TypedDict, Set, ClassVar
@@
 import ray
+
+logger = logging.getLogger(__name__)
@@
-    TOOL_CLASS_MAPPING = {
+    TOOL_CLASS_MAPPING: ClassVar[dict[str, str]] = {
@@
-        except Exception as e:
-            print(f"Failed to initialize {tool_name}: {e}")
+        except ModuleNotFoundError as e:
+            logger.warning("Failed to import %s: %s", tool_name, e)
+        except AttributeError as e:
+            logger.warning("Tool class missing in module for %s: %s", tool_name, e)
+        except Exception as e:
+            logger.exception("Failed to initialize %s: %s", tool_name, e)
```

Also applies to: 15-23, 81-83
85-120: Parse all blocks; don't drop later ones.

Currently only the first `<tool>` block is parsed. Aggregate calls from all tool tags.
Apply this diff:
```diff
-        tool_matches = re.findall(tool_tag_pattern, assistant_response, re.DOTALL)
-        if tool_matches:
-            # Parse JSON array inside tool tags
-            try:
-                tool_content = tool_matches[0].strip()
-                tool_calls_raw = json.loads(tool_content)
-
-                tool_calls: List[Dict[str, Any]] = []
-                if isinstance(tool_calls_raw, list):
-                    for item in tool_calls_raw:
+        tool_matches = re.findall(tool_tag_pattern, assistant_response, re.DOTALL)
+        if tool_matches:
+            tool_calls: List[Dict[str, Any]] = []
+            # Parse JSON arrays inside each tool tag
+            for tool_content in tool_matches:
+                try:
+                    tool_calls_raw = json.loads(tool_content.strip())
+                except json.JSONDecodeError:
+                    continue
+                if isinstance(tool_calls_raw, list):
+                    for item in tool_calls_raw:
                         if isinstance(item, dict):
                             # Validate tool call format
                             validated_item = self._validate_tool_call_format(item)
                             if validated_item:
                                 tool_calls.append(validated_item)
                         elif isinstance(item, str):
                             try:
                                 parsed_item = json.loads(item)
                                 if isinstance(parsed_item, dict):
                                     # Validate tool call format
                                     validated_item = self._validate_tool_call_format(parsed_item)
                                     if validated_item:
                                         tool_calls.append(validated_item)
                             except json.JSONDecodeError:
                                 # Skip malformed entries; they'll be handled later.
                                 continue
-            return tool_calls
-            except json.JSONDecodeError:
-                return []
+            return tool_calls
         return []
```
245-247: Use identity for exact type equality (ruff E721).

Prefer is/is not for an exact class match.
Apply this diff:
```diff
-            if type(model_instance) != type(gt_instance):
+            if type(model_instance) is not type(gt_instance):
                 continue
```
339-342: Unused parameter and role naming – verify downstream expectations
- metadata isn’t used; rename to underscore to avoid lint ARG002.
- Verify that downstream components accept role="environment" for observations; many pipelines expect role="user".
Apply this diff:
```diff
-    def _get_next_observation(self, tool_results: str, metadata: MultiTurnToolMetadata) -> Dict[str, str]:
+    def _get_next_observation(self, tool_results: str, _metadata: MultiTurnToolMetadata) -> Dict[str, str]:
         """Generate observation for next turn or termination."""
         return {"role": "environment", "content": f"<tool_result> {tool_results} </tool_result>"}
```

Please confirm whether role="environment" is supported by your sampler/tokenizer. If not, switch to role="user".
373-396: Make call comparison robust to arg ordering/quoting.

Comparing stringified calls via sets is brittle; argument order and quoting can differ while being semantically identical. Canonicalize calls (e.g., sort keys and JSON-serialize) for both model and GT before comparison.
I can provide a helper to parse call strings into (name, normalized_args) and compare structured sets. Want me to add it and refactor _compare_tool_calls and call construction?
Also applies to: 283-300
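A hedged sketch of one canonicalization approach, assuming calls are rendered as Python-style strings like "cd(folder='docs')"; the helper names and call format are assumptions to adapt to the real representation:

```python
import ast
import json


def canonicalize_call(call: str) -> tuple[str, str]:
    """Reduce "name(a=1, b='x')" to (name, sorted-key JSON of the kwargs)."""
    try:
        node = ast.parse(call.strip(), mode="eval").body
        if isinstance(node, ast.Call):
            name = ast.unparse(node.func)
            # Positional args and **kwargs are omitted here for brevity.
            kwargs = {
                kw.arg: ast.literal_eval(kw.value)
                for kw in node.keywords
                if kw.arg is not None
            }
            return (name, json.dumps(kwargs, sort_keys=True, default=str))
    except (SyntaxError, ValueError):
        pass
    return (call.strip(), "")  # fall back to raw-string comparison


def calls_match(model_calls: list[str], gt_calls: list[str]) -> bool:
    """Set equality over canonical forms: arg order and quoting no longer matter."""
    return {canonicalize_call(c) for c in model_calls} == {canonicalize_call(c) for c in gt_calls}
```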
449-455: Use zip(strict=True) and drop the unused enumerate index.

Satisfy ruff B905/B007 and catch mismatched batch sizes early.
Apply this diff:
```diff
-    for i, (message_log, sample_metadata) in enumerate(zip(message_log_batch, processed_metadata)):
+    for _i, (message_log, sample_metadata) in enumerate(zip(message_log_batch, processed_metadata, strict=True)):
```
81-83: Minor lint fixes: explicit conversion flags and exception messaging
- Prefer f"{e!s}" over str(e) in f-strings (RUF010).
- Narrow broad excepts where feasible; otherwise keep and log.
Apply this diff:
```diff
-    logger.warning("Failed to import %s: %s", tool_name, e)
+    logger.warning("Failed to import %s: %s", tool_name, e)
@@
-            result = method(**args)
+            result = method(**args)
             result_str = str(result) if result is not None else "Success"
-            return f"[{tool_name}.{func_name}] {result_str}", True
-        except Exception as e:
-            return f"[{tool_name}.{func_name}] Error: {str(e)}", False
+            return f"[{tool_name}.{func_name}] {result_str}", True
+        except TypeError as e:
+            return f"[{tool_name}.{func_name}] Error: {e!s}", False
+        except Exception as e:
+            return f"[{tool_name}.{func_name}] Error: {e!s}", False
@@
-            return f"Invalid tool function name format: {str(e)}", False
+            return f"Invalid tool function name format: {e!s}", False
@@
-    # Unexpected format – record the raw representation
+    # Unexpected format - record the raw representation
         arg_repr = repr(args)
```

Also applies to: 176-185, 382-388
33-43: TypedDict completeness: include turn_metadata in MultiTurnToolMetadata.

turn_metadata is used but not declared; add it to the TypedDict.
Apply this diff:
```diff
 class MultiTurnToolMetadata(TypedDict):
     """Metadata for tracking multi-turn tool state."""
     id: str
     current_turn: int
     max_turns: int
     ground_truth: List[List[str]]  # GT tool calls per turn
     user_question_bank: List[List[Dict[str, str]]]  # Next user questions
     model_tool_instances: Dict[str, Any]  # Model's tool instances
     gt_tool_instances: Dict[str, Any]  # Ground truth tool instances
     model_calls_per_turn: List[List[str]]  # Model's calls per turn
+    turn_metadata: Dict[int, Dict[str, Any]]  # Per-turn scoring/diagnostics
```

Also applies to: 329-331
160-170: Method dispatch ambiguity across tools.

Dispatch is based solely on the method name; if two tools expose the same name, calls may hit the wrong tool. Prefer explicit tool selection (e.g., require a tool field in the call JSON or a "ToolName.method" form) or implement disambiguation.
Would you like a change to accept {"tool": "GorillaFileSystem", "name": "cd", "args": {...}} with strict routing and a backward-compatible fallback?
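A hedged sketch of that shape; the {"tool": ..., "name": ..., "args": ...} schema and the tool_instances dict are assumptions, not the current implementation:

```python
from typing import Any


def route_call(call: dict[str, Any], tool_instances: dict[str, Any]):
    """Resolve a tool method, preferring an explicit 'tool' field."""
    func_name = call["name"]
    tool_name = call.get("tool")
    if tool_name is not None:  # strict routing: the call names its tool
        return getattr(tool_instances[tool_name], func_name)
    # Backward-compatible fallback: search all tools, but fail loudly on ambiguity.
    owners = [t for t in tool_instances.values() if callable(getattr(t, func_name, None))]
    if len(owners) > 1:
        raise ValueError(f"Ambiguous method {func_name!r}; add a 'tool' field to the call.")
    if not owners:
        raise KeyError(f"No registered tool exposes {func_name!r}.")
    return getattr(owners[0], func_name)
```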
404-405: Remove debug artifacts.

Leftover breakpoints/comments should be dropped before merging.
Apply this diff:
```diff
-        # breakpoint()
         return "\n".join(tool_results), model_calls_made, turn_success
@@
-        # breakpoint()
         return state_score, call_score
@@
-        # breakpoint()
         return EnvironmentReturn(
```

Also applies to: 222-223, 488-489
examples/configs/evals/eval.yaml (1)
55-55: Drop the no-op change.

This hunk appears to only adjust the trailing newline. Please revert it to keep noise out of the history unless there's an intentional config tweak tied to the PR.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
- `docs/assets/bfcl_reward_2.png` is excluded by `!**/*.png`
- `docs/assets/bfcl_v3_per_category_rewards.png` is excluded by `!**/*.png`
- `docs/assets/bfcl_val_acc.png` is excluded by `!**/*.png`
- `uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (26)
- docs/guides/multi_turn_bfcl.md (1 hunk)
- docs/index.md (1 hunk)
- examples/configs/evals/bfcl_eval.yaml (1 hunk)
- examples/configs/evals/eval.yaml (1 hunk)
- examples/configs/grpo_bfclv3.yaml (1 hunk)
- examples/run_eval.py (3 hunks)
- examples/run_grpo_bfclv3.py (1 hunk)
- nemo_rl/algorithms/grpo.py (2 hunks)
- nemo_rl/data/__init__.py (1 hunk)
- nemo_rl/data/datasets/eval_datasets/__init__.py (3 hunks)
- nemo_rl/data/datasets/eval_datasets/bfclv3.py (1 hunk)
- nemo_rl/data/datasets/response_datasets/__init__.py (3 hunks)
- nemo_rl/data/datasets/response_datasets/bfcl_multiturn.py (1 hunk)
- nemo_rl/data/processors.py (2 hunks)
- nemo_rl/distributed/ray_actor_environment_registry.py (1 hunk)
- nemo_rl/environments/multi_turn_tool_environment.py (1 hunk)
- nemo_rl/environments/tools/gorilla_file_system.py (1 hunk)
- nemo_rl/environments/tools/math_api.py (1 hunk)
- nemo_rl/environments/tools/message_api.py (1 hunk)
- nemo_rl/environments/tools/ticket_api.py (1 hunk)
- nemo_rl/environments/tools/trading_bot.py (1 hunk)
- nemo_rl/environments/tools/twitter_api.py (1 hunk)
- nemo_rl/environments/tools/vehicle_control.py (1 hunk)
- nemo_rl/evals/eval.py (7 hunks)
- nemo_rl/experience/rollouts.py (2 hunks)
- pyproject.toml (1 hunk)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts
Files:
- nemo_rl/distributed/ray_actor_environment_registry.py
- nemo_rl/data/__init__.py
- nemo_rl/algorithms/grpo.py
- nemo_rl/experience/rollouts.py
- nemo_rl/environments/multi_turn_tool_environment.py
- nemo_rl/data/datasets/response_datasets/bfcl_multiturn.py
- nemo_rl/environments/tools/gorilla_file_system.py
- examples/run_eval.py
- nemo_rl/evals/eval.py
- nemo_rl/data/datasets/eval_datasets/__init__.py
- nemo_rl/environments/tools/twitter_api.py
- nemo_rl/environments/tools/trading_bot.py
- nemo_rl/data/datasets/response_datasets/__init__.py
- nemo_rl/data/datasets/eval_datasets/bfclv3.py
- nemo_rl/environments/tools/ticket_api.py
- nemo_rl/environments/tools/math_api.py
- examples/run_grpo_bfclv3.py
- nemo_rl/environments/tools/vehicle_control.py
- nemo_rl/environments/tools/message_api.py
- nemo_rl/data/processors.py
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)
Files:
- nemo_rl/distributed/ray_actor_environment_registry.py
- nemo_rl/data/__init__.py
- nemo_rl/algorithms/grpo.py
- nemo_rl/experience/rollouts.py
- nemo_rl/environments/multi_turn_tool_environment.py
- nemo_rl/data/datasets/response_datasets/bfcl_multiturn.py
- nemo_rl/environments/tools/gorilla_file_system.py
- nemo_rl/evals/eval.py
- nemo_rl/data/datasets/eval_datasets/__init__.py
- nemo_rl/environments/tools/twitter_api.py
- nemo_rl/environments/tools/trading_bot.py
- nemo_rl/data/datasets/response_datasets/__init__.py
- nemo_rl/data/datasets/eval_datasets/bfclv3.py
- nemo_rl/environments/tools/ticket_api.py
- nemo_rl/environments/tools/math_api.py
- nemo_rl/environments/tools/vehicle_control.py
- nemo_rl/environments/tools/message_api.py
- nemo_rl/data/processors.py
docs/**/*.md
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
When a markdown doc under docs/**/*.md is added or renamed, update docs/index.md to include it in the appropriate section
Files:
- docs/index.md
- docs/guides/multi_turn_bfcl.md
examples/configs/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
examples/configs/*.yaml: Exemplar configs under examples/configs/*.yaml must include documented defaults
When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/*.yaml
Files:
examples/configs/grpo_bfclv3.yaml
🧠 Learnings (1)
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
Applied to files:
nemo_rl/data/__init__.py
🧬 Code graph analysis (18)
nemo_rl/distributed/ray_actor_environment_registry.py (1)
nemo_rl/distributed/virtual_cluster.py (1)
PY_EXECUTABLES(42-58)
nemo_rl/experience/rollouts.py (2)
tests/unit/environments/test_code_environment.py (1)
tokenizer(85-94)
nemo_rl/distributed/batched_data_dict.py (1)
size(793-802)
nemo_rl/environments/multi_turn_tool_environment.py (10)
nemo_rl/distributed/batched_data_dict.py (1)
BatchedDataDict(75-839)
nemo_rl/distributed/virtual_cluster.py (1)
PY_EXECUTABLES(42-58)
nemo_rl/environments/interfaces.py (2)
EnvironmentInterface(52-88)
EnvironmentReturn(26-49)
nemo_rl/environments/tools/math_api.py (3)
_load_scenario(12-22)
add(170-184)
mean(49-64)
nemo_rl/environments/tools/message_api.py (1)
_load_scenario(69-87)
nemo_rl/environments/tools/trading_bot.py (1)
_load_scenario(159-191)
nemo_rl/environments/tools/vehicle_control.py (1)
_load_scenario(81-158)
nemo_rl/environments/tools/gorilla_file_system.py (1)
_load_scenario(151-199)
nemo_rl/environments/tools/ticket_api.py (1)
_load_scenario(35-47)
nemo_rl/environments/tools/twitter_api.py (1)
_load_scenario(29-50)
nemo_rl/data/datasets/response_datasets/bfcl_multiturn.py (1)
nemo_rl/data/interfaces.py (1)
TaskDataSpec(53-86)
nemo_rl/environments/tools/gorilla_file_system.py (2)
nemo_rl/environments/tools/ticket_api.py (1)
_load_scenario(35-47)
nemo_rl/environments/tools/twitter_api.py (1)
_load_scenario(29-50)
examples/run_eval.py (3)
nemo_rl/environments/multi_turn_tool_environment.py (1)
MultiTurnToolEnvironment(303-513)
nemo_rl/distributed/ray_actor_environment_registry.py (1)
get_actor_python_env(49-64)
nemo_rl/environments/math_environment.py (1)
MathEnvironment(222-372)
nemo_rl/evals/eval.py (4)
nemo_rl/experience/rollouts.py (2)
run_async_multi_turn_rollout(760-904)
run_multi_turn_rollout(316-531)
nemo_rl/distributed/batched_data_dict.py (2)
repeat_interleave(703-724)
BatchedDataDict(75-839)
nemo_rl/data/llm_message_utils.py (1)
get_keys_from_message_log(126-138)
nemo_rl/environments/multi_turn_tool_environment.py (1)
step(426-496)
nemo_rl/data/datasets/eval_datasets/__init__.py (1)
nemo_rl/data/datasets/eval_datasets/bfclv3.py (1)
BFCLDataset(9-65)
nemo_rl/environments/tools/twitter_api.py (2)
nemo_rl/environments/tools/gorilla_file_system.py (1)
_load_scenario(151-199)
nemo_rl/environments/tools/ticket_api.py (1)
_load_scenario(35-47)
nemo_rl/environments/tools/trading_bot.py (3)
nemo_rl/environments/tools/math_api.py (1)
_load_scenario(12-22)
nemo_rl/environments/tools/message_api.py (1)
_load_scenario(69-87)
nemo_rl/environments/tools/vehicle_control.py (1)
_load_scenario(81-158)
nemo_rl/data/datasets/response_datasets/__init__.py (1)
nemo_rl/data/datasets/response_datasets/bfcl_multiturn.py (1)
BFCLMultiturnDataset(39-117)
nemo_rl/data/datasets/eval_datasets/bfclv3.py (2)
nemo_rl/data/interfaces.py (1)
TaskDataSpec(53-86)
nemo_rl/data/processors.py (1)
bfcl_processor(232-297)
nemo_rl/environments/tools/ticket_api.py (2)
nemo_rl/environments/tools/gorilla_file_system.py (1)
_load_scenario(151-199)
nemo_rl/environments/tools/twitter_api.py (1)
_load_scenario(29-50)
nemo_rl/environments/tools/math_api.py (3)
nemo_rl/environments/tools/message_api.py (1)
_load_scenario(69-87)
nemo_rl/environments/tools/trading_bot.py (1)
_load_scenario(159-191)
nemo_rl/environments/tools/vehicle_control.py (1)
_load_scenario(81-158)
examples/run_grpo_bfclv3.py (12)
nemo_rl/algorithms/grpo.py (3)
MasterConfig(122-130)
grpo_train(561-1030)
setup(138-460)
nemo_rl/algorithms/utils.py (1)
get_tokenizer(157-282)
nemo_rl/data/__init__.py (1)
DataConfig(18-42)
nemo_rl/data/datasets/processed_dataset.py (1)
AllTaskProcessedDataset(31-126)
nemo_rl/data/datasets/response_datasets/__init__.py (1)
load_response_dataset(35-121)
nemo_rl/data/interfaces.py (3)
TaskDataProcessFnCallable(89-100)
TaskDataSpec(53-86)
DatumSpec(32-40)
nemo_rl/distributed/ray_actor_environment_registry.py (1)
get_actor_python_env(49-64)
nemo_rl/distributed/virtual_cluster.py (1)
init_ray(84-170)
nemo_rl/data/processors.py (1)
bfcl_multiturn_hf_data_processor(299-363)
nemo_rl/environments/multi_turn_tool_environment.py (1)
MultiTurnToolEnvironment(303-513)
nemo_rl/models/generation/__init__.py (1)
configure_generation_config(24-45)
nemo_rl/utils/logger.py (1)
get_next_experiment_dir(1222-1256)
nemo_rl/environments/tools/vehicle_control.py (2)
nemo_rl/environments/tools/message_api.py (1)
_load_scenario(69-87)
nemo_rl/environments/tools/trading_bot.py (1)
_load_scenario(159-191)
nemo_rl/environments/tools/message_api.py (3)
nemo_rl/environments/tools/math_api.py (1)
_load_scenario(12-22)
nemo_rl/environments/tools/trading_bot.py (1)
_load_scenario(159-191)
nemo_rl/environments/tools/vehicle_control.py (1)
_load_scenario(81-158)
nemo_rl/data/processors.py (1)
nemo_rl/data/interfaces.py (2)
TaskDataSpec(53-86)
DatumSpec(32-40)
🪛 Ruff (0.13.1)
nemo_rl/experience/rollouts.py
909-909: Undefined name AutoTokenizer
(F821)
945-945: Prefer TypeError exception for invalid type
(TRY004)
945-945: Avoid specifying long messages outside the exception class
(TRY003)
nemo_rl/environments/multi_turn_tool_environment.py
53-61: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
81-81: Do not catch blind exception: Exception
(BLE001)
116-116: Consider moving this statement to an else block
(TRY300)
179-179: Consider moving this statement to an else block
(TRY300)
180-180: Do not catch blind exception: Exception
(BLE001)
181-181: Use explicit conversion flag
Replace with conversion flag
(RUF010)
182-182: Consider moving this statement to an else block
(TRY300)
185-185: Use explicit conversion flag
Replace with conversion flag
(RUF010)
189-189: Unused method argument: is_final_turn
(ARG002)
245-245: Use is and is not for type comparisons, or isinstance() for isinstance checks
(E721)
339-339: Unused method argument: metadata
(ARG002)
382-382: Use explicit conversion flag
Replace with conversion flag
(RUF010)
384-384: Comment contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF003)
449-449: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
449-449: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
nemo_rl/data/datasets/response_datasets/bfcl_multiturn.py
106-106: Do not catch blind exception: Exception
(BLE001)
nemo_rl/environments/tools/gorilla_file_system.py
83-85: Avoid specifying long messages outside the exception class
(TRY003)
97-99: Avoid specifying long messages outside the exception class
(TRY003)
195-195: Prefer next(iter(scenario["root"].keys())) over single element slice
Replace with next(iter(scenario["root"].keys()))
(RUF015)
197-197: Prefer next(iter(scenario["root"].keys())) over single element slice
Replace with next(iter(scenario["root"].keys()))
(RUF015)
588-588: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
617-617: f-string without any placeholders
Remove extraneous f prefix
(F541)
729-729: f-string without any placeholders
Remove extraneous f prefix
(F541)
nemo_rl/evals/eval.py
329-329: Avoid specifying long messages outside the exception class
(TRY003)
365-365: Unpacked variable rollout_metrics is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
nemo_rl/environments/tools/twitter_api.py
29-29: Unused method argument: long_context
(ARG002)
77-77: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
77-77: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
nemo_rl/environments/tools/trading_bot.py
189-191: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
327-327: Use explicit conversion flag
Replace with conversion flag
(RUF010)
469-469: Unused method argument: username
(ARG002)
469-469: Unused method argument: password
(ARG002)
nemo_rl/environments/tools/ticket_api.py
35-35: Unused method argument: long_context
(ARG002)
nemo_rl/environments/tools/math_api.py
45-45: Consider moving this statement to an else block
(TRY300)
46-46: Do not catch blind exception: Exception
(BLE001)
113-113: Consider moving this statement to an else block
(TRY300)
166-166: Consider moving this statement to an else block
(TRY300)
277-277: Consider moving this statement to an else block
(TRY300)
examples/run_grpo_bfclv3.py
191-191: Unpacked variable cluster is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
nemo_rl/environments/tools/vehicle_control.py
88-90: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
nemo_rl/environments/tools/message_api.py
69-69: Unused method argument: long_context
(ARG002)
77-77: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
214-214: Prefer next(iter(message.items())) over single element slice
Replace with next(iter(message.items()))
(RUF015)
238-238: Prefer next(iter(message.items())) over single element slice
Replace with next(iter(message.items()))
(RUF015)
287-287: Prefer next(iter(message_data.items())) over single element slice
Replace with next(iter(message_data.items()))
(RUF015)
307-307: Local variable sent_count is assigned to but never used
Remove assignment to unused variable sent_count
(F841)
312-312: Unpacked variable message_content is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
312-312: Prefer next(iter(message_data.items())) over single element slice
Replace with next(iter(message_data.items()))
(RUF015)
nemo_rl/data/processors.py
234-234: Unused function argument: task_data_spec
(ARG001)
236-236: Unused function argument: max_seq_length
(ARG001)
301-301: Unused function argument: task_data_spec
(ARG001)
🪛 markdownlint-cli2 (0.18.1)
docs/guides/multi_turn_bfcl.md
45-45: Link text should be descriptive
(MD059, descriptive-link-text)
135-135: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
160-160: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Post automodel integration comment / Comment on PR
- GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (13)
nemo_rl/algorithms/grpo.py (1)
21-21: LGTM: json import aligned with the new logging.

The import is required for metadata serialization.
examples/run_eval.py (1)
62-68: Actor environment registry entry exists for MultiTurnToolEnvironment
The ACTOR_ENVIRONMENT_REGISTRY includes "nemo_rl.environments.multi_turn_tool_environment.MultiTurnToolEnvironment", so no changes are needed.

examples/configs/grpo_bfclv3.yaml (1)
1-1: Defaults merging is supported
load_config_with_inheritance handles a top-level defaults key (string or list), recursively loading and merging parent configs.

docs/guides/multi_turn_bfcl.md (1)
176-176: Confirm docs/index.md was updated to include this guide.

This is required when adding docs under docs/**/*.md.
As per coding guidelines
nemo_rl/environments/tools/message_api.py (1)
170-196: Fix send_message to use the int message_id (after the _generate_id change).

No structural change beyond using the int return.
Please ensure any downstream code expecting {"new_id": ...} is updated; current file uses the int directly.
examples/configs/evals/bfcl_eval.yaml (1)
2-2: Validate the 'defaults' format for your config system.

Hydra expects a list for defaults (e.g., defaults: [eval.yaml] or a structured list). If you're using custom (non-Hydra) parsing this may be fine; otherwise, consider switching to the list form.
If using Hydra, consider:
```diff
-defaults: "eval.yaml"
+defaults:
+  - eval.yaml
```

Please confirm the expected format for your config loader.
nemo_rl/distributed/ray_actor_environment_registry.py (1)
45-46: The added registry entry looks correct; verify actor decoration.

Mapping MultiTurnToolEnvironment to PY_EXECUTABLES.SYSTEM is consistent with other envs. Please confirm the class is a Ray actor (decorated/constructed via .options/.remote) so this entry is actually used.
nemo_rl/data/datasets/response_datasets/__init__.py (1)
128-129: Export addition LGTM.

nemo_rl/experience/rollouts.py (1)
493-504: Hook placement for the BFCL next-user question is sensible.

Appending user follow-ups only for continuing samples, guarded by metadata, is appropriate.
Please confirm user_question_bank entries’ structure matches the assumptions handled in _append_next_user_question (list nesting vs dict).
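For reference, a hedged sketch of the two entry shapes this asks about; the normalization mirrors what _append_next_user_question is assumed to handle, not verified code:

```python
def next_user_message(entry):
    """Normalize a user_question_bank entry to a single message dict.

    Handles both a bare {"role": "user", ...} dict and a nested
    [{"role": "user", ...}, ...] list; returns None when exhausted.
    """
    if isinstance(entry, dict):
        return entry
    if isinstance(entry, list) and entry:
        return entry[0]
    return None
```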
nemo_rl/evals/eval.py (1)
277-301: Signature change: ensure all call sites pass tokenizer.

run_env_eval now accepts tokenizer; confirm examples/run_eval.py and any external callers are updated, otherwise bfcl_multiturn will raise.
docs/index.md (1)
31-31: Guide indexed in the main toctree.

Thanks for wiring the new multi-turn BFCL guide into the guides toctree so it shows up in the rendered docs; this keeps navigation consistent with the docs guidelines.
pyproject.toml (1)
54-55: Dependency aligns with the new dataset loader.

Adding jsonlines to the core dependency set matches the new BFCL dataset reader's requirements, ensuring eval/train runs won't hit an import error.
21-110: BFCL eval dataset hook looks good.

Importing and routing bfcl_multiturn to BFCLDataset, plus exporting it via __all__, cleanly integrates the new eval path without disturbing existing datasets.
Code under review (nemo_rl/data/processors.py):

```python
if system_content and len(system_content.strip()) > 0:
    sys_prompt: dict[str, str | torch.Tensor] = {
        "role": "system",
        "content": system_content,
    }
    sys = tokenizer.apply_chat_template(
        [cast(dict[str, str], sys_prompt)],
        tokenize=False,
        add_generation_prompt=False,
        add_special_tokens=False,
    )
    sys_prompt["token_ids"] = tokenizer(sys, return_tensors="pt")["input_ids"][0]
    sys_prompt["content"] = sys
    message_log.append(sys_prompt)

# User prompt
# Use custom prompt template if provided, otherwise use the content directly
final_user_content = user_content

user_message = {"role": "user", "content": final_user_content}
message = tokenizer.apply_chat_template(
    [user_message],
    tokenize=False,
    add_generation_prompt=True,
    add_special_tokens=False,
)
user_message["token_ids"] = tokenizer(message, return_tensors="pt")["input_ids"][0]
user_message["content"] = message
message_log.append(user_message)
```
Align tokenization with existing processors; support optional prompt templating.
Two issues:
- Tokenization should pass add_special_tokens=False (consistency with existing processors and to avoid double specials).
- Comment says “Use custom prompt template” but it’s not applied.
Apply:
```diff
-    # User prompt
-    # Use custom prompt template if provided, otherwise use the content directly
-    final_user_content = user_content
+    # User prompt
+    # Use custom prompt template if provided, otherwise format the content
+    final_user_content = (
+        task_data_spec.prompt.format(user_content)
+        if getattr(task_data_spec, "prompt", None)
+        else user_content
+    )
@@
-    user_message["token_ids"] = tokenizer(message, return_tensors="pt")["input_ids"][0]
+    user_message["token_ids"] = tokenizer(
+        message, return_tensors="pt", add_special_tokens=False
+    )["input_ids"][0]
@@
-    sys_prompt["token_ids"] = tokenizer(sys, return_tensors="pt")["input_ids"][0]
+    sys_prompt["token_ids"] = tokenizer(
+        sys, return_tensors="pt", add_special_tokens=False
+    )["input_ids"][0]
```
🤖 Prompt for AI Agents
In nemo_rl/data/processors.py around lines 248 to 276, the user message
tokenization and templating are inconsistent with the system message handling:
ensure the apply_chat_template call uses a provided custom user template when
present (instead of always using raw user_content) and make all tokenizer(...)
calls pass add_special_tokens=False to avoid injecting special tokens twice;
specifically, if a user prompt template exists, run
tokenizer.apply_chat_template([...], tokenize=False, add_generation_prompt=True,
add_special_tokens=False) to produce the final message string, then compute
token_ids via tokenizer(final_message, return_tensors="pt",
add_special_tokens=False)["input_ids"][0], set user_message["content"] to the
templated message, and append user_message to message_log.
Code under review (nemo_rl/environments/tools/vehicle_control.py):

```python
def adjustClimateControl(
    self,
    temperature: float,
    unit: str = "celsius",
    fanSpeed: int = 50,
    mode: str = "auto",
) -> Dict[str, Union[str, float]]:
    """
    Adjusts the climate control of the vehicle.

    Args:
        temperature (float): The temperature to set in degree. Default to be celsius.
        unit (str): [Optional] The unit of temperature. [Enum]: ["celsius", "fahrenheit"]
        fanSpeed (int): [Optional] The fan speed to set from 0 to 100. Default is 50.
        mode (str): [Optional] The climate mode to set. [Enum]: ["auto", "cool", "heat", "defrost"]
    Returns:
        currentTemperature (float): The current temperature set in degree Celsius.
        climateMode (str): The current climate mode set.
        humidityLevel (float): The humidity level in percentage.
    """
    if not (0 <= fanSpeed <= 100):
        return {"error": "Fan speed must be between 0 and 100."}
    self.acTemperature = temperature
    if unit == "fahrenheit":
        self.acTemperature = (temperature - 32) * 5 / 9
    self.fanSpeed = fanSpeed
    self.acMode = mode
    return {
        "currentACTemperature": temperature,
        "climateMode": mode,
        "humidityLevel": self.humidityLevel,
    }
```
adjustClimateControl(): returned temperature unit is wrong.
Returns input temperature instead of Celsius value; should return the internal Celsius (self.acTemperature).
Apply this diff:
```diff
-    return {
-        "currentACTemperature": temperature,
+    return {
+        "currentACTemperature": self.acTemperature,
         "climateMode": mode,
         "humidityLevel": self.humidityLevel,
     }
```
🤖 Prompt for AI Agents
In nemo_rl/environments/tools/vehicle_control.py around lines 263 to 294, the
adjustClimateControl method returns the input temperature rather than the
internal Celsius value; change the returned "currentACTemperature" to use
self.acTemperature (which is converted to Celsius when unit == "fahrenheit") so
the API always reports the vehicle's temperature in Celsius; ensure you return
the numeric Celsius value (self.acTemperature) not the original temperature
parameter.
```python
    def get_outside_temperature_from_weather_com(self) -> Dict[str, float]:
        """
        Gets the outside temperature.

        Returns:
            outsideTemperature (float): The outside temperature in degree Celsius.
        """
        return {"error": 404}
```
get_outside_temperature_from_weather_com(): return shape mismatches doc.

In nemo_rl/environments/tools/vehicle_control.py (around lines 306-313), the docstring says the function returns outsideTemperature, but it returns {"error": 404}. Align it with the Google variant so callers always receive a consistent key. Apply this diff:

```diff
-        return {"error": 404}
+        return {"outsideTemperature": self._random.uniform(-10.0, 40.0)}
```
```python
        else:
            self.parkingBrakeStatus = "released"
            self._parkingBrakeForce = 0.0
            self._slopeAngle = 10.0
            if self.long_context:
                return {
                    "parkingBrakeInstruction": PARKING_BRAKE_INSTRUCTION,
                    "parkingBrakeStatus": "released",
                    "_parkingBrakeForce": 0.0,
                    "_slopeAngle": 10.0,
                }
            return {"parkingBrakeStatus": "released", "_parkingBrakeForce": 0.0, "_slopeAngle": 10.0}
```
activateParkingBrake(release): slope angle should reset when released.

In nemo_rl/environments/tools/vehicle_control.py (around lines 411-422), _slopeAngle remains 10.0 on release; set it to 0.0 for consistency, both in the assignment and in the returned dictionaries of the long_context and non-long_context branches. Apply this diff:

```diff
-            self._slopeAngle = 10.0
+            self._slopeAngle = 0.0
@@
-                    "_slopeAngle": 10.0,
+                    "_slopeAngle": 0.0,
@@
-        return {"parkingBrakeStatus": "released", "_parkingBrakeForce": 0.0, "_slopeAngle": 10.0}
+        return {"parkingBrakeStatus": "released", "_parkingBrakeForce": 0.0, "_slopeAngle": 0.0}
```
Review continued from previous batch...
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: slikhite-1 <[email protected]>
Signed-off-by: slikhite-1 <[email protected]>
Signed-off-by: slikhite-1 <[email protected]>
Signed-off-by: slikhite-1 <[email protected]>
Thank you for your hard work on this. I really appreciate the effort you put into it. Would you please add some more test results to the PR docs?
This document might be better placed as a chapter within docs/guides/environments.md.
```yaml
grpo:
  num_prompts_per_step: 16
  num_generations_per_prompt: 8
  max_rollout_turns: 4  # for multi-turn rollouts. Math Environments just have 1 turn (answering the question)
```
This line's comment can be removed.
```diff
@@ -0,0 +1,64 @@
+defaults: "grpo_math_1B.yaml"
```
Since this file already sets a defaults YAML, we can keep only the necessary configs here. Configurations that duplicate those in the default YAML, such as max_num_steps, can be removed.
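A minimal sketch of what the trimmed file could look like, keeping only overrides (values copied from the snippets in this PR; the exact set to keep is a judgment call):

```yaml
defaults: "grpo_math_1B.yaml"

grpo:
  num_prompts_per_step: 16
  num_generations_per_prompt: 8
  max_rollout_turns: 4
```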
```python
    }
).remote(env_configs["math"])

bfcl_cfg = env_configs.get("bfcl_multiturn", {})
```
We'd better not use a default value like this. Refer to #863.
```python
).remote(env_configs["math"])

bfcl_cfg = env_configs.get("bfcl_multiturn", {})
if bool(bfcl_cfg.get("enable", False)):
```
- Same issue with the default value; refer to [Chore] Clean up config defaults throughout codebase #863. A sketch of the explicit pattern follows this list.
- Would you please also handle this case for the other envs?
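A minimal sketch of the explicit-access pattern (key names taken from the snippet above; the assert message is illustrative):

```python
# Fail loudly on a missing key instead of silently defaulting.
assert "bfcl_multiturn" in env_configs, (
    "env_configs must define 'bfcl_multiturn' (set enable: false to opt out)"
)
bfcl_cfg = env_configs["bfcl_multiturn"]
if bfcl_cfg["enable"]:
    ...  # create the BFCL multi-turn environment actors
```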
| if "generation" in master_config and "vllm_cfg" in master_config["generation"]: | ||
| max_seq_len = master_config["generation"]["vllm_cfg"].get("max_model_len", 8192) | ||
| else: | ||
| max_seq_len = 8192 # fallback |
We avoid setting default values in code. An assert can be added here to inform the user to set max_model_len, for example:
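A sketch of the requested assert (key names assumed from the snippet above):

```python
# Require an explicit max_model_len instead of a hard-coded fallback.
assert (
    "generation" in master_config
    and "vllm_cfg" in master_config["generation"]
    and "max_model_len" in master_config["generation"]["vllm_cfg"]
), "Please set generation.vllm_cfg.max_model_len explicitly"
max_seq_len = master_config["generation"]["vllm_cfg"]["max_model_len"]
```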
```python
    max_seq_len=max_seq_len,
    max_rollout_turns=max_rollout_turns,
    greedy=False,
)
```
I'm not sure if it's appropriate to call the rollout interface directly here. Should we consider encapsulating a multi-turn interface inside vllm_generation.py? What do you think @yuki-97?
I think calling the rollout interface directly is OK since it's multi-turn, and this implementation is fine for now because we don't yet have multi-turn rollout support for the text-in-text-out format (vllm_generation.generate_text).
I suggest we use this implementation for multi-turn in eval first, and discuss later how, or whether, to keep the text-in-text-out format.
```python
)

# accuracy = sum(rewards) / len(rewards)
# print("accuracy", accuracy)
```
These two lines can be removed.
| """Initialize tool instances.""" | ||
| tools = {} | ||
| for tool_name in tool_names: | ||
| if tool_name in self.TOOL_CLASS_MAPPING: |
An assert could be added when a tool is not in the pre-defined list, to tell users which tools are supported, for example:
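A sketch of such a guard, assuming the names in the snippet above and zero-argument tool constructors:

```python
for tool_name in tool_names:
    # Fail with an informative message instead of silently skipping the tool.
    assert tool_name in self.TOOL_CLASS_MAPPING, (
        f"Unknown tool '{tool_name}'. Supported tools: "
        f"{sorted(self.TOOL_CLASS_MAPPING)}"
    )
    tools[tool_name] = self.TOOL_CLASS_MAPPING[tool_name]()  # assumed no-arg init
```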
@slikhite-1 thanks for supporting this! Left some comments.
I also found this PR failed at copyright and lint:
- for copyright: can you add the copyright header to `nemo_rl/data/datasets/eval_datasets/bfclv3.py` and the new files under `nemo_rl/environments/tools`?
- for lint: can you rebase onto the main branch and run `pre-commit run --all-files` locally? There also seem to be a lot of missing newlines at the end of files; can you add them?
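For reference, the sequence would be something like (assuming origin points at the upstream repo):

```bash
git fetch origin
git rebase origin/main
pre-commit run --all-files
```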
```yaml
    max_model_len: ${policy.max_total_sequence_length}
data:
  dataset_name: "bfcl_multiturn"
  filter_long_samples: true
```
Maybe I'm missing something: is the param `filter_long_samples` used anywhere?
| "mlflow", | ||
| "nvidia-nvshmem-cu12", # for deep_ep build | ||
| "swanlab", | ||
| "jsonlines", |
Since pyproject.toml has changed, uv.lock should change as well. Could you also push the updated uv.lock?
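Assuming the project follows uv's standard workflow, regenerating the lockfile should just be:

```bash
uv lock
```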
This PR implements an end-to-end example for training a model on multi-turn tool calling using GRPO with the BFCLv3 dataset. The example leverages preprocessed data uploaded to HuggingFace, where preprocessing includes adding tool descriptions and formatting. It also provides evaluation scripts to compare trained and untrained models, a README that documents the example in detail, and results on Qwen 2.5 3B.