
Conversation


@slikhite-1 slikhite-1 commented Sep 25, 2025

This PR implements an end-to-end example for training a model on multi-turn tool calling using GRPO with the BFCLv3 dataset. The example leverages preprocessed data uploaded to HuggingFace, where preprocessing includes adding tool descriptions and formatting. It also provides evaluation scripts to compare trained and untrained models, along with a README that documents the example in detail and reports results on Qwen 2.5 3B.

Summary by CodeRabbit

  • New Features
    • Introduced multi-turn tool-calling environment with reward-based rollouts.
    • Added support for BFCL v3 multi-turn datasets for training and evaluation.
    • Provided new built-in tools (filesystem, math, messaging, ticketing, trading, social, vehicle control).
    • Enabled multi-turn evaluation and GRPO training via new scripts and configs.
    • Improved logging with serialized environment metadata.
  • Documentation
    • Added a comprehensive guide for multi-turn tool calling; linked from the guides index.
  • Chores
    • Included a new dependency for JSON Lines data processing.

Signed-off-by: slikhite-1 <[email protected]>
@slikhite-1 slikhite-1 requested review from a team as code owners September 25, 2025 19:15
@github-actions github-actions bot added the "documentation" label (Improvements or additions to documentation) Sep 25, 2025
@slikhite-1 slikhite-1 changed the title from "Multi-turn tool calling on BFCLv3 dataset" to "feat: Multi-turn tool calling on BFCLv3 dataset" Sep 25, 2025

coderabbitai bot commented Sep 25, 2025

📝 Walkthrough

Adds BFCL v3 multi-turn tool-calling capability: new environment, tool APIs, datasets, processors, configs, training and eval scripts. Extends rollout/eval control flow for multi-turn episodes, adds logging of env metadata, and updates distributed registry. Includes documentation and dependency addition.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation**<br>`docs/guides/multi_turn_bfcl.md`, `docs/index.md` | New guide for multi-turn tool calling with BFCL v3; index updated to link the guide. |
| **Configs: GRPO and Eval**<br>`examples/configs/grpo_bfclv3.yaml`, `examples/configs/evals/bfcl_eval.yaml`, `examples/configs/evals/eval.yaml` | New GRPO BFCLv3 training config and BFCL eval config; minor no-op edit to base eval config. |
| **Run scripts**<br>`examples/run_grpo_bfclv3.py`, `examples/run_eval.py` | New GRPO training script for BFCL multiturn; eval script now conditionally uses MultiTurnToolEnvironment and passes the tokenizer; function signatures updated accordingly. |
| **Algorithms: GRPO logging**<br>`nemo_rl/algorithms/grpo.py` | Adds JSON serialization of per-sample extra_env_info into log_data.metadata. |
| **Eval flow**<br>`nemo_rl/evals/eval.py` | Adds a multi-turn evaluation branch, tokenizer plumbing, rollout integration, and data collection for rewards and logs. |
| **Rollouts**<br>`nemo_rl/experience/rollouts.py` | Adds a BFCL helper to append the next user question during multi-turn rollouts; integrates it into the continuation path. |
| **Distributed registry**<br>`nemo_rl/distributed/ray_actor_environment_registry.py` | Registers MultiTurnToolEnvironment to the system Python executable. |
| **Environment: Multi-turn tools**<br>`nemo_rl/environments/multi_turn_tool_environment.py` | New multi-turn tool environment with turn parsing, tool execution, rewards, and Ray actor support. |
| **Tool APIs**<br>`nemo_rl/environments/tools/gorilla_file_system.py`, `.../math_api.py`, `.../message_api.py`, `.../ticket_api.py`, `.../trading_bot.py`, `.../twitter_api.py`, `.../vehicle_control.py` | Adds multiple simulated tool APIs with stateful operations; vehicle_control.py updates method return signatures to structured dicts. |
| **Datasets: Eval**<br>`nemo_rl/data/datasets/eval_datasets/__init__.py`, `.../bfclv3.py` | Adds BFCLDataset and wiring for eval dataset loading from HF Hub. |
| **Datasets: Response**<br>`nemo_rl/data/datasets/response_datasets/__init__.py`, `.../bfcl_multiturn.py` | Adds BFCLMultiturnDataset loading train/val JSONL from HF Hub with formatted output. |
| **Processors**<br>`nemo_rl/data/processors.py` | Adds BFCL single-turn and multi-turn processors with chat templating, tokenization, and env metadata passthrough. |
| **Data config**<br>`nemo_rl/data/__init__.py` | Adds optional filter_long_samples to DataConfig. |
| **Project dependencies**<br>`pyproject.toml` | Adds the jsonlines dependency. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant User
  participant RunScript as run_grpo_bfclv3.py
  participant Ray as Ray Cluster
  participant Policy as GRPO Policy
  participant Env as MultiTurnToolEnvironment
  participant Tools as ToolManager/APIs

  User->>RunScript: Launch with grpo_bfclv3.yaml
  RunScript->>Ray: init()
  RunScript->>Policy: setup(model, tokenizer, gen_cfg)
  RunScript->>Env: create actors (bfcl_multiturn, max_turns)
  loop Training Steps
    Policy->>Env: rollout request (batch)
    Env->>Tools: load scenario (model/GT instances)
    loop per turn (<= max_turns)
      Policy->>Env: assistant response
      Env->>Env: parse <tool> call, validate
      Env->>Tools: execute model and GT calls
      Env-->>Policy: observation, reward fragment, metadata
      Policy->>Policy: GRPO update (rewards, baseline)
    end
    Policy->>RunScript: log rewards + metadata (JSON)
  end
```
```mermaid
sequenceDiagram
  autonumber
  participant User
  participant EvalScript as run_eval.py
  participant Loader as load_response_dataset
  participant Dataloader
  participant VLLM as vLLM Generation
  participant Env as MultiTurnToolEnvironment

  User->>EvalScript: Run with bfcl_eval.yaml
  EvalScript->>Loader: load_response_dataset("bfcl_multiturn")
  Loader-->>EvalScript: dataset (train/val)
  EvalScript->>Env: create if bfcl_multiturn.enable
  EvalScript->>Dataloader: build with tokenizer
  alt Multi-turn enabled
    loop Batches
      EvalScript->>VLLM: generate turns (rollout)
      VLLM-->>Env: assistant messages
      Env-->>EvalScript: rewards, logs, extra_env_info
    end
  else Single-turn
    EvalScript->>VLLM: generate once
  end
  EvalScript-->>User: prompts, responses, rewards
```

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

Suggested labels

documentation, r0.4.0

Suggested reviewers

  • odelalleau
  • ashors1

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Results For Major Changes | ⚠️ Warning | The PR introduces a large new feature set, including multi-turn tool environments, datasets, processors, and training scripts, which clearly qualifies as a major change. However, neither the PR description nor the linked documentation provides concrete testing or evaluation results: there are no logs, metrics summaries, or performance comparisons demonstrating successful runs or guarding against regressions. Because the required testing information is missing for such a substantial update, the custom check fails. | Please update the PR description with the relevant validation details, such as commands executed, logs, metrics, or evaluation results (especially any convergence or performance figures), so reviewers can confirm the new functionality has been tested. Once those results are documented, this check can pass. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |
| Title Check | ✅ Passed | The title succinctly captures the core feature addition by indicating support for multi-turn tool calling on the BFCLv3 dataset, matching the pull request’s primary objective. It is concise, descriptive, and free of extraneous details or vague terminology. The use of the “feat:” prefix follows conventional commit style without introducing noise. Team members scanning the history will immediately understand the main enhancement delivered by this changeset. |
✨ Finishing touches
  • 📝 Generate Docstrings
  • 🧪 Generate unit tests
    • Create PR with unit tests
    • Post copyable unit tests in a comment


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 42

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
nemo_rl/evals/eval.py (1)

443-452: Fix cons@k for multi-turn or gate it.

env_return is undefined in the multi-turn branch; this crashes when metric == "cons@k".

-        if metric == "pass@k":
+        if metric == "pass@k":
             score += eval_pass_k(rewards, num_tests_per_prompt, k_value)
         elif metric == "cons@k":
-            extracted_answers = env_return.answers
-            score += eval_cons_k(
-                rewards, num_tests_per_prompt, k_value, extracted_answers
-            )
+            if master_config.get("env", {}).get("bfcl_multiturn", {}).get("enable", False):
+                raise NotImplementedError("cons@k is not supported for multi-turn evaluation.")
+            extracted_answers = env_return.answers
+            score += eval_cons_k(rewards, num_tests_per_prompt, k_value, extracted_answers)
         else:
             raise ValueError(f"Invalid metric: {metric}")
🧹 Nitpick comments (47)
examples/run_eval.py (1)

54-55: Stop returning tokenizer from setup_data to avoid redundancy

setup_data receives the tokenizer and returns it unchanged, leading to unnecessary plumbing and potential confusion.

Apply these diffs:

-def setup_data(tokenizer: AutoTokenizer, data_config, env_configs):
+def setup_data(tokenizer: AutoTokenizer, data_config, env_configs):
@@
-    dataset = AllTaskProcessedDataset(
+    dataset = AllTaskProcessedDataset(
         dataset=rekeyed_ds,
         tokenizer=tokenizer,
         default_task_data_spec=base_dataset.task_spec,
         task_data_processors=base_dataset.processor,
         max_seq_length=data_config["max_input_seq_length"],
     )
-
-    return dataset, env, tokenizer
+    return dataset, env

And update the call site:

-    (
-        dataset,
-        env,
-        tokenizer,
-    ) = setup_data(tokenizer, config["data"], config["env"])
+    dataset, env = setup_data(tokenizer, config["data"], config["env"])

Also applies to: 79-86, 87-88

nemo_rl/algorithms/grpo.py (1)

1710-1715: Add metadata logging in async path for parity

The async training path omits metadata logging, leading to inconsistent logs across modes.

Apply this diff:

             log_data = {"content": flat_messages["content"]}
             log_data["rewards"] = rewards.tolist()
             log_data["generation_logprobs"] = train_data["generation_logprobs"].tolist()
             log_data["prev_logprobs"] = train_data["prev_logprobs"].tolist()
             log_data["input_lengths"] = input_lengths.tolist()
+            # Include environment metadata (normalized to rewards length)
+            try:
+                extra_env_info = repeated_batch["extra_env_info"]
+            except KeyError:
+                extra_env_info = [None] * len(rewards)
+            if not isinstance(extra_env_info, (list, tuple)):
+                extra_env_info = [extra_env_info] * len(rewards)
+            else:
+                extra_env_info = list(extra_env_info)
+            if len(extra_env_info) < len(rewards):
+                extra_env_info.extend([None] * (len(rewards) - len(extra_env_info)))
+            elif len(extra_env_info) > len(rewards):
+                extra_env_info = extra_env_info[: len(rewards)]
+            log_data["metadata"] = [
+                json.dumps(m, default=str) if m is not None else "" for m in extra_env_info
+            ]
             logger.log_batched_dict_as_jsonl(log_data, f"train_data_step{step}.jsonl")
examples/configs/grpo_bfclv3.yaml (1)

3-15: Add brief inline docs for new keys

Add one-line comments for grpo.max_rollout_turns and env.bfcl_multiturn.num_workers to guide users.

Also applies to: 58-61

nemo_rl/data/datasets/eval_datasets/bfclv3.py (1)

25-37: Use logging instead of prints for library code

Switch prints to a logger to avoid noisy stdout in library usage.
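
A minimal sketch of the swap, assuming a module-level logger (the message and count are illustrative, not from the diff):

```python
import logging

logger = logging.getLogger(__name__)


def _report_loaded(num_records: int) -> None:
    # Unlike print, this stays silent unless the host application configures logging.
    logger.info("Loaded %d BFCL eval records", num_records)
```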

nemo_rl/data/datasets/response_datasets/bfcl_multiturn.py (3)

49-53: Fix docstring: remove nonexistent split arg

Docstring mentions split that isn’t in the signature.

Apply this diff:

-        """Initialize the BFCL Multiturn dataset.
-
-        Args:
-            split: Split to load ("train" or "validation") 
-            seed: Random seed for reproducibility
-            prompt_file: Optional prompt file path
-        """
+        """Initialize the BFCL Multiturn dataset.
+
+        Args:
+            seed: Random seed for reproducibility.
+            prompt_file: Optional prompt file path for prompt injection.
+        """

16-20: Catch specific exceptions instead of broad Exception

Broad except masks real issues. Narrow to huggingface_hub and I/O errors. Also import the exception types.

Apply these diffs:

-from huggingface_hub import hf_hub_download
+from huggingface_hub import (
+    hf_hub_download,
+    HfHubHTTPError,
+    EntryNotFoundError,
+    RepositoryNotFoundError,
+)
-        except Exception as e:
+        except (HfHubHTTPError, EntryNotFoundError, RepositoryNotFoundError, OSError) as e:
             print(f"❌ Error loading training data: {e}")
             raise
-        except Exception as e:
+        except (HfHubHTTPError, EntryNotFoundError, RepositoryNotFoundError, OSError) as e:
             print(f"Could not load validation data: {e}")
             print("Continuing with training data only...")

[Based on learnings from static analysis hints]

Also applies to: 85-87, 106-109


64-67: Prefer logging to prints for dataset loaders

Replace prints with a logger to keep library quiet under normal use.

nemo_rl/data/processors.py (3)

291-297: Avoid adding non‑declared keys to DatumSpec; use extra for dataset.

DatumSpec does not include dataset. Store it under extra to stay within the TypedDict contract.

-    # Add task_name and dataset if present
+    # Add task_name and dataset if present
     if "task_name" in datum_dict:
         output["task_name"] = datum_dict["task_name"]
-    if "dataset" in datum_dict:
-        output["dataset"] = datum_dict["dataset"]
+    if "dataset" in datum_dict:
+        output["__extra__"] = {**output.get("__extra__", {}), "dataset": datum_dict["dataset"]}

354-363: Store dataset under extra rather than top-level DatumSpec key.

Keep DatumSpec schema consistent.

     output: DatumSpec = {
         "message_log": message_log,
         "length": length,
         "extra_env_info": extra_env_info,
         "loss_multiplier": loss_multiplier,
         "idx": idx,
         "task_name": datum_dict["task_name"],
-        "dataset": datum_dict["dataset"],
     }
+    # Keep additional fields under __extra__
+    output["__extra__"] = {**output.get("__extra__", {}), "dataset": datum_dict.get("dataset")}

232-238: Silence ARG001 or purposefully use parameters.

Ruff flags task_data_spec and/or max_seq_length as unused. You’ve now used max_seq_length; if task_data_spec remains unused, either underscore-prefix it or add a brief comment to indicate it’s part of the processor interface.
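
The underscore option in miniature (a hypothetical stub, not the real processor signature):

```python
def bfcl_processor_stub(
    datum_dict: dict,
    _task_data_spec: object,  # kept to match the processor interface; intentionally unused
    max_seq_length: int,
) -> dict:
    # Illustrative only: the leading underscore is the dummy-argument pattern Ruff's ARG rules skip.
    return {"idx": datum_dict.get("idx"), "max_seq_length": max_seq_length}
```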

nemo_rl/environments/tools/ticket_api.py (2)

35-47: Use or store long_context to satisfy interface and silence lints.

Store the long_context argument like other tools.

 def _load_scenario(self, scenario: dict, long_context=False) -> None:
@@
-        DEFAULT_STATE_COPY = deepcopy(DEFAULT_STATE)
+        DEFAULT_STATE_COPY = deepcopy(DEFAULT_STATE)
+        self.long_context = long_context

141-171: Consider validating editable fields more strictly (status domain).

Optionally restrict status to a known set (e.g., {"Open","Closed","Resolved"}) and priority to 1..5 as in create_ticket.
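
A sketch of the stricter validation; the allowed status set and field names are assumptions drawn from the suggestion above, not the file's actual constants:

```python
VALID_STATUSES = {"Open", "Closed", "Resolved"}  # assumed domain, per the suggestion above


def validate_ticket_edit(updates: dict) -> dict | None:
    """Return an error dict for out-of-domain edits, else None."""
    status = updates.get("status")
    if status is not None and status not in VALID_STATUSES:
        return {"error": f"Invalid status '{status}'. Must be one of {sorted(VALID_STATUSES)}."}
    priority = updates.get("priority")
    if priority is not None and (not isinstance(priority, int) or not 1 <= priority <= 5):
        return {"error": "Invalid priority. Must be an integer between 1 and 5."}
    return None
```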

docs/guides/multi_turn_bfcl.md (4)

5-5: Use descriptive link text.

Replace “this” with a descriptive label.

-We use tool APIs from [this](https://github.com/bespokelabsai/verifiers/tree/main/verifiers/tools/bfcl_tools) repository.
+We use tool APIs from the [Bespoke Labs Verifiers BFCL tools](https://github.com/bespokelabsai/verifiers/tree/main/verifiers/tools/bfcl_tools) repository.

10-11: Minor grammar fix for wandb sentence.

-You will have to login to wandb/ set you wandb API_KEY to get the logs in wandb.
+You must log in to Weights & Biases and set your WANDB_API_KEY to enable logging.

135-139: Specify language for fenced code block.

Add a language identifier to satisfy MD040.

-```
+```text
 Turn Reward = 0.5 × state_score + 0.5 × call_score
 Total Episode Reward = (Turn 1 + Turn 2 + ... + Turn N) / max_turns

160-166: Specify language for fenced code block.

Add a language identifier to satisfy MD040.

-```
+```bash
 # Example for a GRPO checkpoint at step 170
 uv run python examples/converters/convert_dcp_to_hf.py \
     --config results/grpo/step_170/config.yaml \
     --dcp-ckpt-path results/grpo/step_170/policy/weights/ \
     --hf-ckpt-path results/grpo/hf

nemo_rl/environments/tools/twitter_api.py (2)

29-51: Use or store long_context to satisfy interface and silence lints.

 def _load_scenario(self, scenario: dict, long_context=False) -> None:
@@
-        DEFAULT_STATE_COPY = deepcopy(DEFAULT_STATE)
+        DEFAULT_STATE_COPY = deepcopy(DEFAULT_STATE)
+        self.long_context = long_context

166-172: Require authentication for mention to be consistent with other actions.

     def mention(self, tweet_id: int, mentioned_usernames: List[str]) -> Dict[str, str]:
@@
-        if tweet_id not in self.tweets:
+        if not self.authenticated:
+            return {"error": "User not authenticated. Please authenticate before mentioning users."}
+        if tweet_id not in self.tweets:
             return {"error": f"Tweet with ID {tweet_id} not found."}
nemo_rl/environments/tools/message_api.py (5)

69-88: Use or store long_context; seed usage is fine.

Store the long_context parameter to align with other tools and silence lint.

     def _load_scenario(self, scenario: dict, long_context=False) -> None:
@@
-        DEFAULT_STATE_COPY = deepcopy(DEFAULT_STATE)
+        DEFAULT_STATE_COPY = deepcopy(DEFAULT_STATE)
+        self.long_context = long_context

224-244: Iterate dicts idiomatically in view_messages_sent.

Minor cleanups; use next(iter(...)).

-        for message in self.inbox:
-            receiver, message_content = list(message.items())[0]
+        for message in self.inbox:
+            receiver, message_content = next(iter(message.items()))

245-266: Typos and minor robustness in add_contact.

  • The redundant “User ID exists” check is harmless and fine to leave. Optional nit: clarify the error messages.

268-296: Iterate dicts idiomatically in search_messages.

Also minor cleanup.

-        for message_data in self.inbox:
-            receiver_id, message_content = list(message_data.items())[0]
+        for message_data in self.inbox:
+            receiver_id, message_content = next(iter(message_data.items()))
             if keyword_lower in message_content.lower():

142-160: Minor docstring fixes.

Typos (“messeage”) and clarity; optional.

nemo_rl/environments/tools/math_api.py (1)

49-65: Unify return type annotations across methods that can return errors.

Methods like mean, standard_deviation, divide, round_number, percentage, min_value, max_value, sum_values may return {"error": str} but are annotated as Dict[str, float]. Update signatures to Dict[str, Union[float, str]] (or define a TypedDict) to match behavior and prevent type confusion downstream.

As per coding guidelines

Also applies to: 66-84, 170-238, 298-317, 318-335, 336-352, 353-369, 370-385
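
A sketch of the TypedDict option (names are illustrative):

```python
from typing import TypedDict


class MathResult(TypedDict, total=False):
    """Either `result` or `error` is present, never both."""

    result: float
    error: str


def divide(a: float, b: float) -> MathResult:
    if b == 0:
        return {"error": "Division by zero."}
    return {"result": a / b}
```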

nemo_rl/environments/tools/gorilla_file_system.py (4)

195-199: Simplify root key extraction.

Use next(iter(...)) to avoid repeated key extraction and improve clarity.

Apply this diff:

-        if "root" in scenario:
-            root_dir = Directory(list(scenario["root"].keys())[0], None)
-            self.root = self._load_directory(
-                scenario["root"][list(scenario["root"].keys())[0]]["contents"], root_dir
-            )
+        if "root" in scenario:
+            root_key = next(iter(scenario["root"]))
+            root_dir = Directory(root_key, None)
+            self.root = self._load_directory(
+                scenario["root"][root_key]["contents"], root_dir
+            )

586-593: diff(): add strict=False to zip for clarity and future safety.

Explicit strict parameter aids readability and avoids silent truncation if inputs diverge.

Apply this diff:

-                diff_lines = [
-                    f"- {line1}\n+ {line2}"
-                    for line1, line2 in zip(content1, content2)
-                    if line1 != line2
-                ]
+                diff_lines = [
+                    f"- {line1}\n+ {line2}"
+                    for line1, line2 in zip(content1, content2, strict=False)
+                    if line1 != line2
+                ]

615-621: Remove f-strings without placeholders.

Minor style fix; keeps Ruff happy.

Apply this diff:

-            return {
-                "error": f"mv: no path allowed in destination. Only file name and folder name is supported for this operation."
-            }
+            return {
+                "error": "mv: no path allowed in destination. Only file name and folder name is supported for this operation."
+            }
@@
-            return {
-                "error": f"cp: don't allow path in destination. Only file name and folder name is supported for this operation."
-            }
+            return {
+                "error": "cp: don't allow path in destination. Only file name and folder name is supported for this operation."
+            }

Also applies to: 727-733


395-425: find(): 'path' arg is ignored for traversal; only used as a prefix.

Start directory should honor 'path' (relative or absolute) via _navigate_to_directory(). Current behavior is misleading.

Proposed change (outline):

  • Resolve target_dir = _navigate_to_directory(path)
  • If error dict, return it
  • Use that as the root for recursive_search and base_path based on resolved path
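
In code, the outline might look like this (a sketch only; the _navigate_to_directory return shape and the recursive-search helper are assumed from the review context, not verified signatures):

```python
def find(self, path: str = ".", name: str | None = None) -> dict:
    # Resolve the start directory from `path` instead of ignoring it.
    target_dir = self._navigate_to_directory(path)
    if isinstance(target_dir, dict) and "error" in target_dir:
        return target_dir  # propagate navigation errors unchanged
    matches: list[str] = []
    # Recurse from the resolved directory, prefixing results with the resolved path.
    self.recursive_search(target_dir, path.rstrip("/") or ".", name, matches)
    return {"matches": matches}
```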
nemo_rl/environments/tools/trading_bot.py (1)

698-712: add_to_watchlist(): response key and docstring mismatch.

Docstring says it returns the symbol added; implementation returns the whole watchlist under the key "symbol". Prefer returning {"watchlist": self.watch_list} or {"symbol": stock} consistently.

Proposed change:

-        return {"symbol": self.watch_list}
+        return {"watchlist": self.watch_list}
nemo_rl/environments/tools/vehicle_control.py (2)

464-498: setCruiseControl(): unit mismatch in docstring.

Doc mentions m/h and return says km/h. Clarify and use a consistent unit in docstring and response key to avoid confusion.

Proposed docstring edit: replace "m/h" with "km/h" (or vice versa), and ensure returned currentSpeed matches. Based on learnings


50-175: Method naming style deviates from snake_case.

Public methods use camelCase (e.g., startEngine). Repo guidelines prefer snake_case, but renaming may break datasets/tool calls. Consider leaving as-is for compatibility and documenting the exception.

As per coding guidelines

Also applies to: 215-700

nemo_rl/data/datasets/response_datasets/__init__.py (1)

19-21: Pass seed (and optionally prompt_file) into BFCLMultiturnDataset for reproducibility.

load_response_dataset consistently forwards seed to datasets. Align this branch and avoid relying on in-code defaults inside the dataset class.

Apply:

-    elif dataset_name == "bfcl_multiturn":
-        base_dataset: Any = BFCLMultiturnDataset()
+    elif dataset_name == "bfcl_multiturn":
+        base_dataset: Any = BFCLMultiturnDataset(
+            seed=seed,
+            prompt_file=data_config.get("prompt_file"),
+        )

Also applies to: 91-94

examples/run_grpo_bfclv3.py (3)

78-84: Treat optional config keys as optional (no in-code defaults).

DataConfig marks prompt_file and system_prompt_file as optional; access via .get to avoid KeyError while still passing None when missing.

As per coding guidelines

-    task_spec = TaskDataSpec(
-        task_name="bfcl_multiturn",
-        prompt_file=data_config["prompt_file"],
-        system_prompt_file=data_config["system_prompt_file"],
-    )
+    task_spec = TaskDataSpec(
+        task_name="bfcl_multiturn",
+        prompt_file=data_config.get("prompt_file"),
+        system_prompt_file=data_config.get("system_prompt_file"),
+    )

121-139: Don’t proceed with RL without an environment; fail fast.

Mapping to None will crash later. Make the requirement explicit if env is disabled/missing.

-    if not task_to_env:
-        # You can set up a default environment here if needed
-        task_to_env = defaultdict(lambda: None)
-        task_to_env[task_name] = None
+    if not task_to_env:
+        raise ValueError(
+            "bfcl_multiturn environment is not enabled/configured. "
+            "Please set env.bfcl_multiturn.enable=true and provide its config."
+        )

189-201: Silence unused ‘cluster’ binding.

cluster is unpacked but unused. Prefix with underscore to satisfy linters.

Based on static analysis hints

-    (
-        policy,
-        policy_generation,
-        cluster,
+    (
+        policy,
+        policy_generation,
+        _cluster,
         dataloader,
         val_dataloader,
         loss_fn,
         logger,
         checkpointer,
         grpo_state,
         master_config,
     ) = setup(config, tokenizer, dataset, val_dataset)
nemo_rl/evals/eval.py (1)

353-374: Prefix unused rollout_metrics to satisfy linters.

Variable is not used; avoid linter noise.

Based on static analysis hints

-                batch, rollout_metrics = run_async_multi_turn_rollout(
+                batch, _rollout_metrics = run_async_multi_turn_rollout(
@@
-                batch, rollout_metrics = run_multi_turn_rollout(
+                batch, _rollout_metrics = run_multi_turn_rollout(
nemo_rl/environments/multi_turn_tool_environment.py (10)

53-61: Logging and typing polish in ToolManager

  • Annotate TOOL_CLASS_MAPPING as ClassVar to satisfy ruff RUF012.
  • Replace print with logger.warning/exception.
  • Define a module logger once.

Apply this diff:

-from typing import Any, Dict, List, Optional, Tuple, TypedDict, Set
+from typing import Any, Dict, List, Optional, Tuple, TypedDict, Set, ClassVar
@@
 import ray
 
+logger = logging.getLogger(__name__)
@@
-    TOOL_CLASS_MAPPING = {
+    TOOL_CLASS_MAPPING: ClassVar[dict[str, str]] = {
@@
-                except Exception as e:
-                    print(f"Failed to initialize {tool_name}: {e}")
+                except ModuleNotFoundError as e:
+                    logger.warning("Failed to import %s: %s", tool_name, e)
+                except AttributeError as e:
+                    logger.warning("Tool class missing in module for %s: %s", tool_name, e)
+                except Exception as e:
+                    logger.exception("Failed to initialize %s: %s", tool_name, e)

Also applies to: 15-23, 81-83


85-120: Parse all blocks; don’t drop later ones

Currently only the first <tool>...</tool> block is parsed. Aggregate calls from all tool tags.

Apply this diff:

-        tool_matches = re.findall(tool_tag_pattern, assistant_response, re.DOTALL)
-        if tool_matches:
-            # Parse JSON array inside tool tags
-            try:
-                tool_content = tool_matches[0].strip()
-                tool_calls_raw = json.loads(tool_content)
-
-                tool_calls: List[Dict[str, Any]] = []
-                if isinstance(tool_calls_raw, list):
-                    for item in tool_calls_raw:
+        tool_matches = re.findall(tool_tag_pattern, assistant_response, re.DOTALL)
+        if tool_matches:
+            tool_calls: List[Dict[str, Any]] = []
+            # Parse JSON arrays inside each tool tag
+            for tool_content in tool_matches:
+                try:
+                    tool_calls_raw = json.loads(tool_content.strip())
+                except json.JSONDecodeError:
+                    continue
+                if isinstance(tool_calls_raw, list):
+                    for item in tool_calls_raw:
                         if isinstance(item, dict):
                             # Validate tool call format
                             validated_item = self._validate_tool_call_format(item)
                             if validated_item:
                                 tool_calls.append(validated_item)
                         elif isinstance(item, str):
                             try:
                                 parsed_item = json.loads(item)
                                 if isinstance(parsed_item, dict):
                                     # Validate tool call format
                                     validated_item = self._validate_tool_call_format(parsed_item)
                                     if validated_item:
                                         tool_calls.append(validated_item)
                             except json.JSONDecodeError:
                                 # Skip malformed entries; they'll be handled later.
                                 continue
-                return tool_calls
-            except json.JSONDecodeError:
-                return []
+            return tool_calls
         return []

245-247: Use identity for exact type equality (ruff E721)

Prefer is/is not for exact class match.

Apply this diff:

-            if type(model_instance) != type(gt_instance):
+            if type(model_instance) is not type(gt_instance):
                 continue

339-342: Unused parameter and role naming – verify downstream expectations

  • metadata isn’t used; rename to underscore to avoid lint ARG002.
  • Verify that downstream components accept role="environment" for observations; many pipelines expect role="user".

Apply this diff:

-    def _get_next_observation(self, tool_results: str, metadata: MultiTurnToolMetadata) -> Dict[str, str]:
+    def _get_next_observation(self, tool_results: str, _metadata: MultiTurnToolMetadata) -> Dict[str, str]:
         """Generate observation for next turn or termination."""
         return {"role": "environment", "content": f"<tool_result> {tool_results} </tool_result>"}

Please confirm whether role="environment" is supported by your sampler/tokenizer. If not, switch to role="user".


373-396: Make call comparison robust to arg ordering/quoting

Comparing stringified calls via sets is brittle; arg order and quoting can differ while being semantically identical. Canonicalize calls (e.g., sort keys and JSON-serialize) for both model and GT before comparison.

I can provide a helper to parse call strings into (name, normalized_args) and compare structured sets. Want me to add it and refactor _compare_tool_calls and call construction?

Also applies to: 283-300
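
For reference, a possible shape for the helper, assuming calls carry "name" and "args" keys as in the parser above (a sketch, not the environment's current format):

```python
import json
from typing import Any


def canonical_call(call: dict[str, Any]) -> str:
    """Stable string form: sorted arg keys, consistent JSON quoting."""
    args = call.get("args") or {}
    # default=str keeps non-JSON-native arg values from raising during canonicalization.
    return f"{call.get('name', '')}({json.dumps(args, sort_keys=True, separators=(',', ':'), default=str)})"


def calls_equal(model_calls: list[dict], gt_calls: list[dict]) -> bool:
    # Set comparison over canonical forms ignores arg ordering and quoting differences.
    return {canonical_call(c) for c in model_calls} == {canonical_call(c) for c in gt_calls}
```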


449-455: Use zip(strict=True) and drop unused enumerate index

Satisfy ruff B905/B007 and catch mismatched batch sizes early.

Apply this diff:

-        for i, (message_log, sample_metadata) in enumerate(zip(message_log_batch, processed_metadata)):
+        for _i, (message_log, sample_metadata) in enumerate(zip(message_log_batch, processed_metadata, strict=True)):

81-83: Minor lint fixes: explicit conversion flags and exception messaging

  • Prefer f"{e!s}" over str(e) in f-strings (RUF010).
  • Narrow broad excepts where feasible; otherwise keep and log.

Apply this diff:

-                    logger.warning("Failed to import %s: %s", tool_name, e)
+                    logger.warning("Failed to import %s: %s", tool_name, e)
@@
-                    result = method(**args)
+                    result = method(**args)
                     result_str = str(result) if result is not None else "Success"
-                    return f"[{tool_name}.{func_name}] {result_str}", True
-                except Exception as e:
-                    return f"[{tool_name}.{func_name}] Error: {str(e)}", False
+                    return f"[{tool_name}.{func_name}] {result_str}", True
+                except TypeError as e:
+                    return f"[{tool_name}.{func_name}] Error: {e!s}", False
+                except Exception as e:
+                    return f"[{tool_name}.{func_name}] Error: {e!s}", False
@@
-            return f"Invalid tool function name format: {str(e)}", False
+            return f"Invalid tool function name format: {e!s}", False
@@
-                        # Unexpected format – record the raw representation
+                        # Unexpected format - record the raw representation
                         arg_repr = repr(args)

Also applies to: 176-185, 382-388


33-43: TypedDict completeness: include turn_metadata in MultiTurnToolMetadata

turn_metadata is used but not declared; add it to the TypedDict.

Apply this diff:

 class MultiTurnToolMetadata(TypedDict):
     """Metadata for tracking multi-turn tool state."""
     id: str
     current_turn: int
     max_turns: int
     ground_truth: List[List[str]]  # GT tool calls per turn
     user_question_bank: List[List[Dict[str, str]]]  # Next user questions
     model_tool_instances: Dict[str, Any]  # Model's tool instances
     gt_tool_instances: Dict[str, Any]  # Ground truth tool instances
     model_calls_per_turn: List[List[str]]  # Model's calls per turn
+    turn_metadata: Dict[int, Dict[str, Any]]  # Per-turn scoring/diagnostics

Also applies to: 329-331


160-170: Method dispatch ambiguity across tools

Dispatch is based solely on method name; if two tools expose the same name, calls may hit the wrong tool. Prefer explicit tool selection (e.g., require tool field in call JSON or "ToolName.method" form) or implement disambiguation.

Would you like a change to accept {"tool": "GorillaFileSystem", "name": "cd", "args": {...}} with strict routing and a backward-compatible fallback?
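
A sketch of what strict routing with a fallback could look like; the {"tool": ..., "name": ..., "args": ...} shape is the proposal above, not the current call format:

```python
from typing import Any, Callable


def resolve_method(tool_instances: dict[str, Any], call: dict[str, Any]) -> Callable:
    """Route to an explicitly named tool when given; otherwise fall back to name lookup."""
    method_name = call["name"]
    tool_key = call.get("tool")
    if tool_key is not None:  # strict routing
        if tool_key not in tool_instances:
            raise KeyError(f"Unknown tool: {tool_key}")
        return getattr(tool_instances[tool_key], method_name)
    # Backward-compatible fallback: first tool exposing the method name.
    for tool in tool_instances.values():
        if hasattr(tool, method_name):
            return getattr(tool, method_name)
    raise KeyError(f"No tool exposes method: {method_name}")
```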


404-405: Remove debug artifacts

Leftover breakpoints/comments should be dropped before merging.

Apply this diff:

-        # breakpoint()
         return "\n".join(tool_results), model_calls_made, turn_success
@@
-        # breakpoint()
         return state_score, call_score
@@
-        # breakpoint()
         return EnvironmentReturn(

Also applies to: 222-223, 488-489

examples/configs/evals/eval.yaml (1)

55-55: Drop the no-op change

Looks like this hunk only adjusts the trailing newline. Please revert it to keep noise out of the history unless there’s an intentional config tweak tied to the PR.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e60c4d9 and 1f54c20.

⛔ Files ignored due to path filters (4)
  • docs/assets/bfcl_reward_2.png is excluded by !**/*.png
  • docs/assets/bfcl_v3_per_category_rewards.png is excluded by !**/*.png
  • docs/assets/bfcl_val_acc.png is excluded by !**/*.png
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (26)
  • docs/guides/multi_turn_bfcl.md (1 hunks)
  • docs/index.md (1 hunks)
  • examples/configs/evals/bfcl_eval.yaml (1 hunks)
  • examples/configs/evals/eval.yaml (1 hunks)
  • examples/configs/grpo_bfclv3.yaml (1 hunks)
  • examples/run_eval.py (3 hunks)
  • examples/run_grpo_bfclv3.py (1 hunks)
  • nemo_rl/algorithms/grpo.py (2 hunks)
  • nemo_rl/data/__init__.py (1 hunks)
  • nemo_rl/data/datasets/eval_datasets/__init__.py (3 hunks)
  • nemo_rl/data/datasets/eval_datasets/bfclv3.py (1 hunks)
  • nemo_rl/data/datasets/response_datasets/__init__.py (3 hunks)
  • nemo_rl/data/datasets/response_datasets/bfcl_multiturn.py (1 hunks)
  • nemo_rl/data/processors.py (2 hunks)
  • nemo_rl/distributed/ray_actor_environment_registry.py (1 hunks)
  • nemo_rl/environments/multi_turn_tool_environment.py (1 hunks)
  • nemo_rl/environments/tools/gorilla_file_system.py (1 hunks)
  • nemo_rl/environments/tools/math_api.py (1 hunks)
  • nemo_rl/environments/tools/message_api.py (1 hunks)
  • nemo_rl/environments/tools/ticket_api.py (1 hunks)
  • nemo_rl/environments/tools/trading_bot.py (1 hunks)
  • nemo_rl/environments/tools/twitter_api.py (1 hunks)
  • nemo_rl/environments/tools/vehicle_control.py (1 hunks)
  • nemo_rl/evals/eval.py (7 hunks)
  • nemo_rl/experience/rollouts.py (2 hunks)
  • pyproject.toml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

  • nemo_rl/distributed/ray_actor_environment_registry.py
  • nemo_rl/data/__init__.py
  • nemo_rl/algorithms/grpo.py
  • nemo_rl/experience/rollouts.py
  • nemo_rl/environments/multi_turn_tool_environment.py
  • nemo_rl/data/datasets/response_datasets/bfcl_multiturn.py
  • nemo_rl/environments/tools/gorilla_file_system.py
  • examples/run_eval.py
  • nemo_rl/evals/eval.py
  • nemo_rl/data/datasets/eval_datasets/__init__.py
  • nemo_rl/environments/tools/twitter_api.py
  • nemo_rl/environments/tools/trading_bot.py
  • nemo_rl/data/datasets/response_datasets/__init__.py
  • nemo_rl/data/datasets/eval_datasets/bfclv3.py
  • nemo_rl/environments/tools/ticket_api.py
  • nemo_rl/environments/tools/math_api.py
  • examples/run_grpo_bfclv3.py
  • nemo_rl/environments/tools/vehicle_control.py
  • nemo_rl/environments/tools/message_api.py
  • nemo_rl/data/processors.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

  • nemo_rl/distributed/ray_actor_environment_registry.py
  • nemo_rl/data/__init__.py
  • nemo_rl/algorithms/grpo.py
  • nemo_rl/experience/rollouts.py
  • nemo_rl/environments/multi_turn_tool_environment.py
  • nemo_rl/data/datasets/response_datasets/bfcl_multiturn.py
  • nemo_rl/environments/tools/gorilla_file_system.py
  • nemo_rl/evals/eval.py
  • nemo_rl/data/datasets/eval_datasets/__init__.py
  • nemo_rl/environments/tools/twitter_api.py
  • nemo_rl/environments/tools/trading_bot.py
  • nemo_rl/data/datasets/response_datasets/__init__.py
  • nemo_rl/data/datasets/eval_datasets/bfclv3.py
  • nemo_rl/environments/tools/ticket_api.py
  • nemo_rl/environments/tools/math_api.py
  • nemo_rl/environments/tools/vehicle_control.py
  • nemo_rl/environments/tools/message_api.py
  • nemo_rl/data/processors.py
docs/**/*.md

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When a markdown doc under docs/**/*.md is added or renamed, update docs/index.md to include it in the appropriate section

Files:

  • docs/index.md
  • docs/guides/multi_turn_bfcl.md
examples/configs/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

examples/configs/*.yaml: Exemplar configs under examples/configs/*.yaml must include documented defaults
When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/*.yaml

Files:

  • examples/configs/grpo_bfclv3.yaml
🧠 Learnings (1)
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code

Applied to files:

  • nemo_rl/data/__init__.py
🧬 Code graph analysis (18)
nemo_rl/distributed/ray_actor_environment_registry.py (1)
nemo_rl/distributed/virtual_cluster.py (1)
  • PY_EXECUTABLES (42-58)
nemo_rl/experience/rollouts.py (2)
tests/unit/environments/test_code_environment.py (1)
  • tokenizer (85-94)
nemo_rl/distributed/batched_data_dict.py (1)
  • size (793-802)
nemo_rl/environments/multi_turn_tool_environment.py (10)
nemo_rl/distributed/batched_data_dict.py (1)
  • BatchedDataDict (75-839)
nemo_rl/distributed/virtual_cluster.py (1)
  • PY_EXECUTABLES (42-58)
nemo_rl/environments/interfaces.py (2)
  • EnvironmentInterface (52-88)
  • EnvironmentReturn (26-49)
nemo_rl/environments/tools/math_api.py (3)
  • _load_scenario (12-22)
  • add (170-184)
  • mean (49-64)
nemo_rl/environments/tools/message_api.py (1)
  • _load_scenario (69-87)
nemo_rl/environments/tools/trading_bot.py (1)
  • _load_scenario (159-191)
nemo_rl/environments/tools/vehicle_control.py (1)
  • _load_scenario (81-158)
nemo_rl/environments/tools/gorilla_file_system.py (1)
  • _load_scenario (151-199)
nemo_rl/environments/tools/ticket_api.py (1)
  • _load_scenario (35-47)
nemo_rl/environments/tools/twitter_api.py (1)
  • _load_scenario (29-50)
nemo_rl/data/datasets/response_datasets/bfcl_multiturn.py (1)
nemo_rl/data/interfaces.py (1)
  • TaskDataSpec (53-86)
nemo_rl/environments/tools/gorilla_file_system.py (2)
nemo_rl/environments/tools/ticket_api.py (1)
  • _load_scenario (35-47)
nemo_rl/environments/tools/twitter_api.py (1)
  • _load_scenario (29-50)
examples/run_eval.py (3)
nemo_rl/environments/multi_turn_tool_environment.py (1)
  • MultiTurnToolEnvironment (303-513)
nemo_rl/distributed/ray_actor_environment_registry.py (1)
  • get_actor_python_env (49-64)
nemo_rl/environments/math_environment.py (1)
  • MathEnvironment (222-372)
nemo_rl/evals/eval.py (4)
nemo_rl/experience/rollouts.py (2)
  • run_async_multi_turn_rollout (760-904)
  • run_multi_turn_rollout (316-531)
nemo_rl/distributed/batched_data_dict.py (2)
  • repeat_interleave (703-724)
  • BatchedDataDict (75-839)
nemo_rl/data/llm_message_utils.py (1)
  • get_keys_from_message_log (126-138)
nemo_rl/environments/multi_turn_tool_environment.py (1)
  • step (426-496)
nemo_rl/data/datasets/eval_datasets/__init__.py (1)
nemo_rl/data/datasets/eval_datasets/bfclv3.py (1)
  • BFCLDataset (9-65)
nemo_rl/environments/tools/twitter_api.py (2)
nemo_rl/environments/tools/gorilla_file_system.py (1)
  • _load_scenario (151-199)
nemo_rl/environments/tools/ticket_api.py (1)
  • _load_scenario (35-47)
nemo_rl/environments/tools/trading_bot.py (3)
nemo_rl/environments/tools/math_api.py (1)
  • _load_scenario (12-22)
nemo_rl/environments/tools/message_api.py (1)
  • _load_scenario (69-87)
nemo_rl/environments/tools/vehicle_control.py (1)
  • _load_scenario (81-158)
nemo_rl/data/datasets/response_datasets/__init__.py (1)
nemo_rl/data/datasets/response_datasets/bfcl_multiturn.py (1)
  • BFCLMultiturnDataset (39-117)
nemo_rl/data/datasets/eval_datasets/bfclv3.py (2)
nemo_rl/data/interfaces.py (1)
  • TaskDataSpec (53-86)
nemo_rl/data/processors.py (1)
  • bfcl_processor (232-297)
nemo_rl/environments/tools/ticket_api.py (2)
nemo_rl/environments/tools/gorilla_file_system.py (1)
  • _load_scenario (151-199)
nemo_rl/environments/tools/twitter_api.py (1)
  • _load_scenario (29-50)
nemo_rl/environments/tools/math_api.py (3)
nemo_rl/environments/tools/message_api.py (1)
  • _load_scenario (69-87)
nemo_rl/environments/tools/trading_bot.py (1)
  • _load_scenario (159-191)
nemo_rl/environments/tools/vehicle_control.py (1)
  • _load_scenario (81-158)
examples/run_grpo_bfclv3.py (12)
nemo_rl/algorithms/grpo.py (3)
  • MasterConfig (122-130)
  • grpo_train (561-1030)
  • setup (138-460)
nemo_rl/algorithms/utils.py (1)
  • get_tokenizer (157-282)
nemo_rl/data/__init__.py (1)
  • DataConfig (18-42)
nemo_rl/data/datasets/processed_dataset.py (1)
  • AllTaskProcessedDataset (31-126)
nemo_rl/data/datasets/response_datasets/__init__.py (1)
  • load_response_dataset (35-121)
nemo_rl/data/interfaces.py (3)
  • TaskDataProcessFnCallable (89-100)
  • TaskDataSpec (53-86)
  • DatumSpec (32-40)
nemo_rl/distributed/ray_actor_environment_registry.py (1)
  • get_actor_python_env (49-64)
nemo_rl/distributed/virtual_cluster.py (1)
  • init_ray (84-170)
nemo_rl/data/processors.py (1)
  • bfcl_multiturn_hf_data_processor (299-363)
nemo_rl/environments/multi_turn_tool_environment.py (1)
  • MultiTurnToolEnvironment (303-513)
nemo_rl/models/generation/__init__.py (1)
  • configure_generation_config (24-45)
nemo_rl/utils/logger.py (1)
  • get_next_experiment_dir (1222-1256)
nemo_rl/environments/tools/vehicle_control.py (2)
nemo_rl/environments/tools/message_api.py (1)
  • _load_scenario (69-87)
nemo_rl/environments/tools/trading_bot.py (1)
  • _load_scenario (159-191)
nemo_rl/environments/tools/message_api.py (3)
nemo_rl/environments/tools/math_api.py (1)
  • _load_scenario (12-22)
nemo_rl/environments/tools/trading_bot.py (1)
  • _load_scenario (159-191)
nemo_rl/environments/tools/vehicle_control.py (1)
  • _load_scenario (81-158)
nemo_rl/data/processors.py (1)
nemo_rl/data/interfaces.py (2)
  • TaskDataSpec (53-86)
  • DatumSpec (32-40)
🪛 Ruff (0.13.1)
nemo_rl/experience/rollouts.py

909-909: Undefined name AutoTokenizer

(F821)


945-945: Prefer TypeError exception for invalid type

(TRY004)


945-945: Avoid specifying long messages outside the exception class

(TRY003)

nemo_rl/environments/multi_turn_tool_environment.py

53-61: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


81-81: Do not catch blind exception: Exception

(BLE001)


116-116: Consider moving this statement to an else block

(TRY300)


179-179: Consider moving this statement to an else block

(TRY300)


180-180: Do not catch blind exception: Exception

(BLE001)


181-181: Use explicit conversion flag

Replace with conversion flag

(RUF010)


182-182: Consider moving this statement to an else block

(TRY300)


185-185: Use explicit conversion flag

Replace with conversion flag

(RUF010)


189-189: Unused method argument: is_final_turn

(ARG002)


245-245: Use is and is not for type comparisons, or isinstance() for isinstance checks

(E721)


339-339: Unused method argument: metadata

(ARG002)


382-382: Use explicit conversion flag

Replace with conversion flag

(RUF010)


384-384: Comment contains ambiguous (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF003)


449-449: Loop control variable i not used within loop body

Rename unused i to _i

(B007)


449-449: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

nemo_rl/data/datasets/response_datasets/bfcl_multiturn.py

106-106: Do not catch blind exception: Exception

(BLE001)

nemo_rl/environments/tools/gorilla_file_system.py

83-85: Avoid specifying long messages outside the exception class

(TRY003)


97-99: Avoid specifying long messages outside the exception class

(TRY003)


195-195: Prefer next(iter(scenario["root"].keys())) over single element slice

Replace with next(iter(scenario["root"].keys()))

(RUF015)


197-197: Prefer next(iter(scenario["root"].keys())) over single element slice

Replace with next(iter(scenario["root"].keys()))

(RUF015)


588-588: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


617-617: f-string without any placeholders

Remove extraneous f prefix

(F541)


729-729: f-string without any placeholders

Remove extraneous f prefix

(F541)

nemo_rl/evals/eval.py

329-329: Avoid specifying long messages outside the exception class

(TRY003)


365-365: Unpacked variable rollout_metrics is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

nemo_rl/environments/tools/twitter_api.py

29-29: Unused method argument: long_context

(ARG002)


77-77: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


77-77: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

nemo_rl/environments/tools/trading_bot.py

189-191: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


327-327: Use explicit conversion flag

Replace with conversion flag

(RUF010)


469-469: Unused method argument: username

(ARG002)


469-469: Unused method argument: password

(ARG002)

nemo_rl/environments/tools/ticket_api.py

35-35: Unused method argument: long_context

(ARG002)

nemo_rl/environments/tools/math_api.py

45-45: Consider moving this statement to an else block

(TRY300)


46-46: Do not catch blind exception: Exception

(BLE001)


113-113: Consider moving this statement to an else block

(TRY300)


166-166: Consider moving this statement to an else block

(TRY300)


277-277: Consider moving this statement to an else block

(TRY300)

examples/run_grpo_bfclv3.py

191-191: Unpacked variable cluster is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

nemo_rl/environments/tools/vehicle_control.py

88-90: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

nemo_rl/environments/tools/message_api.py

69-69: Unused method argument: long_context

(ARG002)


77-77: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)


214-214: Prefer next(iter(message.items())) over single element slice

Replace with next(iter(message.items()))

(RUF015)


238-238: Prefer next(iter(message.items())) over single element slice

Replace with next(iter(message.items()))

(RUF015)


287-287: Prefer next(iter(message_data.items())) over single element slice

Replace with next(iter(message_data.items()))

(RUF015)


307-307: Local variable sent_count is assigned to but never used

Remove assignment to unused variable sent_count

(F841)


312-312: Unpacked variable message_content is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


312-312: Prefer next(iter(message_data.items())) over single element slice

Replace with next(iter(message_data.items()))

(RUF015)

nemo_rl/data/processors.py

234-234: Unused function argument: task_data_spec

(ARG001)


236-236: Unused function argument: max_seq_length

(ARG001)


301-301: Unused function argument: task_data_spec

(ARG001)

🪛 markdownlint-cli2 (0.18.1)
docs/guides/multi_turn_bfcl.md

45-45: Link text should be descriptive

(MD059, descriptive-link-text)


135-135: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


160-160: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Post automodel integration comment / Comment on PR
  • GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (13)
nemo_rl/algorithms/grpo.py (1)

21-21: LGTM: json import aligned with new logging

Import is required for metadata serialization.

examples/run_eval.py (1)

62-68: Actor environment registry entry exists for MultiTurnToolEnvironment
The ACTOR_ENVIRONMENT_REGISTRY includes "nemo_rl.environments.multi_turn_tool_environment.MultiTurnToolEnvironment", so no changes needed.

examples/configs/grpo_bfclv3.yaml (1)

1-1: Defaults merging is supported
load_config_with_inheritance handles a top-level defaults key (string or list), recursively loading and merging parent configs.

docs/guides/multi_turn_bfcl.md (1)

176-176: Confirm docs/index.md was updated to include this guide.

Required when adding docs under docs/**/*.md.

As per coding guidelines

nemo_rl/environments/tools/message_api.py (1)

170-196: Fix send_message to use int message_id (after _generate_id change).

No structural change beyond using the int return.

Please ensure any downstream code expecting {"new_id": ...} is updated; current file uses the int directly.

examples/configs/evals/bfcl_eval.yaml (1)

2-2: Validate 'defaults' format for your config system.

Hydra expects a list for defaults (e.g., defaults: [eval.yaml] or a structured list). If you’re using a custom config parser this may be fine; otherwise, consider switching to list form.

If using Hydra, consider:

-defaults: "eval.yaml"
+defaults:
+  - eval.yaml

Please confirm the expected format for your config loader.

nemo_rl/distributed/ray_actor_environment_registry.py (1)

45-46: Add registry entry looks correct; verify actor decoration.

Mapping MultiTurnToolEnvironment to PY_EXECUTABLES.SYSTEM is consistent with other envs. Please confirm the class is a Ray actor (decorated/constructed via .options/.remote) so this entry is actually used.

nemo_rl/data/datasets/response_datasets/__init__.py (1)

128-129: Export addition LGTM.

nemo_rl/experience/rollouts.py (1)

493-504: Hook placement for BFCL next-user question is sensible.

Appending user follow-ups only for continuing samples and guarding by metadata is appropriate.

Please confirm user_question_bank entries’ structure matches the assumptions handled in _append_next_user_question (list nesting vs dict).
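
If both shapes occur in the data, a small normalizer would make the assumption explicit; a sketch (field names assumed from the metadata described above):

```python
def next_user_message(entry) -> dict[str, str]:
    """Accept either a {"role", "content"} dict or a one-element list wrapping one."""
    if isinstance(entry, list):
        if not entry:
            raise ValueError("Empty user_question_bank entry.")
        entry = entry[0]
    if not isinstance(entry, dict) or "content" not in entry:
        raise ValueError(f"Unexpected user_question_bank entry: {entry!r}")
    return {"role": "user", "content": entry["content"]}
```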

nemo_rl/evals/eval.py (1)

277-301: Signature change: ensure all call sites pass tokenizer.

run_env_eval now accepts tokenizer; confirm examples/run_eval.py and any external callers are updated, else bfcl_multiturn will raise.

docs/index.md (1)

31-31: Guide indexed in main toctree

Thanks for wiring the new multi-turn BFCL guide into the guides toctree so it shows up in the rendered docs; this keeps the navigation consistent with the docs guidelines.

pyproject.toml (1)

54-55: Dependency aligns with new dataset loader

Adding jsonlines to the core dependency set matches the new BFCL dataset reader’s requirements, so this ensures eval/train runs won’t hit an import error.

nemo_rl/data/datasets/eval_datasets/__init__.py (1)

21-110: BFCL eval dataset hook looks good

Importing and routing bfcl_multiturn to BFCLDataset, plus exporting it via __all__, cleanly integrates the new eval path without disturbing existing datasets.

Comment on lines 248 to 276
    if system_content and len(system_content.strip()) > 0:
        sys_prompt: dict[str, str | torch.Tensor] = {
            "role": "system",
            "content": system_content,
        }
        sys = tokenizer.apply_chat_template(
            [cast(dict[str, str], sys_prompt)],
            tokenize=False,
            add_generation_prompt=False,
            add_special_tokens=False,
        )
        sys_prompt["token_ids"] = tokenizer(sys, return_tensors="pt")["input_ids"][0]
        sys_prompt["content"] = sys
        message_log.append(sys_prompt)

    # User prompt
    # Use custom prompt template if provided, otherwise use the content directly
    final_user_content = user_content

    user_message = {"role": "user", "content": final_user_content}
    message = tokenizer.apply_chat_template(
        [user_message],
        tokenize=False,
        add_generation_prompt=True,
        add_special_tokens=False,
    )
    user_message["token_ids"] = tokenizer(message, return_tensors="pt")["input_ids"][0]
    user_message["content"] = message
    message_log.append(user_message)

⚠️ Potential issue | 🟠 Major

Align tokenization with existing processors; support optional prompt templating.

Two issues:

  • Tokenization should pass add_special_tokens=False (consistency with existing processors and to avoid double specials).
  • Comment says “Use custom prompt template” but it’s not applied.

Apply:

-    # User prompt  
-    # Use custom prompt template if provided, otherwise use the content directly
-    final_user_content = user_content
+    # User prompt
+    # Use custom prompt template if provided, otherwise format the content
+    final_user_content = (
+        task_data_spec.prompt.format(user_content)
+        if getattr(task_data_spec, "prompt", None)
+        else user_content
+    )
@@
-    user_message["token_ids"] = tokenizer(message, return_tensors="pt")["input_ids"][0]
+    user_message["token_ids"] = tokenizer(
+        message, return_tensors="pt", add_special_tokens=False
+    )["input_ids"][0]
@@
-        sys_prompt["token_ids"] = tokenizer(sys, return_tensors="pt")["input_ids"][0]
+        sys_prompt["token_ids"] = tokenizer(
+            sys, return_tensors="pt", add_special_tokens=False
+        )["input_ids"][0]
🤖 Prompt for AI Agents
In nemo_rl/data/processors.py around lines 248 to 276, the user message
tokenization and templating are inconsistent with the system message handling:
ensure the apply_chat_template call uses a provided custom user template when
present (instead of always using raw user_content) and make all tokenizer(...)
calls pass add_special_tokens=False to avoid injecting special tokens twice;
specifically, if a user prompt template exists, run
tokenizer.apply_chat_template([...], tokenize=False, add_generation_prompt=True,
add_special_tokens=False) to produce the final message string, then compute
token_ids via tokenizer(final_message, return_tensors="pt",
add_special_tokens=False)["input_ids"][0], set user_message["content"] to the
templated message, and append user_message to message_log.

Comment on lines +263 to +294
def adjustClimateControl(
    self,
    temperature: float,
    unit: str = "celsius",
    fanSpeed: int = 50,
    mode: str = "auto",
) -> Dict[str, Union[str, float]]:
    """
    Adjusts the climate control of the vehicle.
    Args:
        temperature (float): The temperature to set in degree. Default to be celsius.
        unit (str): [Optional] The unit of temperature. [Enum]: ["celsius", "fahrenheit"]
        fanSpeed (int): [Optional] The fan speed to set from 0 to 100. Default is 50.
        mode (str): [Optional] The climate mode to set. [Enum]: ["auto", "cool", "heat", "defrost"]
    Returns:
        currentTemperature (float): The current temperature set in degree Celsius.
        climateMode (str): The current climate mode set.
        humidityLevel (float): The humidity level in percentage.
    """
    if not (0 <= fanSpeed <= 100):
        return {"error": "Fan speed must be between 0 and 100."}
    self.acTemperature = temperature
    if unit == "fahrenheit":
        self.acTemperature = (temperature - 32) * 5 / 9
    self.fanSpeed = fanSpeed
    self.acMode = mode
    return {
        "currentACTemperature": temperature,
        "climateMode": mode,
        "humidityLevel": self.humidityLevel,
    }


⚠️ Potential issue | 🔴 Critical

adjustClimateControl(): returned temperature unit is wrong.

Returns input temperature instead of Celsius value; should return the internal Celsius (self.acTemperature).

Apply this diff:

-        return {
-            "currentACTemperature": temperature,
+        return {
+            "currentACTemperature": self.acTemperature,
             "climateMode": mode,
             "humidityLevel": self.humidityLevel,
         }
🤖 Prompt for AI Agents
In nemo_rl/environments/tools/vehicle_control.py around lines 263 to 294, the
adjustClimateControl method returns the input temperature rather than the
internal Celsius value; change the returned "currentACTemperature" to use
self.acTemperature (which is converted to Celsius when unit == "fahrenheit") so
the API always reports the vehicle's temperature in Celsius; ensure you return
the numeric Celsius value (self.acTemperature) not the original temperature
parameter.

Comment on lines +306 to +313
def get_outside_temperature_from_weather_com(self) -> Dict[str, float]:
    """
    Gets the outside temperature.
    Returns:
        outsideTemperature (float): The outside temperature in degree Celsius.
    """
    return {"error": 404}


⚠️ Potential issue | 🟠 Major

get_outside_temperature_from_weather_com(): return shape doesn't match the docstring.

The docstring says it returns outsideTemperature, but the function returns {"error": 404}. Align it with the shape of the Google variant.

Apply this diff:

-        return {"error": 404}
+        return {"outsideTemperature": self._random.uniform(-10.0, 40.0)}
🤖 Prompt for AI Agents
In nemo_rl/environments/tools/vehicle_control.py around lines 306 to 313, the
function get_outside_temperature_from_weather_com currently returns {"error":
404} which does not match the docstring or the Google variant; change it to
return a dictionary with the same shape as the Google implementation, i.e.,
{"outsideTemperature": <float>} where the float is the temperature in Celsius,
and on failure return {"outsideTemperature": None} (or raise an exception) so
callers receive a consistent key.

Comment on lines +411 to +422
else:
    self.parkingBrakeStatus = "released"
    self._parkingBrakeForce = 0.0
    self._slopeAngle = 10.0
    if self.long_context:
        return {
            "parkingBrakeInstruction": PARKING_BRAKE_INSTRUCTION,
            "parkingBrakeStatus": "released",
            "_parkingBrakeForce": 0.0,
            "_slopeAngle": 10.0,
        }
    return {"parkingBrakeStatus": "released", "_parkingBrakeForce": 0.0, "_slopeAngle": 10.0}

⚠️ Potential issue | 🟡 Minor

activateParkingBrake(release): slope angle should reset when released.

On release, _slopeAngle remains 10.0; set to 0.0 for consistency.

Apply this diff:

-            self._slopeAngle = 10.0
+            self._slopeAngle = 0.0
@@
-                    "_slopeAngle": 10.0,
+                    "_slopeAngle": 0.0,
@@
-            return {"parkingBrakeStatus": "released", "_parkingBrakeForce": 0.0, "_slopeAngle": 10.0}
+            return {"parkingBrakeStatus": "released", "_parkingBrakeForce": 0.0, "_slopeAngle": 0.0}
🤖 Prompt for AI Agents
In nemo_rl/environments/tools/vehicle_control.py around lines 411 to 422, when
releasing the parking brake the code sets self._slopeAngle = 10.0 and returns
"_slopeAngle": 10.0; change both to 0.0 so slope angle is reset on release.
Update the assignment to self._slopeAngle = 0.0 and adjust the returned
dictionaries in both the long_context and non-long_context branches to include
"_slopeAngle": 0.0.


@coderabbitai coderabbitai bot left a comment


Review continued from previous batch...

slikhite-1 and others added 4 commits September 25, 2025 13:08
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: slikhite-1 <[email protected]>
Signed-off-by: slikhite-1 <[email protected]>
Signed-off-by: slikhite-1 <[email protected]>
@RayenTian

Thank you for your hard work on this. I really appreciate the effort you put into it. Would you please add some more test results to the PR docs?


This document might be better placed as a chapter within docs/guides/environments.md.

grpo:
  num_prompts_per_step: 16
  num_generations_per_prompt: 8
  max_rollout_turns: 4 # for multi-turn rollouts. Math Environments just have 1 turn (answering the question)

This line's comment can be removed.

@@ -0,0 +1,64 @@
defaults: "grpo_math_1B.yaml"

Since this file already sets a default YAML, we can keep only the necessary configs here. Configurations that duplicate the default YAML, such as max_num_steps, can be removed.

}
).remote(env_configs["math"])

bfcl_cfg = env_configs.get("bfcl_multiturn", {})

We'd better not use a default value like this. Refer to #863
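
A minimal sketch of the stricter pattern that convention asks for; the key names follow the snippet above, and the error messages are illustrative:

def get_bfcl_cfg(env_configs: dict) -> dict:
    # No implicit defaults (see #863): require both the key and the enable flag.
    assert "bfcl_multiturn" in env_configs, (
        "env_configs must define 'bfcl_multiturn' explicitly"
    )
    cfg = env_configs["bfcl_multiturn"]
    assert "enable" in cfg, "bfcl_multiturn config must set 'enable' explicitly"
    return cfg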

).remote(env_configs["math"])

bfcl_cfg = env_configs.get("bfcl_multiturn", {})
if bool(bfcl_cfg.get("enable", False)):

@RayenTian RayenTian Oct 11, 2025


  1. Same issue with the default value; refer to [Chore] Clean up config defaults throughout codebase #863.
  2. Would you please also handle this situation for the other envs?

if "generation" in master_config and "vllm_cfg" in master_config["generation"]:
max_seq_len = master_config["generation"]["vllm_cfg"].get("max_model_len", 8192)
else:
max_seq_len = 8192 # fallback

@RayenTian RayenTian Oct 11, 2025


We avoid setting default values in code. Here we can add an assert to tell the user to set max_model_len.
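
A sketch of that assert, reusing the keys from the snippet above; the message wording is an assumption:

def resolve_max_seq_len(master_config: dict) -> int:
    # Fail loudly instead of falling back to 8192.
    assert (
        "generation" in master_config
        and "vllm_cfg" in master_config["generation"]
        and "max_model_len" in master_config["generation"]["vllm_cfg"]
    ), "Set generation.vllm_cfg.max_model_len in the config; no code-side default is applied"
    return master_config["generation"]["vllm_cfg"]["max_model_len"]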

max_seq_len=max_seq_len,
max_rollout_turns=max_rollout_turns,
greedy=False,
)

I’m not sure if it’s appropriate to call the rollout interface directly here. Should we consider encapsulating a multiturn interface inside vllm_generation.py? What do you think @yuki-97?


I think calling the rollout interface directly is OK since it's multi-turn, and this implementation is fine for now because we don't have multi-turn rollout support for the text-in-text-out format (vllm_generation.generate_text).
I suggest we use this implementation for multi-turn eval first, and discuss later how, or whether, to keep the text-in-text-out format.

)

# accuracy = sum(rewards) / len(rewards)
# print("accuracy", accuracy)

These two lines can be removed.

"""Initialize tool instances."""
tools = {}
for tool_name in tool_names:
if tool_name in self.TOOL_CLASS_MAPPING:

There could be an assert that tells users when a tool is not in the pre-defined list.
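
A sketch of that guard as a standalone helper; tool_names and the class-mapping dict mirror the snippet above, while the wrapper and message are illustrative:

def initialize_tools(tool_names: list, tool_class_mapping: dict) -> dict:
    """Initialize tool instances, failing loudly on unknown names."""
    tools = {}
    for tool_name in tool_names:
        assert tool_name in tool_class_mapping, (
            f"Unknown tool '{tool_name}'. Available: {sorted(tool_class_mapping)}"
        )
        tools[tool_name] = tool_class_mapping[tool_name]()
    return tools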


@yuki-97 yuki-97 left a comment


@slikhite-1 thanks for supporting this! left some comments.

I also found that this PR fails the copyright and lint checks.

  • for copyright: can you add copyright headers to nemo_rl/data/datasets/eval_datasets/bfclv3.py and the new files under nemo_rl/environments/tools?
  • for lint: can you rebase onto the main branch and run pre-commit run --all-files locally? Also, a lot of files seem to be missing a trailing newline at the end; can you add them?

    max_model_len: ${policy.max_total_sequence_length}
data:
  dataset_name: "bfcl_multiturn"
  filter_long_samples: true

@yuki-97 yuki-97 Oct 13, 2025


Maybe I'm missing something: is this param filter_long_samples actually used anywhere?

"mlflow",
"nvidia-nvshmem-cu12", # for deep_ep build
"swanlab",
"jsonlines",

Since pyproject.toml changed, uv.lock should change too. Could you also push the updated uv.lock?
