The required `CONTAINER` can be built by following the instructions in the [Docker documentation](docker.md).

## Bisecting Failing Tests

### Bisecting Unit/Functional Tests

Use `tools/bisect-run.sh` to run your test command automatically across a commit range and find the first bad commit. It forces a venv rebuild at each commit so that dependencies match the commit under test.

Basic usage:

```sh
GOOD=<good_ref> BAD=<bad_ref> \
tools/bisect-run.sh uv run --group test pytest tests/unit/test_foobar.py::test_case
```

Examples:

```sh
GOOD=56a6225 BAD=32faafa \
tools/bisect-run.sh uv run --group dev pre-commit run --all-files

GOOD=464ed38 BAD=c843f1b \
tools/bisect-run.sh uv run --group test pytest tests/unit/test_foobar.py
```

Notes:

- Exit codes drive the classification: 0=good, non-zero=bad, 125=skip.
- The script pre-verifies that `GOOD` is actually good by running your command on it.
- On failure or interruption, it saves a timestamped `git bisect log` to `<repo>/bisect-logs/`. You can resume later with `BISECT_REPLAY_LOG` (see below).
- Set `BISECT_NO_RESET=1` to keep the bisect state after the script exits.

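
The exit-code convention above follows `git bisect run` (git itself treats 1 to 127 other than 125 as bad, and aborts the run on 128 or higher). As a minimal sketch of that protocol, assuming only plain git and not `tools/bisect-run.sh`, the script below builds a throwaway repository in which commit 3 is untestable (exit 125, skip) and commit 5 introduces a regression; the file `value.txt` and the check are hypothetical.

```sh
#!/bin/sh
# Sketch: git's bisect-run exit-code protocol in a throwaway repository.
# 0 = good, 125 = skip, 1..127 (except 125) = bad.
set -eu

cd "$(mktemp -d)"
git init -q
git config user.email demo@example.com
git config user.name demo

# Hypothetical history: commit 3 lacks value.txt (untestable),
# commits 5-7 contain the regression.
for i in 1 2 3 4 5 6 7; do
  case "$i" in
    3) rm -f value.txt ;;
    [5-7]) echo broken > value.txt ;;
    *) echo "ok $i" > value.txt ;;
  esac
  git add -A
  git commit -q --allow-empty -m "commit $i"
done

check='
  [ -f value.txt ] || exit 125        # cannot test this commit: skip
  if grep -qx broken value.txt; then
    exit 1                            # regression present: bad
  fi
  exit 0                              # good
'

good=$(git rev-list --reverse HEAD | head -n 1)   # commit 1
git bisect start HEAD "$good" >/dev/null
git bisect run sh -c "$check" >/dev/null
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $first_bad"
```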
The command `GOOD=$(git log --format="%h" --diff-filter=A -- "$TEST_CASE")` selects the commit that introduced the test script: `--diff-filter=A` limits the log to commits that added the path. Because a path is typically added only once, this yields a single commit, which serves as the known-good baseline.
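
A throwaway-repository sketch of that selection follows; the path in `TEST_CASE` is hypothetical. Since `git log` lists newest commits first, the sketch appends `| tail -n 1` so that, if the path was ever deleted and re-added, the oldest add is kept. That guard is an illustrative addition, not part of the command above.

```sh
#!/bin/sh
# Sketch: find the commit that introduced a file with --diff-filter=A.
set -eu

cd "$(mktemp -d)"
git init -q
git config user.email demo@example.com
git config user.name demo

# Hypothetical test script path, for illustration only.
TEST_CASE=tests/test_suites/demo-nightly.sh

echo base > README.md
git add README.md
git commit -q -m "initial"

mkdir -p "$(dirname "$TEST_CASE")"
echo 'echo run' > "$TEST_CASE"
git add "$TEST_CASE"
git commit -q -m "add test script"
introduced=$(git rev-parse HEAD)

echo 'echo run v2' > "$TEST_CASE"
git commit -q -am "modify test script"   # a modification (M), not an add (A)

# Same selection as above; tail keeps the oldest add if there are several.
GOOD=$(git log --format="%h" --diff-filter=A -- "$TEST_CASE" | tail -n 1)
echo "GOOD=$GOOD"
```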

::::

`tools.bisect/launch-bisect-helper.sh` ensures that each commit runs in a fresh venv, creates an isolated code snapshot per commit, blocks until metrics are checked, and returns an exit code suitable for bisect.
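
How the helper creates its snapshots is not shown here. One way to get an isolated, per-commit checkout with plain git is a detached worktree per commit, sketched below under that assumption; the snapshot root and file names are hypothetical, and the helper may use a different mechanism.

```sh
#!/bin/sh
# Sketch: one isolated snapshot directory per commit, via `git worktree`.
set -eu

cd "$(mktemp -d)"
git init -q
git config user.email demo@example.com
git config user.name demo

echo v1 > code.txt
git add code.txt
git commit -q -m "v1"
c1=$(git rev-parse HEAD)
echo v2 > code.txt
git commit -q -am "v2"
c2=$(git rev-parse HEAD)

snapshots=$(mktemp -d)   # hypothetical snapshot root, outside the repo
for c in "$c1" "$c2"; do
  # Detached checkout of exactly this commit; the main checkout is untouched.
  git worktree add -q --detach "$snapshots/$c" "$c"
done

got1=$(cat "$snapshots/$c1/code.txt")
got2=$(cat "$snapshots/$c2/code.txt")
echo "$c1 -> $got1"
echo "$c2 -> $got2"

# Remove the snapshots once the per-commit runs are done.
for c in "$c1" "$c2"; do
  git worktree remove "$snapshots/$c"
done
```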

Progressively more advanced cases:

1) Adjusting the test case on the fly with `SED_CLAUSES`

- If a test script needs small textual edits during bisect (e.g., to relax a threshold, or to drop a noisy metric you don’t want to bisect over when focusing on convergence rather than performance), provide a sed script via `SED_CLAUSES`. You can also use it to adjust runtime controls such as `MAX_STEPS`, `STEPS_PER_RUN`, or `NUM_MINUTES`, so that runs slowed down by a performance regression still complete and emit metrics. The helper applies the sed script and automatically restores the test script after the run.
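
The apply-run-restore pattern described in this bullet can be sketched as follows. This is a hypothetical stand-in, not the helper's code: `run_with_sed` and the demo script are invented for illustration, and a portable `sed` invocation replaces GNU-only `sed -i`.

```sh
#!/bin/sh
# Sketch of an apply-run-restore wrapper (hypothetical; not the real helper).
set -eu

# run_with_sed SCRIPT SED_PROGRAM: edit SCRIPT with SED_PROGRAM, run it,
# then put the original contents back and return the run's exit status.
run_with_sed() {
  script=$1
  sed_clauses=$2
  backup=$(mktemp)
  cp "$script" "$backup"
  # Portable in-place edit: read the backup, overwrite the script.
  sed -e "$sed_clauses" "$backup" > "$script"
  status=0
  sh "$script" || status=$?
  cp "$backup" "$script"    # restore the unmodified test script
  rm -f "$backup"
  return "$status"
}

# Hypothetical test script that reads MAX_STEPS.
demo=$(mktemp)
printf '%s\n' 'MAX_STEPS=500' 'echo "steps=$MAX_STEPS"' > "$demo"

out=$(run_with_sed "$demo" 's/^MAX_STEPS=.*/MAX_STEPS=50/')
echo "$out"
grep '^MAX_STEPS=' "$demo"
```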
- If the nightly script supports Hydra/CLI overrides, pass them via `EXTRA_SCRIPT_ARGS` so that each run adopts those overrides (e.g., to work around a transient incompatibility).

:::{important}
Changing script arguments can materially affect performance characteristics and/or convergence behavior. This may influence the validity of the bisect outcome relative to your baseline configuration. Prefer the smallest, clearly justified overrides, keep them consistent across all commits, and document them alongside your results so conclusions are interpreted correctly.
:::

1) Resuming from an earlier interrupted or misclassified session

- Use `BISECT_REPLAY_LOG` with the bisect driver to replay prior markings and continue running. This is handy if a run failed for an unrelated reason or you manually edited a log to change `bad` → `skip` or to drop an incorrect line.
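
The replay presumably builds on git's own `git bisect log` and `git bisect replay` pair; that is an assumption, since the driver's code is not shown. The sketch below uses only plain git in a throwaway repository: it records a session, rewrites one `bad` marking to `skip` in the saved log, and replays the edited log.

```sh
#!/bin/sh
# Sketch: save a bisect log, turn one "bad" marking into "skip", replay it.
set -eu

cd "$(mktemp -d)"
git init -q
git config user.email demo@example.com
git config user.name demo
for i in 1 2 3 4 5 6 7; do
  echo "$i" > n.txt
  git add n.txt
  git commit -q -m "commit $i"
done

first=$(git rev-list --reverse HEAD | head -n 1)
git bisect start HEAD "$first" >/dev/null
misjudged=$(git rev-parse HEAD)   # the commit bisect checked out for testing
git bisect bad >/dev/null         # pretend this run failed for an unrelated reason

log=$(mktemp)
git bisect log > "$log"
git bisect reset >/dev/null 2>&1

# Each marking is logged as "git bisect <state> <full-sha>"; edit the wrong one.
sed "s/^git bisect bad $misjudged\$/git bisect skip $misjudged/" "$log" > "$log.edited"

git bisect replay "$log.edited" >/dev/null
replayed=$(git bisect log)
git bisect reset >/dev/null 2>&1
printf '%s\n' "$replayed" | grep '^git bisect skip'
```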
- Exit code 125 means “skip this commit” in git bisect; our helper returns 125 if required environment settings are missing or if it needs to abort safely.
- Submodules must be clean. The bisect script enforces `submodule.recurse=true` and `fetch.recurseSubmodules=on-demand` so that submodules follow commit checkouts.
- Each commit uses a fresh code snapshot directory and a separate Megatron checkpoint directory to avoid cross-commit contamination.
- On failure or interruption, a timestamped bisect log is saved under `<repo>/bisect-logs/`. Use it with `BISECT_REPLAY_LOG` to resume.
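
For a manual bisect outside the script, the same two git settings can be applied per repository. The sketch below sets them in a throwaway repository and refuses to continue if `git status --porcelain` reports any changes; by default that output also covers modified submodules unless they are configured to be ignored. It illustrates the settings only and is not how the bisect script applies them.

```sh
#!/bin/sh
# Sketch: apply the submodule settings from the notes and require a clean tree.
set -eu

cd "$(mktemp -d)"
git init -q
git config user.email demo@example.com
git config user.name demo
echo hi > file.txt
git add file.txt
git commit -q -m "initial"

# Make checkouts (including the ones bisect performs) update submodules,
# and fetch submodule commits on demand.
git config submodule.recurse true
git config fetch.recurseSubmodules on-demand

# Empty porcelain output means no local changes in the tree or submodules.
if [ -n "$(git status --porcelain)" ]; then
  echo "working tree or submodules are dirty; refusing to bisect" >&2
  exit 1
fi
echo "clean: submodule.recurse=$(git config --get submodule.recurse)"
```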