Commit 89a0b31

add stuff
Signed-off-by: Terry Kong <[email protected]>
1 parent b7abf53 commit 89a0b31

3 files changed (+115 additions, -14 deletions)

docs/testing.md

Lines changed: 108 additions & 7 deletions
````diff
@@ -181,18 +181,119 @@ in Docker with this script:
 CONTAINER=... bash run_functional_in_docker.sh functional/sft.sh
 ```
 
+The required `CONTAINER` can be built by following the instructions in the [Docker documentation](docker.md).
+
+## Bisecting Failing Tests
+
+### Bisecting Unit/Functional Tests
+
+Use `tools/bisect-run.sh` to automatically run your test command across a commit range and find the first bad commit. It forces venv rebuilds so dependencies match each commit.
+
+Basic usage:
+
+```sh
+GOOD=<good_ref> BAD=<bad_ref> \
+tools/bisect-run.sh uv run --group test pytest tests/unit/test_foobar.py::test_case
+```
+
+Examples:
+
+```sh
+GOOD=56a6225 BAD=32faafa \
+tools/bisect-run.sh uv run --group dev pre-commit run --all-files
+
+GOOD=464ed38 BAD=c843f1b \
+tools/bisect-run.sh uv run --group test pytest tests/unit/test_foobar.py
+```
+
+Notes:
+
+- Exit codes drive the classification: 0=good, non-zero=bad, 125=skip.
+- The script pre-verifies that `GOOD` is actually good by running your command on it.
+- On failure or interruption, it saves a timestamped `git bisect log` to `<repo>/bisect-logs/`. You can resume later with `BISECT_REPLAY_LOG` (see below).
+- Set `BISECT_NO_RESET=1` to keep the bisect state after the script exits.
+
+Resume from a saved bisect log:
+
+```sh
+BISECT_REPLAY_LOG=/abs/path/to/bisect-2025....log \
+tools/bisect-run.sh uv run --group test pytest tests/unit/test_foobar.py
+```
+
+### Bisecting nightlies
 
-## Static Type Checking with [MyPy](https://mypy-lang.org/)
-Static type checking can be run with no GPU resources:
+Nightly training scripts can be bisected using the same driver plus a helper that sets up hermetic runs on Slurm.
+
+Vanilla flow:
+
+```sh
+# Copy bisect utilities outside of VCS to ensure a stable runner
+rsync -ahP --delete tools/ tools.bisect/
+
+TEST_CASE=tests/test_suites/llm/sft-llama3.2-1b-1n8g-fsdp2tp1.v3.sh
+
+HF_HOME=... \
+HF_DATASETS_CACHE=... \
+CONTAINER=... \
+MOUNTS=... \
+ACCOUNT=... \
+PARTITION=... \
+GOOD=$(git log --format="%h" --diff-filter=A -- "$TEST_CASE") \
+BAD=HEAD \
+tools.bisect/bisect-run.sh tools.bisect/launch-bisect.sh "$TEST_CASE"
+```
+
+::::{note}
+The command `GOOD=$(git log --format="%h" --diff-filter=A -- "$TEST_CASE")` selects the commit that introduced the test script. Because the path is typically added only once, this yields the introduction commit to use as the known good baseline.
+::::
+
+- `tools.bisect/launch-bisect-helper.sh` ensures each commit runs in a fresh venv, creates an isolated code snapshot per commit, blocks until metrics are checked, and returns a suitable exit code for bisect.
+
+Progressively more advanced cases:
+
+1) Adjusting the test case on the fly with `SED_CLAUSES`
+
+- If a test script needs small textual edits during bisect (e.g., relax a threshold; drop a noisy metric you don’t care to bisect over when focusing on convergence vs. perf), provide a sed script via `SED_CLAUSES`. You can also use this to adjust runtime controls like `MAX_STEPS`, `STEPS_PER_RUN`, or `NUM_MINUTES` when a perf regression slows runs down so they still complete and emit metrics. The helper applies it and automatically restores the test script after the run.
+
+```sh
+SED_CLAUSES=$(cat <<'SED'
+s#mean(data\["timing/train/total_step_time"\], -6, -1) < 0\.6#mean(data["timing/train/total_step_time"], -6, -1) < 0.63#
+/ray\/node\.0\.gpu\.0\.mem_gb/d
+SED
+) \
+GOOD=$(git log --format="%h" --diff-filter=A -- "$TEST_CASE") \
+BAD=HEAD \
+tools.bisect/bisect-run.sh tools.bisect/launch-bisect.sh "$TEST_CASE"
+```
+
+1) Passing extra script arguments
+
+- If the nightly script supports Hydra/CLI overrides, pass them via `EXTRA_SCRIPT_ARGS` so each run adopts those overrides (e.g., fix a transient incompatibility):
+
+:::{important}
+Changing script arguments can materially affect performance characteristics and/or convergence behavior. This may influence the validity of the bisect outcome relative to your baseline configuration. Prefer the smallest, clearly-justified overrides, keep them consistent across all commits, and document them alongside your results so conclusions are interpreted correctly.
+:::
 
 ```sh
-uv run --group test mypy {program}.py
+EXTRA_SCRIPT_ARGS="++data.num_workers=1" \
+GOOD=$(git log --format="%h" --diff-filter=A -- "$TEST_CASE") \
+BAD=HEAD \
+tools.bisect/bisect-run.sh tools.bisect/launch-bisect.sh "$TEST_CASE"
 ```
 
-For example,
+1) Resuming from an earlier interrupted or misclassified session
+
+- Use `BISECT_REPLAY_LOG` with the bisect driver to replay prior markings and continue running. This is handy if a run failed for an unrelated reason or you manually edited a log to change `bad` to `skip` or to drop an incorrect line.
+
 ```sh
-uv run --group test mypy examples/run_grpo_math.py
-uv run --group test mypy examples/run_sft.py
+BISECT_REPLAY_LOG=/abs/path/to/bisect-logs/bisect-YYYYmmdd-HHMMSS-<sha>.log \
+HF_HOME=... HF_DATASETS_CACHE=... CONTAINER=... MOUNTS=... ACCOUNT=... PARTITION=... \
+tools.bisect/bisect-run.sh tools.bisect/launch-bisect.sh "$TEST_CASE"
 ```
 
-mypy.ini controls the configuration of mypy.
+Tips and conventions:
+
+- Exit code 125 means “skip this commit” in git bisect; our helper returns 125 if required env is missing or if it needs to abort safely.
+- Submodules must be clean. The bisect script enforces `submodule.recurse=true` and `fetch.recurseSubmodules=on-demand` so submodules follow commit checkouts.
+- Each commit uses a fresh code snapshot directory and a separate Megatron checkpoint dir to avoid cross-commit contamination.
+- On failure/interrupt, a timestamped bisect log is saved under `<repo>/bisect-logs/`. Use it with `BISECT_REPLAY_LOG` to resume.
````
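
The Notes and Tips in the `docs/testing.md` diff summarize the exit-code contract as 0 = good, non-zero = bad, 125 = skip. Plain `git bisect run` is slightly stricter: 0 marks a commit good, 125 skips it, 1 through 127 (other than 125) mark it bad, and any code above 127 aborts the whole session. A minimal sketch of a wrapper that folds an arbitrary command's status into that range is shown here; `bisect_classify` is a hypothetical name, not something `tools/bisect-run.sh` provides, and treating codes above 127 (for example, a process killed by a signal) as "skip" is an assumption chosen so that one crashed run does not end the bisect.

```sh
#!/usr/bin/env bash
# Hypothetical wrapper (not part of tools/bisect-run.sh): run a command and
# map its exit status onto the codes `git bisect run` understands.
#   0              -> 0   (good)
#   125 or >127    -> 125 (skip; >127 would otherwise abort the bisect)
#   anything else  -> 1   (bad)
bisect_classify() {
  local status=0
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    return 0
  elif [ "$status" -eq 125 ] || [ "$status" -gt 127 ]; then
    return 125
  else
    return 1
  fi
}
```

Because a shell function is not visible to a child process, one way to use this is a small script that defines the function, ends with `bisect_classify "$@"`, and is passed to the bisect driver ahead of the real test command.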

tools/bisect-script.sh renamed to tools/bisect-run.sh

Lines changed: 7 additions & 7 deletions
```diff
@@ -20,7 +20,7 @@ set -euo pipefail
 export NRL_FORCE_REBUILD_VENVS=true
 print_usage() {
 cat <<'EOF'
-Usage: GOOD=<good_ref> BAD=<bad_ref> tools/bisect-script.sh [command ...]
+Usage: GOOD=<good_ref> BAD=<bad_ref> tools/bisect-run.sh [command ...]
 
 Runs a git bisect session between GOOD and BAD to find the first bad commit.
 Sets NRL_FORCE_REBUILD_VENVS=true to ensure test environments are rebuilt to match commit's uv.lock.
@@ -30,8 +30,8 @@ commit to verify it actually passes. If it does not, the script aborts early so
 you can pick a truly good baseline.
 
 Examples:
-GOOD=56a6225 BAD=32faafa tools/bisect-script.sh uv run --group dev pre-commit run --all-files
-GOOD=464ed38 BAD=c843f1b tools/bisect-script.sh uv run --group test pytest tests/unit/test_foobar.py
+GOOD=56a6225 BAD=32faafa tools/bisect-run.sh uv run --group dev pre-commit run --all-files
+GOOD=464ed38 BAD=c843f1b tools/bisect-run.sh uv run --group test pytest tests/unit/test_foobar.py
 
 # Example ouptut:
 # 1. Will run until hits the first bad commit.
@@ -85,7 +85,7 @@ SED
 ) \
 GOOD=$(git log --format="%h" --diff-filter=A -- $TEST_CASE) \
 BAD=5b9ab15799c35428c557ab6f8687ec461b69383e \
-tools.bisect/bisect-script.sh tools.bisect/launch-bisect-helper.sh $TEST_CASE
+tools.bisect/bisect-run.sh tools.bisect/launch-bisect.sh $TEST_CASE
 
 Requirements (ensure submodules update when switching commits):
 Per-repo (recommended inside this repo):
@@ -111,7 +111,7 @@ Additional features:
 to '<repo_root>/bisect-logs/'. Override with BISECT_SAVE_DIR.
 - Resume from a prior bisect log via replay:
 BISECT_REPLAY_LOG=/path/to/bisect-YYYYmmdd-HHMMSS-<sha>.log \
-tools.bisect/bisect-script.sh [command ...]
+tools.bisect/bisect-run.sh [command ...]
 This will 'git bisect replay' the provided log, then continue with 'git bisect run'.
 - Set BISECT_NO_RESET=1 to keep the bisect state after the script exits.
 By default, the script resets the bisect on exit.
@@ -214,7 +214,7 @@ on_interrupt_or_error() {
 local saved
 saved=$(save_bisect_log "interrupt") || true
 if [[ -n "$saved" ]]; then
-iecho "[bisect] To resume later: BISECT_REPLAY_LOG=$saved <other_env_vars>... tools.bisect/bisect-script.sh ${USER_CMD[@]}"
+iecho "[bisect] To resume later: BISECT_REPLAY_LOG=$saved <other_env_vars>... tools.bisect/bisect-run.sh ${USER_CMD[@]}"
 fi
 iecho "[bisect] Restoring original state with 'git bisect reset' on exit."
 fi
@@ -362,7 +362,7 @@ fi
 if [[ $RUN_STATUS -ne 0 ]]; then
 saved_after_run=$(save_bisect_log "run-exit-${RUN_STATUS}") || true
 if [[ -n "$saved_after_run" ]]; then
-iecho "[bisect] To resume later: BISECT_REPLAY_LOG=$saved_after_run tools.bisect/bisect-script.sh ${USER_CMD[@]}"
+iecho "[bisect] To resume later: BISECT_REPLAY_LOG=$saved_after_run tools.bisect/bisect-run.sh ${USER_CMD[@]}"
 fi
 fi
 
```
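
The `@@ -85,7 +85,7 @@ SED` hunk touches the usage example that feeds a `SED_CLAUSES` script to the nightly helper, and the docs describe that helper as applying the clauses and then restoring the test script after the run. A hypothetical sketch of that apply-run-restore pattern follows; it is not the helper's actual code, and the name `run_with_sed_patch` and its argument order are assumptions for illustration. It backs up the file, rewrites it with `sed`, runs the command, and restores the original from an `EXIT` trap, so the file is put back even when the command fails.

```sh
#!/usr/bin/env bash
# Hypothetical sketch (not the actual launch-bisect.sh logic): temporarily apply
# a sed script to a file, run a command, then restore the file.
# Usage: run_with_sed_patch FILE SED_SCRIPT COMMAND [ARGS...]
# The function body is a subshell, so its EXIT trap fires when the function returns.
run_with_sed_patch() (
  file=$1
  sed_script=$2
  shift 2

  backup=$(mktemp) || exit 125          # cannot prepare: report "skip"
  cp -p "$file" "$backup" || exit 125
  # Restore the pristine copy however the subshell exits.
  trap 'cp -p "$backup" "$file"; rm -f "$backup"' EXIT

  # Write the edited copy over the original; reading from the backup avoids
  # relying on the non-portable `sed -i` flag.
  sed -e "$sed_script" "$backup" > "$file" || exit 125

  "$@"   # the command's exit status becomes the function's exit status
)
```

In the nightly flow, a call might look like `run_with_sed_patch "$TEST_CASE" "$SED_CLAUSES" bash "$TEST_CASE"`; the real helper may differ in how and when it restores the script.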

File renamed without changes.
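
Both the nightly docs and the script's usage text set `GOOD=$(git log --format="%h" --diff-filter=A -- "$TEST_CASE")`, and the docs note that this works because the path is typically added only once. If a test script was ever deleted and re-added, that command prints one hash per addition (newest first), which would leave `GOOD` holding several lines. A minimal sketch of one guard keeps only the newest addition; the helper name `last_added_commit` is hypothetical, and preferring the newest rather than the oldest addition is an assumption meant to keep the script present across more of the bisected range.

```sh
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): print the abbreviated hash of
# the most recent commit that added PATH. `git log` lists newest first, so the
# first line of the filtered output is the latest addition. `sed -n 1p` reads
# all input (unlike `head`), so git never hits SIGPIPE under `set -o pipefail`.
last_added_commit() {
  git log --format='%h' --diff-filter=A -- "$1" | sed -n '1p'
}
```

Usage would then be `GOOD=$(last_added_commit "$TEST_CASE")` in place of the inline `git log` call.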
