66 changes: 42 additions & 24 deletions closed/NVIDIA/README.md
# MLPerf Inference v5.0 NVIDIA-Optimized Implementations
This is a repository of NVIDIA-optimized implementations for the [MLPerf](https://mlcommons.org/en/) Inference Benchmark.
This README is a quickstart tutorial on how to use our code as a public / external user.
TL;DR: For quick steps to reproduce the benchmarks, skip to [Quick repro steps](#quick-repro-steps).

---



### MLPerf Inference Policies and Terminology

This is a new-user guide on how to use NVIDIA's MLPerf Inference submission repo. **To get started with MLPerf Inference, first familiarize yourself with the [MLPerf Inference Policies, Rules, and Terminology](https://github.com/mlcommons/inference_policies/blob/master/inference_rules.adoc)**. This document comes from the MLCommons committee that runs the MLPerf benchmarks, and all other MLPerf Inference guides assume that you have read and familiarized yourself with its contents. The most important sections of the document to know are:
- [LoadGen Operation](https://github.com/mlcommons/inference_policies/blob/master/inference_rules.adoc#51-loadgen-operation)



### Quick Start on computelab

Request GPU nodes on computelab:

- [Machines on computelab](https://confluence.nvidia.com/display/GCA/MLPerf-Inference+v5.0+Machines)

`export MLPERF_SCRATCH_PATH=/path/to/scratch/space`: sets the MLPerf scratch space.

`make prebuild`: builds and launches the container.

`make build`: builds plugins and binaries.

`make generate_engines RUN_ARGS="--benchmarks=<BENCHMARK> --scenarios=<SCENARIO>"`: generates engines.

`make run_harness RUN_ARGS="--benchmarks=<BENCHMARK> --scenarios=<SCENARIO>"`: runs the harness to get performance results.

`make run_harness RUN_ARGS="--benchmarks=<BENCHMARK> --scenarios=<SCENARIO> --test_mode=AccuracyOnly"`: runs the harness to get accuracy results.

Add `--config_ver=high_accuracy` to `RUN_ARGS` to run with the high-accuracy target.
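As a concrete end-to-end example, the flow above might look like the following sketch (assuming `resnet50` as the benchmark; substitute your own benchmark, scenario, and scratch path):

```bash
# Scratch space holds datasets, preprocessed data, and models (placeholder path)
export MLPERF_SCRATCH_PATH=/path/to/scratch/space

# Build and launch the container, then build plugins and binaries inside it
make prebuild
make build

# Generate engines, then measure performance and accuracy for one benchmark/scenario
make generate_engines RUN_ARGS="--benchmarks=resnet50 --scenarios=Offline"
make run_harness RUN_ARGS="--benchmarks=resnet50 --scenarios=Offline"
make run_harness RUN_ARGS="--benchmarks=resnet50 --scenarios=Offline --test_mode=AccuracyOnly"
```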

### NVIDIA's Submission

NVIDIA submits with multiple systems, each of which is in the datacenter category, the edge category, or both. In general, multi-GPU systems are submitted in datacenter, and single-GPU systems are submitted in edge.
Make sure that your user is already in the docker group; otherwise you may hit permission issues (a quick sanity check is sketched below).

### Software Dependencies

### Datacenter systems

Our submission uses Docker to set up the environment. Requirements are:

- [Docker CE](https://docs.docker.com/engine/install/)
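If Docker was just installed, a quick sanity check before proceeding (a sketch; group membership only takes effect after a fresh login):

```bash
# Verify that Docker runs without root
docker run --rm hello-world

# If the above fails with a permission error, add your user to the
# docker group and log in again
sudo usermod -aG docker "$USER"
```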
For more specific documentation and debugging:
- documentation/submission_guide.md - Documentation on officially submitting our repo to MLPerf Inference
- documentation/calibration.md - Documentation on how we use calibration and quantization for MLPerf Inference


### Quick repro steps
1. From `repo_root/closed/NVIDIA`, run:
```bash
make prebuild
```

2. To build third-party software dependencies:
```bash
make build
```
Optionally, for Triton harnesses:
```bash
make clone_triton && make build_triton
```

3. To build inference engines (taking `llama2-70b` as an example):
```bash
make generate_engines RUN_ARGS="--benchmarks=llama2-70b --scenarios=Offline,Server"
```

4. To run the benchmark:
```bash
make run_harness RUN_ARGS="--benchmarks=llama2-70b --scenarios=Offline,Server --test_mode=PerformanceOnly" # Performance run
make run_harness RUN_ARGS="--benchmarks=llama2-70b --scenarios=Offline,Server --test_mode=AccuracyOnly" # Accuracy run
```

5. To run compliance tests:
```bash
make run_audit_harness RUN_ARGS="--benchmarks=llama2-70b --scenarios=Offline,Server"
```
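For benchmarks that have a high-accuracy target (see the Quick Start above), add `--config_ver=high_accuracy`; a sketch using the same example benchmark, assuming it provides a high-accuracy config:

```bash
# High-accuracy variant of the accuracy run (assumes llama2-70b has a
# high_accuracy config version)
make run_harness RUN_ARGS="--benchmarks=llama2-70b --scenarios=Offline,Server --config_ver=high_accuracy --test_mode=AccuracyOnly"
```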

#### More info:
- [documentation/performance_tuning_guide.md](documentation/performance_tuning_guide.md) - Documentation related to tuning benchmarks via configuration changes
- [documentation/commands.md](documentation/commands.md) - Documentation on commonly used Make targets and RUN_ARGS options
- [documentation/FAQ.md](documentation/FAQ.md) - An FAQ on common errors or issues that have popped up in the past
- [documentation/submission_guide.md](documentation/submission_guide.md) - Documentation on officially submitting our repo to MLPerf Inference
- [documentation/calibration.md](documentation/calibration.md) - Documentation on how we use calibration and quantization for MLPerf Inference