Releases: mlcommons/algorithmic-efficiency

v0.6.0

08 Sep 13:26
2c90454

Summary

  • A rolling leaderboard to support continuous submissions.
  • Rule updates aimed at cost efficiency, e.g., removing held-out workloads, limiting scoring to 3 repetition studies, and adjusting workload runtime budgets based on competition results.
  • JIT-sharding for JAX workloads.
  • Important bug fixes (e.g., batch norm behavior) and a more flexible API (e.g., a new prepare_for_eval function).

What's Changed

This release is an improved and streamlined version of the benchmark. It includes important bug fixes, API improvements, and benchmark protocol changes following the lessons learned from the first competition.

Added

  • [Code, Rules] Updated the API to allow for a prepare_for_eval function (PR/Issue); see the sketch after this list.
  • [Docs] Document default dropout values for each workload (PR/Issue).
  • [Docs] Unified versioning policy section (PR).
  • [Code] Add the ability to change dropout values during training (PR/Issue).
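
For illustration, a minimal sketch of what a submission-side prepare_for_eval function might look like. The function name comes from this release; the argument names and the EMA example are assumptions, not the benchmark's actual API:

```python
# A minimal sketch, not the spec: argument names and the 'ema_params' key
# are illustrative assumptions.
def prepare_for_eval(workload, current_param_container, model_state,
                     optimizer_state, rng):
    """Called once before each model evaluation.

    Lets a submission hand back the parameters it wants evaluated, e.g.
    an exponential moving average (EMA) of the weights tracked in its own
    optimizer state, instead of the raw training weights.
    """
    eval_params = optimizer_state.get('ema_params', current_param_container)
    return eval_params, model_state
```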

Changed/Removed

  • [Code, Docs] Rename package to algoperf (PR).
  • [Code, Docs] Switch to ruff for linting and formatting (PR).
  • [Code, Rules] Pass train_state to update_params function (PR/Issue).
  • [Code, Rules] Reduced number of studies from 5 to 3 (PR). See also Section 5.1 in our results paper.
  • [Code, Rules] Remove held-out workloads from the benchmark (PR). See also Section 5.1 in our results paper.
  • [Code] Remove sacrebleu dependency (PR).
  • [Code] Switch to pyproject.toml for package management (PR).
  • [Code] Update Python version to 3.11 and dependencies accordingly (PR/Issue).
  • [Rules] Modify the runtime budgets and step hints for each workload (PR/Issue). See also Section 5.1 in our results paper.
  • [Code] Automatically determine the package version via the latest GitHub tag (PR).
  • [Code, Docs] Move all algorithms into a dedicated algorithms directory (PR).
  • [Code] Migrate from pmap to jit in JAX for better performance and scalability (PR); see the sketch below.
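
As a rough illustration of the pmap-to-jit migration, the sketch below shows data parallelism expressed with jax.jit plus explicit sharding annotations instead of jax.pmap. The toy model and shapes are assumptions; this is not the benchmark's actual training loop:

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# 1-D device mesh over all available accelerators, with axis name 'batch'.
mesh = Mesh(np.array(jax.devices()), axis_names=('batch',))
data_sharding = NamedSharding(mesh, P('batch'))  # shard the batch dimension
replicated = NamedSharding(mesh, P())            # replicate across devices

def loss_fn(params, batch):
    # Toy linear model standing in for a real workload.
    preds = batch['x'] @ params
    return jnp.mean((preds - batch['y']) ** 2)

# Under pmap, inputs needed a leading device axis and explicit pmean for
# gradient averaging; under jit, the input shardings are enough and XLA
# inserts the cross-device reductions automatically.
@jax.jit
def train_step(params, batch):
    grads = jax.grad(loss_fn)(params, batch)
    return params - 0.01 * grads

params = jax.device_put(jnp.zeros((4, 1)), replicated)
batch = {
    'x': jax.device_put(jnp.ones((8, 4)), data_sharding),
    'y': jax.device_put(jnp.zeros((8, 1)), data_sharding),
}
params = train_step(params, batch)
```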

Fixed

  • [Code] Fix a batch norm bug (PR/PR/Issue).
  • [Code] Fix a bug that could give a free evaluation to a submission that exceeds max_runtime (PR/Issue).
  • [Code] Fix models in the self-tuning ruleset so that they are always initialized with the default dropout values (PR/PR/Issue).

Full Changelog: algoperf-benchmark-0.5.0...algoperf-benchmark-0.6.0

v0.5.0

24 Jun 12:39
5b4914f

Summary

  • Finalized variant workload targets.
  • Fix in the random_utils helper function.
  • For the Conformer workload, set inplace=True for PyTorch Dropout layers (see the sketch after this list).
  • Clear the CUDA cache at the beginning of each trial for PyTorch.
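
For context, a minimal PyTorch sketch of the two runtime tweaks above; the begin_trial function is an illustrative name, not the benchmark's actual hook:

```python
import torch
import torch.nn as nn

def begin_trial():
    # Release cached GPU memory left over from the previous trial so every
    # trial starts from a comparable allocator state.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

# inplace=True lets Dropout overwrite its input tensor instead of
# allocating a new one, reducing activation memory in large models such
# as Conformer.
dropout = nn.Dropout(p=0.1, inplace=True)
```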

What's Changed

Full Changelog: algoperf-benchmark-0.1.4...algoperf-benchmark-0.1.5

v0.0.4

24 Jun 12:39
8bd3876
Pre-release

Upgrade CUDA version to CUDA 12.1:

  • Upgrade CUDA version in Dockerfiles that will be used for scoring.
  • Update JAX and PyTorch package version tags to use the local CUDA installation.

Add flag for completely disabling checkpointing.

  • Note that we will run with checkpointing off at scoring time.

Update DeepSpeech and Conformer variant target-setting configurations.

  • Note that variant targets are not final.

Fixed a bug in the scoring code so that it takes the best trial in a study for the external-tuning ruleset.

Added instructions for submission.

Changed the default number of workers for PyTorch data loaders to 0. Running ImageNet workloads with more than 0 workers may lead to incorrect eval results; see #732.
Update: for the speech workloads, the pytorch_eval_num_workers flag of submission_runner.py has to be set to a value greater than 0 to prevent a data loader crash in the JAX code. A sketch of the corresponding DataLoader setting follows.
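
Illustratively, the worker count is the num_workers argument of PyTorch's DataLoader. A minimal sketch with a toy stand-in dataset, not the benchmark's input pipeline:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for an ImageNet-style dataset.
dataset = TensorDataset(torch.randn(64, 3, 224, 224),
                        torch.zeros(64, dtype=torch.long))

# num_workers=0 loads batches in the main process. Values > 0 spawn
# worker processes, which was observed to corrupt ImageNet eval results
# (#732) but is required for the speech workloads.
loader = DataLoader(dataset, batch_size=16, num_workers=0)
```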

v0.0.3

24 Jun 12:38
0618974
Pre-release

Update technical documentation.

Bug fixes:

  • Fix workload variant names in Dockerfile.
  • Fix ViT GLU OOM by reducing the batch size.
  • Fix submission_runner stopping condition.
  • Fix dropout rng in ViT and WMT.

v0.0.2

24 Jun 12:37
6b188ba
Pre-release

Add workload variants.

Add prize qualification logs for external tuning ruleset.
Note: FastMRI trials with dropout are not yet added due to #664.

Add functionality to the Docker startup script for the self_tuning ruleset.
Add a self_tuning ruleset option to the script that runs all workloads for scoring.

Data setup fixes.

Fix tests that check training differences in PyTorch and JAX on GPU.

v0.0.1

24 Jun 12:37
ca87833
Pre-release

First release of the AlgoPerf: Training Algorithms benchmarking code.