Leaderboard v0.6.0

@RobotSail RobotSail released this 16 Apr 05:42
· 80 commits to main since this release
cea8acd

This release of the InstructLab/eval library adds support for the Leaderboard v2 benchmark.

To use the new leaderboard evaluator, install the package with the leaderboard extra (pip install instructlab-eval[leaderboard]), then import LeaderboardV2Evaluator from instructlab.eval.leaderboard:

from instructlab.eval.leaderboard import LeaderboardV2Evaluator

evaluator = LeaderboardV2Evaluator(model_path="meta-llama/Llama-3.1-8B-Instruct", num_gpus=8)
result = evaluator.run()
print(f"Results for meta-llama/Llama-3.1-8B-Instruct: {result['overall_score']}")
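Because the evaluator ships behind an optional extra, the import can fail on a plain install. The following is a minimal sketch of a guarded import that points the user at the install command above. The helper name load_leaderboard_evaluator is hypothetical and is not part of the library; only the module path, class name, and pip command come from these notes.

```python
import importlib


def load_leaderboard_evaluator(
    module_name="instructlab.eval.leaderboard",
    class_name="LeaderboardV2Evaluator",
):
    """Return the evaluator class, or raise a clear error if the extra is missing.

    Hypothetical helper for illustration; not provided by instructlab-eval.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so it is caught here too.
        raise RuntimeError(
            f"Could not import {module_name}; "
            "try: pip install 'instructlab-eval[leaderboard]'"
        ) from exc
    return getattr(module, class_name)
```

Calling load_leaderboard_evaluator() with no arguments returns the class when the extra is installed, after which it can be constructed exactly as in the snippet above.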

This new evaluator supports running in one of two ways:

  • Running locally: the evaluator splits the benchmark tasks between vLLM and HF Transformers so that each task runs on the backend better suited to it.
  • Running remotely: you can provide an OpenAI client, and the evaluator will simply send its requests to that endpoint.
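To make the local mode more concrete, here is a rough sketch of what splitting tasks between two backends could look like. It is an illustration only, not the library's actual logic: the task names, the "kind" labels, and the rule that generation tasks go to vLLM while everything else goes to HF Transformers are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical rule: which backend handles which kind of task (assumption).
ASSUMED_BACKEND_FOR_KIND = {
    "generation": "vllm",
    "multiple_choice": "hf",
}


def split_tasks_by_backend(task_kinds, default_backend="hf"):
    """Group task names by backend.

    task_kinds maps a task name to its kind; kinds without a known
    backend fall back to default_backend.
    """
    groups = defaultdict(list)
    for task, kind in task_kinds.items():
        backend = ASSUMED_BACKEND_FOR_KIND.get(kind, default_backend)
        groups[backend].append(task)
    return dict(groups)


# Placeholder task names, not the real benchmark task list.
example_tasks = {
    "task_a": "generation",
    "task_b": "multiple_choice",
    "task_c": "generation",
    "task_d": "unknown_kind",
}
plan = split_tasks_by_backend(example_tasks)
```

Each group could then be handed to its backend in a single batch, which is one plausible reason for partitioning up front rather than dispatching task by task.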

What's Changed

Here's a comprehensive outline of all the changes made:

  • ci: Add OpenAI keys into CI by @alimaredia in #221
  • build(deps): bump sarisia/actions-status-discord from 1.15.1 to 1.15.3 by @dependabot in #220
  • build(deps): bump hynek/build-and-inspect-python-package from 2.11.0 to 2.12.0 by @dependabot in #217
  • build(deps): bump rhysd/actionlint from 1.7.4 to 1.7.7 in /.github/workflows by @dependabot in #216
  • build(deps): bump step-security/harden-runner from 2.10.3 to 2.10.4 by @dependabot in #215
  • build(deps): bump DavidAnson/markdownlint-cli2-action from 18.0.0 to 19.1.0 by @dependabot in #213
  • build(deps): bump rojopolis/spellcheck-github-actions from 0.45.0 to 0.46.0 by @dependabot in #207
  • ci: Don't require secrets in medium e2e test by @danmcp in #226
  • build(deps): bump actions/setup-python from 5.3.0 to 5.4.0 by @dependabot in #225
  • build(deps): bump machulav/ec2-github-runner from 2.3.7 to 2.3.8 by @dependabot in #224
  • build(deps): bump aws-actions/configure-aws-credentials from 4.0.2 to 4.0.3 by @dependabot in #223
  • build(deps): bump pypa/gh-action-pypi-publish from 1.12.3 to 1.12.4 by @dependabot in #222
  • build(deps): bump aws-actions/configure-aws-credentials from 4.0.3 to 4.1.0 by @dependabot in #228
  • build(deps): bump rojopolis/spellcheck-github-actions from 0.46.0 to 0.47.0 by @dependabot in #229
  • build(deps): bump step-security/harden-runner from 2.10.4 to 2.11.0 by @dependabot in #230
  • build(deps): bump actions/cache from 4.2.0 to 4.2.1 by @dependabot in #231
  • build(deps): bump actions/cache from 4.2.1 to 4.2.2 by @dependabot in #233
  • build(deps): bump actions/download-artifact from 4.1.8 to 4.1.9 by @dependabot in #232
  • build(deps): bump actions/setup-python from 5.4.0 to 5.5.0 by @dependabot in #239
  • build(deps): bump rojopolis/spellcheck-github-actions from 0.47.0 to 0.48.0 by @dependabot in #240
  • build(deps): bump step-security/harden-runner from 2.11.0 to 2.11.1 by @dependabot in #241
  • build(deps): bump actions/download-artifact from 4.1.9 to 4.2.1 by @dependabot in #237
  • build(deps): bump actions/cache from 4.2.2 to 4.2.3 by @dependabot in #236
  • Implement leaderboard as a benchmark by @RobotSail in #234

Full Changelog: v0.5.1...v0.6.0