
EvalAssist is an open-source project that simplifies using large language models as evaluators (LLM-as-a-Judge) of the output of other large language models by supporting users in iteratively refining evaluation criteria in a web-based user experience.


EvalAssist

EvalAssist Logo

Project Website · Documentation · Video demo

EvalAssist is an LLM-as-a-Judge framework built on top of the Unitxt open-source evaluation library for large language models. The EvalAssist application gives users a convenient way to iteratively test and refine LLM-as-a-judge criteria, and supports both direct (rubric-based) and pairwise (relation-based) assessment, the two most prevalent forms of LLM-as-a-judge evaluation. EvalAssist is designed to be model-agnostic, i.e., the content to be evaluated can come from any model. It supports a rich set of off-the-shelf judge models that can easily be extended; an API key is required to use the pre-defined judge models. Once users are satisfied with their criteria, they can auto-generate a notebook with Unitxt code to run bulk evaluations on larger data sets based on their criteria definition. EvalAssist also includes a catalog of example test cases that exhibit the use of LLM-as-a-judge across a variety of scenarios, and users can save their own test cases.

How to Install

EvalAssist can be installed using various package managers. Before proceeding, ensure you are using Python >= 3.10 and < 3.14 to avoid compatibility issues, and set DATA_DIR to avoid data loss (e.g. export DATA_DIR="~/.eval_assist").
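To make the DATA_DIR setting survive across sessions, you can export it (and optionally add the line to your shell profile) before starting the server. A minimal sketch — the path is only an example; any writable directory works:

```shell
# Tell EvalAssist where to persist its data (example path; pick any writable directory)
export DATA_DIR="$HOME/.eval_assist"

# Create the directory up front so the first run has somewhere to write
mkdir -p "$DATA_DIR"
```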

Installation via pip

python3 -m venv venv
source venv/bin/activate # or venv\Scripts\activate.bat on Windows
pip install 'evalassist[webapp]'
eval-assist serve

Installation via uv

uvx --python 3.11 --from 'evalassist[webapp]' eval-assist serve

Installation via conda

conda create -n evalassist python=3.11
conda activate evalassist
pip install 'evalassist[webapp]'
eval-assist serve

In all cases, after running the command, you can access the EvalAssist server at http://localhost:8000.

EvalAssist can be configured through environment variables and command parameters. Take a look at the configuration documentation.

Check out the tutorials to see how to run evaluations and generate synthetic data.

Contributing

You can contribute to EvalAssist or to Unitxt. See the Contribution Guidelines for more details.

See the Local Development Guide for instructions on setting up a local development environment.

Documentation

You can find extensive documentation of the system on the Documentation page.
