Added GitHub Actions to Test LLM Accuracy Scripts #2206
base: master
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Thank you, Sridhar, for your contribution. Could you please sign the MLCommons CLA? @anandhu-eng, is it possible to replace the dummy accuracy text content in this PR with real values from an actual accuracy log?
Hi @arjunsuresh, since results for the LLM models are absent from our unofficial test submission repository, I think we have two options:
Also, to use the original accuracy logs, the original datasets would have to be downloaded (I think the combined size of the three datasets would not exceed the storage provided by the GitHub runner). How about using the MLCFlow accuracy script for testing, since it handles both the dataset download and the accuracy check with a single command? e.g.
Thanks @anandhu-eng. We probably don't need the full accuracy log, just, say, the accuracy log for 10 inputs and the expected accuracy metric value for it. That should be good enough to validate that the scripts are working correctly. And yes, for the dependency part, we can make use of MLCFlow.
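As a rough illustration of the "accuracy log for 10 inputs plus an expected metric value" idea, the sketch below builds a tiny mock accuracy log and compares a computed metric with an expected value within a tolerance. It assumes, without confirming against LoadGen, that each log entry carries a qsl_idx and a data field holding a hex-encoded buffer of little-endian int32 token IDs; the helper names (encode_tokens, decode_tokens, make_mock_accuracy_log, metric_matches) and the default tolerance are hypothetical. The metric itself would come from the relevant evaluate-accuracy.py script and is not reproduced here.

```python
import json
import math
import struct
from typing import Dict, List


def encode_tokens(tokens: List[int]) -> str:
    # Pack token IDs as little-endian int32 and hex-encode them (assumed log format).
    return struct.pack(f"<{len(tokens)}i", *tokens).hex()


def decode_tokens(hex_data: str) -> List[int]:
    # Inverse of encode_tokens: hex string -> list of int32 token IDs.
    raw = bytes.fromhex(hex_data)
    return list(struct.unpack(f"<{len(raw) // 4}i", raw))


def make_mock_accuracy_log(outputs: Dict[int, List[int]]) -> str:
    # Hypothetical helper: one JSON entry per sample index, ordered by index.
    entries = [
        {"seq_id": seq, "qsl_idx": idx, "data": encode_tokens(tokens)}
        for seq, (idx, tokens) in enumerate(sorted(outputs.items()))
    ]
    return json.dumps(entries)


def metric_matches(computed: float, expected: float, rel_tol: float = 1e-3) -> bool:
    # Compare the metric computed on the small log with the stored expected value.
    return math.isclose(computed, expected, rel_tol=rel_tol)
```

A relative tolerance is used rather than exact equality, on the assumption that floating-point metrics may differ slightly across library versions on the runner.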
That seems doable, thanks @arjunsuresh. How do we proceed with this? @SridharRambhatla, would you be interested in making the changes? If so, I can help with whatever information you need.
Hi @anandhu-eng and @arjunsuresh, sure, I can make the required changes. Could you please share the accuracy logs and any other information needed? I'll also sign the MLCommons CLA. Thanks!
Hi @SridharRambhatla, in inference we have an automation framework named MLCFlow that simplifies the various stages of benchmark runs. When checking accuracy, we typically need the dataset, a set of libraries to run the accuracy checker, the accuracy log file, and sometimes the model itself; these are the dependencies of an accuracy check. You can think of MLC scripts as individual scripts, each handling a specific task (e.g., one for downloading the dataset, another for running the checker). These scripts can be reused and called as dependencies from other scripts. You can see how scripts are used as dependencies for the LLaMA 2 accuracy check here. Thanks to this modular design, users only need to call the top-level script to run the accuracy check; for example, refer to this usage. What needs to be done:
@arjunsuresh, I have two proposals:
Also, @SridharRambhatla, Discord might be a good option for further syncing. You can join the Discord channel through this link.
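To make the MLCFlow dependency design described above more concrete, here is a toy model of scripts that declare other scripts as dependencies, where calling the top-level script runs each dependency once, in order, before the script itself. This is not MLCFlow's implementation or API; ScriptRegistry and the script names in the usage note are hypothetical and only illustrate depth-first dependency resolution with cycle detection.

```python
from typing import Callable, Dict, List, Optional, Set


class ScriptRegistry:
    """Toy registry (hypothetical): each script has a task and named dependencies."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Callable[[], None]] = {}
        self._deps: Dict[str, List[str]] = {}

    def register(self, name: str, task: Callable[[], None],
                 deps: Optional[List[str]] = None) -> None:
        self._tasks[name] = task
        self._deps[name] = list(deps or [])

    def run(self, name: str) -> List[str]:
        """Run `name` after all of its dependencies; return the execution order."""
        order: List[str] = []
        done: Set[str] = set()
        in_progress: Set[str] = set()

        def visit(script: str) -> None:
            if script in done:  # shared dependencies run only once
                return
            if script in in_progress:
                raise ValueError(f"dependency cycle at {script!r}")
            if script not in self._tasks:
                raise KeyError(f"unknown script {script!r}")
            in_progress.add(script)
            for dep in self._deps[script]:
                visit(dep)  # dependencies first, depth-first
            in_progress.remove(script)
            self._tasks[script]()
            done.add(script)
            order.append(script)

        visit(name)
        return order
```

For example, registering get-dataset, install-libs, get-accuracy-log (depending on get-dataset) and accuracy-check (depending on all three), then calling run("accuracy-check"), executes get-dataset, install-libs, get-accuracy-log and accuracy-check in that order, with get-dataset run only once.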
@anandhu-eng Yes, we can skip downloading the model. But is there any advantage in having the GitHub Action under the mlperf-automations repository?
Hi @anandhu-eng, got it. I'll take the … Looks like the channel invite is no longer valid. I'm already part of the MLCommons server on Discord; maybe we can schedule a call whenever you're free? We can go over the approach once, and I can clarify any questions to get this right. Thanks!
Hi @arjunsuresh, the following points make me lean toward keeping the tests in
Sure, @SridharRambhatla, how about we communicate by email for that? Please email [email protected]
@anandhu-eng Sure. If you feel that adds convenience, then that's fine.
recheck
For issue #1898
Added .github/workflows/llm_accuracy_script_test.yml, which tests 4 LLM models.
Script Improvements:
- language/llama3.1-405b/evaluate-accuracy.py: Added mock dataset support
- language/mixtral-8x7b/evaluate-accuracy.py: Added mock dataset + error handling
- language/llama2-70b/evaluate-accuracy.py: Fixed pandas import
Models Tested
Testing
Install act (on Windows, via Chocolatey):
choco install act-cli
All tests pass successfully:
act -j test-llama3-accuracy
act -j test-mixtral-accuracy
act -j test-llama2-accuracy
act -j test-deepseek-accuracy
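The "mock dataset support" and "error handling" items in the script improvements can be pictured with a sketch like the following, which lets a test job run without downloading the real dataset and fails early with a clear message when a required file is missing. It is hypothetical: the field names, the JSON-lines input format (chosen here only to avoid a pandas dependency) and the use_mock switch are assumptions, not the actual changes made to the evaluate-accuracy.py scripts.

```python
import json
import os
from typing import Dict, List, Optional


def mock_dataset(n: int = 10) -> List[Dict[str, str]]:
    # Deterministic placeholder samples; field names are hypothetical.
    return [{"input": f"prompt {i}", "reference": f"answer {i}"} for i in range(n)]


def load_dataset(path: Optional[str], use_mock: bool = False,
                 n_mock: int = 10) -> List[Dict[str, str]]:
    """Load a JSON-lines dataset, or return a mock one when requested."""
    if use_mock:
        return mock_dataset(n_mock)
    # Fail early with an actionable message instead of deep inside metric code.
    if path is None or not os.path.isfile(path):
        raise FileNotFoundError(
            f"dataset not found at {path!r}; download it first or enable the mock dataset"
        )
    samples: List[Dict[str, str]] = []
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                samples.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc.msg})") from exc
    return samples
```

Keeping the mock path behind an explicit switch means a CI job cannot silently fall back to placeholder data when the real dataset is expected.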