Merged
1 change: 1 addition & 0 deletions docs/current_tasks.md
@@ -245,6 +245,7 @@ python -m lmms_eval --tasks list_with_num
- egoschema_mcppl
- egoschema_subset_mcppl
- egoschema_subset
- [LEMONADE](https://huggingface.co/datasets/amathislab/LEMONADE) (lemonade)
- [LongVideoBench](https://github.com/longvideobench/LongVideoBench)
- [MovieChat](https://github.com/rese1f/MovieChat) (moviechat)
- Global Mode for entire video (moviechat_global)
45 changes: 45 additions & 0 deletions lmms_eval/tasks/lemonade/README.md
@@ -0,0 +1,45 @@
# LEMONADE

## Task Description

**LEMONADE** (Language models Evaluation of MOtion aNd Action-Driven Enquiries) is a QA benchmark extracted from the **EPFL-Smart-Kitchen-30** dataset (see [arXiv](https://arxiv.org/abs/2506.01608)). It consists of **36,521 closed-ended QA pairs** linked to egocentric video clips.

Questions are organized into three groups and six subcategories:

- **Behavior Understanding**
- *Perception*: recognizing perceived actions
- *Reasoning*: reasoning over unseen behaviors
- **Long-term Understanding**
- *Summarization*: summarizing over longer clips
- *Session Properties*: inferring session-level information
- **Motion & Biomechanics**
- *Physical Attributes*: inferring hand shapes, joint angles, etc.
- *Kinematics*: inferring trajectory velocities

The benchmark was evaluated using **`lmms-eval`** in the associated publication.


## Implementation

- **utils.py**: Handles data loading from Hugging Face, video loading, answer parsing, and metric evaluation.
- **lemonade.yaml**: Contains the default prompts and evaluation settings.

When running LEMONADE through `lmms-eval`, the data is automatically downloaded. For direct dataset access, please refer to [Hugging Face](https://huggingface.co/datasets/amathislab/LEMONADE) or [Zenodo](https://zenodo.org/records/15535461).
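For instance, a run might look like the following (an illustrative invocation; the model name and flags are examples and should be adapted to your setup):

```shell
# Illustrative lmms-eval launch for the LEMONADE task.
# The model name here is an example; pick any model supported by lmms-eval.
python -m lmms_eval \
    --model qwen2_vl \
    --tasks lemonade \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```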

Performance is evaluated in terms of accuracy against the ground truth, with results reported overall as well as per category and subcategory.
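The parsing and aggregation described above can be sketched roughly as follows (a hypothetical simplification, not the repository's actual `utils.lemonade_process_results` / `lemonade_aggregate_results`; the result field names are assumptions):

```python
# Hedged sketch: extract the predicted choice letter from a free-form model
# response and aggregate accuracy per subcategory plus overall.
import re
from collections import defaultdict

def parse_letter(response):
    """Return the first standalone choice letter A-E in the response, else None."""
    m = re.search(r"\b([A-E])\b", response.strip().upper())
    return m.group(1) if m else None

def aggregate(results):
    """results: list of dicts with 'pred', 'target', 'subcategory' (assumed names)."""
    per_cat = defaultdict(lambda: [0, 0])  # subcategory -> [correct, total]
    for r in results:
        correct = parse_letter(r["pred"]) == r["target"]
        per_cat[r["subcategory"]][0] += int(correct)
        per_cat[r["subcategory"]][1] += 1
    scores = {cat: corr / tot for cat, (corr, tot) in per_cat.items()}
    scores["overall"] = (
        sum(v[0] for v in per_cat.values()) / sum(v[1] for v in per_cat.values())
    )
    return scores
```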

## Citation

If you use **LEMONADE**, please cite:

```bibtex
@misc{bonnetto2025epflsmartkitchen,
  title={EPFL-Smart-Kitchen-30: Densely annotated cooking dataset with 3D kinematics to challenge video and language models},
  author={Andy Bonnetto and Haozhe Qi and Franklin Leong and Matea Tashkovska and Mahdi Rad and Solaiman Shokur and Friedhelm Hummel and Silvestro Micera and Marc Pollefeys and Alexander Mathis},
  year={2025},
  eprint={2506.01608},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.01608},
}
```
28 changes: 28 additions & 0 deletions lmms_eval/tasks/lemonade/lemonade.yaml
@@ -0,0 +1,28 @@
dataset_path: amathislab/LEMONADE
dataset_kwargs:
  video: true
  cache_dir: lemonade_data
  force_unzip: true
task: "lemonade"
test_split: test
output_type: generate_until
doc_to_visual: !function utils.lemonade_doc_to_visual
doc_to_text: !function utils.lemonade_doc_to_text
doc_to_target: "Correct Answer"

generation_kwargs:
  max_new_tokens: 128
  temperature: 0
  do_sample: false

process_results: !function utils.lemonade_process_results
metric_list:
  - metric: acc
    aggregation: !function utils.lemonade_aggregate_results
    higher_is_better: true

lmms_eval_specific_kwargs:
  default:
    pre_prompt: "Answer the following multiple-choice question using the given images.\n"
    post_prompt: "\nRespond only with the letter of the correct answer."
    max_num_frames: 8
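The prompt assembly implied by `lmms_eval_specific_kwargs` can be sketched as follows (a minimal illustration, not the repository's actual `utils.lemonade_doc_to_text`; the document field names `question` and `options` are assumptions):

```python
# Hedged sketch of a doc_to_text hook: wrap the question and its options with
# the pre_prompt/post_prompt from the YAML. Field names are assumptions.
def lemonade_doc_to_text(doc, lmms_eval_specific_kwargs):
    options = "\n".join(f"{letter}. {text}" for letter, text in doc["options"])
    return (
        lmms_eval_specific_kwargs["pre_prompt"]
        + f"{doc['question']}\n{options}"
        + lmms_eval_specific_kwargs["post_prompt"]
    )
```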