Conversation

luciehmct

Summary

  • Adds the AnimalKingdom evaluation suite, structured in parallel to MammAlps with three subtasks (animal, action, activity recognition).
  • Converts _cot.jsonl annotations into Hugging Face–style datasets via the same dataset_builder.py used for MammAlps (--dataset animalkingdom).
  • Reuses the helper utilities (doc_to_visual, doc_to_text, doc_to_target, process_results) and the strict Jaccard scoring/aggregation pipeline introduced for MammAlps (see the sketch after this list).
  • Provides InternVL3 evaluation commands that mirror the MammAlps setup (OpenGVLab/InternVL3-8B, batch size 1, num_frame=32, use_temporal_context=True).
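
For reference, the strict Jaccard score used by both suites is the size of the intersection of the predicted and ground-truth label sets divided by the size of their union. Below is a minimal sketch; the function name and the lowercase/strip normalization are illustrative, and the registered implementation lives in lmms_eval/api/metrics.py:

# Illustrative sketch of the strict Jaccard metric -- not the registered implementation
def jaccard_score(predictions, references):
    # Normalize both sides into lowercase, whitespace-stripped label sets
    pred = {p.strip().lower() for p in predictions}
    ref = {r.strip().lower() for r in references}
    if not pred and not ref:
        return 1.0  # both empty: treat as a perfect match
    if not pred or not ref:
        return 0.0  # exactly one side empty: no overlap
    return len(pred & ref) / len(pred | ref)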

Details

  • Dataset specifics: AnimalKingdom derives from luciehmct/animalkingdom-test, covering diverse YouTube wildlife clips. While MammAlps comes from alpine camera-trap footage, both datasets are standardized by the same builder, ensuring consistent JSON records and folder layouts.
  • Evaluation flow:
    • Subtask configs (animalkingdom_animal.yaml, animalkingdom_action.yaml, animalkingdom_activity.yaml) are direct analogs of the MammAlps configs (an illustrative sketch follows this list).
    • Logs results under results/<model>_<timestamp>/animalkingdom_<subtask>.jsonl with prompt, response, parsed predictions, ground truth, and per-example Jaccard score.
    • Relies on the globally registered Jaccard metric in lmms_eval/api/metrics.py, identical to MammAlps.
  • InternVL3 integration: Same model setup as MammAlps (OpenGVLab/InternVL3-8B, batch size 1). The use_temporal_context=True flag enriches prompts with frame timestamps, exactly as documented for MammAlps.
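
For illustration, a subtask config could look roughly like the following. The field layout mirrors common lmms-eval task YAMLs; the exact keys, values, and function names here are assumptions, not a copy of the committed animalkingdom_action.yaml:

# animalkingdom_action.yaml -- illustrative sketch only, not the committed config
dataset_path: luciehmct/animalkingdom-test
task: animalkingdom_action
test_split: test
output_type: generate_until
doc_to_visual: !function utils.animalkingdom_doc_to_visual
doc_to_text: !function utils.animalkingdom_doc_to_text
doc_to_target: !function utils.animalkingdom_doc_to_target
process_results: !function utils.animalkingdom_process_results
metric_list:
  - metric: jaccard
    aggregation: mean
    higher_is_better: true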

Testing

# Example: Action recognition
python -m lmms_eval \
  --model internvl3 \
  --model_args "pretrained=OpenGVLab/InternVL3-8B,modality=video,num_frame=32,use_temporal_context=True" \
  --tasks animalkingdom_action \
  --batch_size 1 \
  --output_path "$OUT_DIR"

# Run all three AnimalKingdom subtasks together
python -m lmms_eval \
  --model internvl3 \
  --model_args "pretrained=OpenGVLab/InternVL3-8B,modality=video,num_frame=32,use_temporal_context=True" \
  --tasks animalkingdom \
  --batch_size 1 \
  --output_path "$OUT_DIR"

TashkovskaMatea and others added 13 commits July 5, 2025 20:59
…nition on wildlife videos

- Implements animal, action, and activity recognition subtasks for the AnimalKingdom dataset
- Includes dataset builder, YAML configs, and strict Jaccard metric evaluation
- Utilities for prompt/answer extraction, result processing, and HuggingFace video download
- See animalkingdom/README.md for details and usage instructions
Comment on lines +70 to +84
try:
    from huggingface_hub import hf_hub_download

    # Download the video file from the HuggingFace dataset repository
    local_path = hf_hub_download(repo_id="luciehmct/animalkingdom-test", filename=clip_path, repo_type="dataset")

    if os.path.exists(local_path):
        return [local_path]
    else:
        eval_logger.error(f"Downloaded file does not exist: {local_path}")
        return [clip_path]

except Exception as e:
    eval_logger.error(f"Failed to download video {clip_path}: {str(e)}")
    return [clip_path]
Collaborator

This part is a bit hacky. The clip path should not really be obtained like this.

Collaborator

@kcz358 kcz358 left a comment

Hi, I scanned through some parts of the PR. The same questions remain as with most of your previous PRs.

Collaborator

kcz358 commented Sep 26, 2025

I think the major problems are the path handling and the configuration, though. For most of the PR, you could check again what the recommended way is to pass the video path through doc_to_visual.
