Conversation

luciehmct

Summary

  • Adds the AnimalKingdom evaluation suite, structured in parallel to MammAlps with three subtasks (animal, action, activity recognition).
  • Converts _cot.jsonl annotations into Hugging Face–style datasets via the same dataset_builder.py used for MammAlps (--dataset animalkingdom).
  • Reuses the helper utilities (doc_to_visual, doc_to_text, doc_to_target, process_results) and the strict Jaccard scoring/aggregation pipeline introduced for MammAlps (see the sketch after this list).
  • Provides InternVL3 evaluation commands that mirror the MammAlps setup (OpenGVLab/InternVL3-8B, batch size 1, num_frame=32, use_temporal_context=True).
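
For reference, the strict Jaccard score used by both suites is the size of the intersection of the predicted and ground-truth label sets divided by the size of their union. Below is a minimal sketch; the function name and the lowercase/strip normalization are illustrative, and the registered implementation lives in lmms_eval/api/metrics.py:

# Illustrative sketch of the strict Jaccard metric -- not the registered implementation
def jaccard_score(predictions, references):
    # Normalize both sides into lowercase, whitespace-stripped label sets
    pred = {p.strip().lower() for p in predictions}
    ref = {r.strip().lower() for r in references}
    if not pred and not ref:
        return 1.0  # both empty: treat as a perfect match
    if not pred or not ref:
        return 0.0  # exactly one side empty: no overlap
    return len(pred & ref) / len(pred | ref)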

Details

  • Dataset specifics: AnimalKingdom derives from luciehmct/animalkingdom-test, covering diverse YouTube wildlife clips. While MammAlps comes from alpine camera-trap footage, both datasets are standardized by the same builder, ensuring consistent JSON records and folder layouts.
  • Evaluation flow:
    • Subtask configs (animalkingdom_animal.yaml, animalkingdom_action.yaml, animalkingdom_activity.yaml) are direct analogs of the MammAlps configs (an illustrative sketch follows this list).
    • Logs results under results/<model>_<timestamp>/animalkingdom_<subtask>.jsonl with prompt, response, parsed predictions, ground truth, and per-example Jaccard score.
    • Relies on the globally registered Jaccard metric in lmms_eval/api/metrics.py, identical to MammAlps.
  • InternVL3 integration: Same model setup as MammAlps (OpenGVLab/InternVL3-8B, batch size 1). The use_temporal_context=True flag enriches prompts with frame timestamps, exactly as documented for MammAlps.
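
For illustration, a subtask config could look roughly like the following. The field layout mirrors common lmms-eval task YAMLs; the exact keys, values, and function names here are assumptions, not a copy of the committed animalkingdom_action.yaml:

# animalkingdom_action.yaml -- illustrative sketch only, not the committed config
dataset_path: luciehmct/animalkingdom-test
task: animalkingdom_action
test_split: test
output_type: generate_until
doc_to_visual: !function utils.animalkingdom_doc_to_visual
doc_to_text: !function utils.animalkingdom_doc_to_text
doc_to_target: !function utils.animalkingdom_doc_to_target
process_results: !function utils.animalkingdom_process_results
metric_list:
  - metric: jaccard
    aggregation: mean
    higher_is_better: true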

Testing

# Example: Action recognition
python -m lmms_eval \
  --model internvl3 \
  --model_args "pretrained=OpenGVLab/InternVL3-8B,modality=video,num_frame=32,use_temporal_context=True" \
  --tasks animalkingdom_action \
  --batch_size 1 \
  --output_path "$OUT_DIR"

# Run all three AnimalKingdom subtasks together
python -m lmms_eval \
  --model internvl3 \
  --model_args "pretrained=OpenGVLab/InternVL3-8B,modality=video,num_frame=32,use_temporal_context=True" \
  --tasks animalkingdom \
  --batch_size 1 \
  --output_path "$OUT_DIR"

TashkovskaMatea and others added 13 commits July 5, 2025 20:59
…nition on wildlife videos

- Implements animal, action, and activity recognition subtasks for the AnimalKingdom dataset
- Includes dataset builder, YAML configs, and strict Jaccard metric evaluation
- Utilities for prompt/answer extraction, result processing, and HuggingFace video download
- See animalkingdom/README.md for details and usage instructions
Comment on lines +70 to +84
try:
    from huggingface_hub import hf_hub_download

    # Download the video file from the HuggingFace dataset repository
    local_path = hf_hub_download(repo_id="luciehmct/animalkingdom-test", filename=clip_path, repo_type="dataset")

    if os.path.exists(local_path):
        return [local_path]
    else:
        eval_logger.error(f"Downloaded file does not exist: {local_path}")
        return [clip_path]

except Exception as e:
    eval_logger.error(f"Failed to download video {clip_path}: {str(e)}")
    return [clip_path]
Collaborator

This part is a bit hacky. The clip path should not really be obtained like this.

Collaborator

@kcz358 kcz358 left a comment

Hi, I scanned through some parts of the PR. The same questions remain as with most of your previous PRs.

Collaborator

kcz358 commented Sep 26, 2025

I think the major problems are the path handling and the configuration, though. For most of the PR, you could check again what the recommended way is to pass the video path through doc_to_visual.
