
Implement DINOv3 #2365


Description

@BhavyaGoyal777

I am interested in contributing to Keras by implementing DINOv3 (self-DIstillation with NO labels, v3), a state-of-the-art self-supervised Vision Transformer model, as an example/tutorial. Before proceeding, I would like to confirm whether this aligns with the project's goals and whether there are any existing implementations or guidelines I should be aware of.

Why DINOv3?

  • State-of-the-art performance: DINOv3 achieves top-tier results on various vision tasks without requiring labeled data, making it a valuable addition to Keras examples.
  • Versatility: It serves as a strong backbone for tasks like image classification, segmentation, and object detection.
  • Alignment with Keras 3: Given Keras 3's multi-backend support (TensorFlow, JAX, PyTorch), implementing DINOv3 would showcase the framework's flexibility.

Implementation Plan:

  • Model Architecture: Implement the Vision Transformer (ViT) backbone together with the DINOv3 self-distillation training objective (a minimal sketch follows this list).
  • Training: Train on standard datasets such as CIFAR-10 or ImageNet.
  • Backend Compatibility: Ensure the implementation runs on the TensorFlow, JAX, and PyTorch backends.
  • Documentation: Provide clear usage instructions, including training and evaluation scripts.
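
To make the plan concrete, here is a minimal, backend-agnostic sketch of the core DINO-style ingredients in Keras 3: a small ViT-like backbone with a projection head, the teacher/student self-distillation loss, and the EMA teacher update. The helper names (`build_backbone`, `dino_loss`, `update_teacher`) and all layer sizes are illustrative assumptions for this proposal, not the actual DINOv3 configuration, and the full recipe would add further components (e.g. multi-crop augmentation and the centering schedule).

```python
import keras
from keras import layers, ops

# Minimal ViT-style backbone with a projection head. Sizes (dim=192, depth=4,
# 4096 prototypes) are illustrative placeholders, not the DINOv3 configuration.
def build_backbone(image_size=32, patch_size=4, dim=192, depth=4, num_heads=3,
                   num_prototypes=4096):
    inputs = keras.Input((image_size, image_size, 3))
    # Patchify with a strided convolution, then flatten to a token sequence.
    x = layers.Conv2D(dim, patch_size, strides=patch_size)(inputs)
    x = layers.Reshape((-1, dim))(x)
    for _ in range(depth):
        # Pre-norm Transformer block: self-attention + MLP, each with a residual.
        h = layers.LayerNormalization()(x)
        h = layers.MultiHeadAttention(num_heads=num_heads, key_dim=dim // num_heads)(h, h)
        x = x + h
        h = layers.LayerNormalization()(x)
        h = layers.Dense(dim * 4, activation="gelu")(h)
        h = layers.Dense(dim)(h)
        x = x + h
    x = layers.GlobalAveragePooling1D()(x)
    outputs = layers.Dense(num_prototypes)(x)  # projection head -> prototype logits
    return keras.Model(inputs, outputs)


def dino_loss(teacher_logits, student_logits, center,
              teacher_temp=0.04, student_temp=0.1):
    # Cross-entropy between the centered, sharpened teacher distribution and
    # the student distribution; the teacher side carries no gradient.
    t = ops.softmax((ops.stop_gradient(teacher_logits) - center) / teacher_temp)
    s = ops.log_softmax(student_logits / student_temp)
    return -ops.mean(ops.sum(t * s, axis=-1))


def update_teacher(teacher, student, momentum=0.996):
    # Teacher weights track the student via an exponential moving average.
    for t_var, s_var in zip(teacher.weights, student.weights):
        t_var.assign(momentum * t_var + (1.0 - momentum) * s_var)
```

A training step would feed two augmented views of each image through the student and teacher, compute `dino_loss` on the crossed view pairs, apply gradients to the student only, and then call `update_teacher`; the running `center` would itself be an EMA of teacher outputs. Because the loss uses `keras.ops`, the same code should run under the TensorFlow, JAX, and PyTorch backends.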
