Add DepthAnythingV2. #2377
Conversation
Summary of Changes
Hello @james77777778, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces the `DepthAnythingV2` model, significantly enhancing the library's capabilities in monocular depth estimation. It establishes a comprehensive framework for depth estimation tasks, building upon an updated `DINOV2` backbone that now provides multi-scale feature representations. The inclusion of a specialized SiLog loss function and a generalized `DepthEstimator` task class streamlines the development and deployment of depth prediction models, offering both relative and metric depth outputs.
Highlights
- New Model Integration: The `DepthAnythingV2` model has been integrated, significantly enhancing the library's capabilities in monocular depth estimation.
- DINOV2 Backbone Update: The `DINOV2Backbone` has been updated to function as a `FeaturePyramidBackbone`, allowing it to expose multi-scale feature representations crucial for advanced vision tasks like depth estimation.
- Specialized Loss Function: The Scale-Invariant Logarithmic (SiLog) loss has been introduced, providing a robust and scale-invariant metric specifically designed for depth estimation tasks.
- Standardized Depth Estimation API: A new generic `DepthEstimator` task class and its associated preprocessor have been added, offering a standardized and streamlined API for developing and deploying depth prediction models (a usage sketch follows this list).
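To make the new task API concrete, here is a minimal usage sketch. It assumes the usual KerasHub `from_preset`/`predict` conventions; the preset name, input size, and output layout are illustrative assumptions rather than anything taken from this PR.

```python
import numpy as np
import keras_hub

# Hypothetical preset name, used for illustration only.
depth_estimator = keras_hub.models.DepthEstimator.from_preset(
    "depth_anything_v2_small"
)

# One RGB image; the attached preprocessor is assumed to handle resizing
# and normalization, as KerasHub task preprocessors typically do.
image = np.random.randint(0, 256, size=(1, 518, 518, 3), dtype="uint8")
depth_map = depth_estimator.predict(image)  # per-pixel depth prediction
```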
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.
Code Review
This pull request introduces the DepthAnythingV2 model, including its backbone, task-specific layers, loss function, and preprocessor. The changes also update the DINOV2 backbone to support feature pyramid outputs. The implementation is comprehensive and follows the repository structure well.
My review focuses on a few key areas:
- Documentation: Several new public classes and layers are missing docstrings, which are important for maintainability and user understanding.
- Correctness: There's a potential issue in the `DepthAnythingLoss` implementation regarding default parameter values, which could lead to incorrect behavior or NaNs during training.
Overall, this is a great contribution. Addressing these points will improve the quality and robustness of the new model.
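To make the correctness concern concrete: a SiLog-style loss takes logarithms of predictions and targets, so a zero (or missing) minimum-depth clamp turns any zero-valued pixel into log(0) and propagates NaNs through training. Below is a minimal sketch of the standard scale-invariant logarithmic formulation; the function name and the `lambd`/`min_depth` defaults are illustrative assumptions, not this PR's actual implementation.

```python
from keras import ops


def silog_loss(y_true, y_pred, lambd=0.5, min_depth=1e-3):
    """Sketch of a scale-invariant log loss; names and defaults are illustrative."""
    # Clamp both tensors to a strictly positive floor so log() never sees zero.
    y_true = ops.maximum(y_true, min_depth)
    y_pred = ops.maximum(y_pred, min_depth)
    d = ops.log(y_pred) - ops.log(y_true)
    # Mean squared log error minus a scale-correlation term weighted by lambd.
    return ops.sqrt(ops.mean(ops.square(d)) - lambd * ops.square(ops.mean(d)))
```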
Just small nits. I guess we need to convert checkpoints and upload them again for DinoV2.
Also, `convert_depth_anything` is missing for converting HF checkpoints.
I don't think we need to re-upload the DINOV2 presets since only the arch was modified, not the weights. The newly added
Do we want to convert HF checkpoints for DepthAnything on-the-fly? I can add this as a follow-up.
Yes, a conversion script would be nice to have for all the models we converted from HF.
Description of the change
This PR includes:
- `FeaturePyramidBackbone` in `DepthAnythingV2`.
- `DepthAnythingV2` model arch.
- `DepthAnythingV2` loss (SiLog loss).
- `DepthEstimator` task class.

Here is a colab demonstrating both zero-shot inference with `DepthAnythingV2` and an end-to-end fine-tuning example:
https://colab.research.google.com/drive/1bk-bYkiYtUkzltIJcKljGs6_JyN_dhcX?usp=sharing
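As a rough companion to the colab, here is a hedged end-to-end fine-tuning sketch. The preset name, data shapes, loss, and hyperparameters are illustrative assumptions; the notebook above is the authoritative example.

```python
import numpy as np
import keras
import keras_hub

depth_estimator = keras_hub.models.DepthEstimator.from_preset(
    "depth_anything_v2_small"  # hypothetical preset name
)

# Toy arrays standing in for an (RGB image, depth map) dataset.
x = np.random.randint(0, 256, size=(8, 518, 518, 3), dtype="uint8")
y = np.random.uniform(0.1, 10.0, size=(8, 518, 518, 1)).astype("float32")

depth_estimator.compile(
    optimizer=keras.optimizers.AdamW(learning_rate=5e-5),
    loss="mean_squared_error",  # or the SiLog loss this PR introduces
)
depth_estimator.fit(x, y, batch_size=2, epochs=1)
```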
Reference
Colab Notebook
https://colab.research.google.com/drive/1bk-bYkiYtUkzltIJcKljGs6_JyN_dhcX?usp=sharing
Checklist