-
Notifications
You must be signed in to change notification settings - Fork 720
Issues related to the DownloadUtils vulnerability #3757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Issues related to the DownloadUtils vulnerability #3757
Conversation
Validation of url and output
[Corrected] Validation of url and output
Thanks for your contribution. However, the DownloadUtils is a utility function we provide for users to download model files, etc. It's only used in example code, unit tests, it's not used in production server. So this is not an vulnerability. |
@siriusbellatrix Could you please look at the build failures here: https://github.com/deepjavalibrary/djl/actions/runs/16778679781/job/47517204012?pr=3757#step:16:69 |
[add import] import java.net.MalformedURLException; import java.nio.file.InvalidPathException;
It seems the build failure was caused by a missing import statement. I've just added the necessary imports and committed the change. |
private static void validateUrl(URL url) throws IOException { | ||
String urlString = url.toString(); | ||
// Allow only URLs that start with the specific domain. | ||
if (!urlString.startsWith("https://djl-ai.s3.amazonaws.com/")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unit tests failed with error:
java.net.MalformedURLException: URL is not from the allowed domain: https://resources.djl.ai/audios/test_01.wav
We should allow downloading from any urls, like pytorch, tensorflow, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That code is a sample. If we can specify the path from which downloads are allowed, it's best to restrict it using a URL whitelist. If that's not possible, then we can limit the downloadable files by file extension or file size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this change is necessary
private static void validateUrl(URL url) throws IOException { | ||
String urlString = url.toString(); | ||
// Allow only URLs that start with the specific domain. | ||
if (!urlString.startsWith("https://djl-ai.s3.amazonaws.com/")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot limit the url
- in our Jupyter notebook, we use DownloadUtils to download images from many different location
- If our user use Jupyter notebook, they should be able to use
DownloadUtils
do anything they want to
It's up to caller to ensure the URL is validate, not the library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it's understandable that DownloadUtils needs to support a variety of URLs,
that doesn't mean it should allow any URL without restriction.
As a shared utility used by many developers, the library should enforce at least minimal security validation to reduce the risk of misuse or accidental security issues.
Placing the entire burden of validation on the caller can lead to security vulnerabilities and may compromise trust in the library itself. Instead, it's more robust for the library to offer safe defaults, while still allowing flexibility when explicitly needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a feature we provide to our developers. It's a security vulnerability if our user misused this class, not the class itself. Which is the same for FileUtils
or even java.nio.Files
class, You cannot ask java.nio.Files
to do the validation on the file path.
It's application's responsibility to have a whitelisted restriction, we don't know that.
The reason we introduce this class is for Jupyter notebook user to download any files to any location (with progress bar). Anding restriction defeat the purpose of this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The claim that DownloadUtils cannot restrict extensions, size, or path appears to overlook the importance of security. I also believe the comparison to basic classes like FileUtils or java.nio.Files is inappropriate.
java.nio.Files is a general-purpose API that provides fundamental file system access. It's designed to give developers complete control for a wide range of environments and purposes, so it's reasonable that it doesn't include specific logic like path validation. In contrast, DownloadUtils is a function created for a specific purpose (model downloading). Therefore, I believe this class has a responsibility to proactively prevent potential security threats.
I believe implementing secure coding with the following methods would not cause any issues for developers using this functionality.
1. Extension Restriction: I understand that model files use specific extensions like .pt, .h5, and .onnx. A whitelist approach that allows only these few extensions would fully serve the purpose of DownloadUtils while effectively blocking the download of malicious files, such as potential webshells like .jsp or .php, or even .exe and .sh files depending on the context.
2. Download Path Restriction: Allowing download paths to be vulnerable to directory traversal attacks with ../ is extremely risky. If DownloadUtils restricted the download path to a specific root directory, it could prevent attacks that attempt to overwrite system files or access sensitive data. No developer would want unintended files saved in critical system folders like C:\Windows or /etc/.
These restrictions maintain the core functionality of DownloadUtils, while serving as essential measures that help developers use the class securely and with confidence.
Finally, the argument that "we don't impose restrictions because we don't know how the application will be implemented" is not much different from arguing that all libraries should ignore security. I believe library developers have a responsibility to provide secure by default functionality, ensuring users can use their tools with confidence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is not meant for download model, DJL download model using Criteria
and Repository
class.
This class is designed to download arbitrary files for Jupyter notebook usage (per jupyter notebook user's request). It is designed for general purpose file download for Jupyter notebook users.
It just happened to be used in unit test code to download some testing data. There is no usage of this class in DJL own production code at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears that the assumption is that DownloadUtils will be used directly by developers in Jupyter Notebook.
However, how a library is actually used is unpredictable. Some developers may integrate it into a server application accessible to many users, while others may deploy it in an internal-only environment.
Therefore, assuming that all developers will only use it directly in Jupyter Notebook is not realistic,
and secure default settings should be considered to ensure safety across different environments.
java.nio.Files and Apache FileUtils are general-purpose APIs designed for local file system operations such as copying, moving, or deleting files. Their design inherently allows working with “any file” and does not involve handling external network input.
In contrast, DownloadUtils accepts external input and downloads files from remote servers to the local file system. This introduces a much larger attack surface, including risks such as Remote File Download (RFD), malicious file placement, and path traversal attacks.
Therefore, comparing DownloadUtils to Apache FileUtils or java.nio.Files—and justifying the lack of security checks on that basis—is not appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- FileUtils is not only dealing with local files, it can download arbitrary files: copyURLToFile.
- DJL is not using this class internally, even developer include it in their application, developer must do input validation no matter calling
DownloadUtils
or not, that's their responsibility. - DownloadUtils name clearly indicated this class is used for downloading files. When user calling it, they don't expect restrictions.
- I didn't find any better way to expose an general purpose
curl
like command for Jupyter notebook (Java kernel) user. We can remove this file, but Jupyter notebook users (and all our demos) will be suffered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The discussion seems to be going in circles. My opinion remains the same: if a feature is needed, I believe it is the library developer’s responsibility to provide at least a minimum level of security to the developer.
I think it is right to implement at least some basic security measures (secure coding). As I mentioned earlier, no developer would intentionally save a model in critical system folders like /shadow/ or /drivers/ for model training, and no developer would want files with potentially malicious extensions like .jsp or .exe to be downloaded. In that case, couldn’t at least a whitelist be provided for these?
The developer keeps insisting that the person calling this function is fully responsible for input validation. But if we follow that logic, then there wouldn’t have been any reason to release a security update for the 2021 Log4j vulnerability, would there?
Why do so many forum-style libraries restrict major scripts with a whitelist to prevent XSS, while still providing the source code to developers? Shouldn’t they allow all scripts since developers might need them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, I don't think this is a security vulnerability. If DJL use this class internally and didn't do input validation, and can cause unexpected behavior (log4j case), then it's a vulnerability, but clearly this is NOT the case. If the side effect is not obvious and can cause damage (XSS case), then we should consider to improve it. But DownloadUtils
api is very obvious.
If we release a jar that only contains DownloadUtils
, do you still think this is a security vulnerability?
We have a lot of powerful features, like load a shared library from arbitrary location, execute arbitrary code (model is code), even reboot GPU. It's up to developer to ensure a trusted environment. If we follow your logic, all those features itself is security vulnerability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you believe that the use of external input values is the responsibility of the caller or developer, and that this is the intended purpose of the design, I don't think further discussion will convince you that this is a vulnerability.
Although the official documentation (https://docs.djl.ai/master/docs/demos/jupyter/load_pytorch_model.html) does not state that DownloadUtils is intended exclusively for Jupyter environments or that it is a part of test code, I can only hope that all developers or callers using DownloadUtils will use that code in a Jupyter environment as you've intended, and that they are safe and have no malicious intent.
private static void validatePath(Path output) throws IOException { | ||
try { | ||
// Normalize the path and check for ".." to prevent directory traversal attacks. | ||
if (output.normalize().toString().contains("..")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This validation is not enough, should block absolute path as well if we really want to
- We have test cases (in other repository) write file to tmp directory, this will break the existing code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For security reasons, it is advisable to restrict the use of absolute paths by default.
If absolute paths are required in specific cases, it is safer and clearer to handle those cases with targeted code modifications or exceptions.
Rather than relaxing restrictions globally, prioritizing security by default and allowing exceptions only when necessary is a better approach in terms of maintainability and system integrity.
Description
Hello, I'm the person who reported the related vulnerability on huntr.com.
As you mentioned, due to the if (Files.exists(output)) line, existing files are indeed not overwritten.
I apologize for the incorrect information in my initial report.
However, since there is no validation for the url and output parameters, an attacker can still create arbitrary files in arbitrary locations. The overwrite prevention only limits part of the attack. This still allows for malicious files to be placed on the server or for large files to be downloaded, potentially leading to denial-of-service conditions. Additionally, depending on the server's configuration, an attacker could download a malicious file to execute a web shell.
I just finished updating and resubmitting the issue I previously reported on huntr.com.
I would appreciate it if you could review this issue once again.