-
Notifications
You must be signed in to change notification settings - Fork 720
Issues related to the DownloadUtils vulnerability #3757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -24,6 +24,9 @@ | |
import java.nio.file.StandardCopyOption; | ||
import java.util.zip.GZIPInputStream; | ||
|
||
import java.net.MalformedURLException; | ||
import java.nio.file.InvalidPathException; | ||
|
||
/** A utility class downloads the file from specified url. */ | ||
public final class DownloadUtils { | ||
|
||
|
@@ -61,9 +64,17 @@ public static void download(String url, String output, Progress progress) throws | |
* @throws IOException when IO operation fails in downloading | ||
*/ | ||
public static void download(URL url, Path output, Progress progress) throws IOException { | ||
|
||
// 1. Validate the URL to ensure it's from the allowed domain. | ||
validateUrl(url); | ||
|
||
// 2. Validate the output path to prevent directory traversal attacks. | ||
validatePath(output); | ||
|
||
if (Files.exists(output)) { | ||
return; | ||
} | ||
|
||
Path dir = output.toAbsolutePath().getParent(); | ||
if (dir != null) { | ||
Files.createDirectories(dir); | ||
|
@@ -86,6 +97,42 @@ public static void download(URL url, Path output, Progress progress) throws IOEx | |
} | ||
} | ||
|
||
/** | ||
* [Sample] Validates the safety and validity of a URL. | ||
* It now specifically checks if the URL starts with the allowed domain. | ||
* @param url The URL object to validate. | ||
* @throws IOException If the URL is not from the allowed domain or is unsafe. | ||
*/ | ||
private static void validateUrl(URL url) throws IOException { | ||
String urlString = url.toString(); | ||
// Allow only URLs that start with the specific domain. | ||
if (!urlString.startsWith("https://djl-ai.s3.amazonaws.com/")) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The unit tests failed with error:
We should allow downloading from any urls, like pytorch, tensorflow, etc. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That code is a sample. If we can specify the path from which downloads are allowed, it's best to restrict it using a URL whitelist. If that's not possible, then we can limit the downloadable files by file extension or file size. |
||
throw new MalformedURLException("URL is not from the allowed domain: " + urlString); | ||
} | ||
|
||
// Additional check to prevent local file access, just in case. | ||
if ("file".equalsIgnoreCase(url.getProtocol())) { | ||
throw new MalformedURLException("Local file URLs are not allowed."); | ||
} | ||
} | ||
|
||
/** | ||
* [Sample] Validates the safety and validity of an output path. | ||
* It prevents directory traversal attacks by checking for parent directory access (e.g., ../). | ||
* @param output The Path object to validate. | ||
* @throws IOException If the path is invalid or unsafe. | ||
*/ | ||
private static void validatePath(Path output) throws IOException { | ||
try { | ||
// Normalize the path and check for ".." to prevent directory traversal attacks. | ||
if (output.normalize().toString().contains("..")) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For security reasons, it is advisable to restrict the use of absolute paths by default. Rather than relaxing restrictions globally, prioritizing security by default and allowing exceptions only when necessary is a better approach in terms of maintainability and system integrity. |
||
throw new InvalidPathException(output.toString(), "Path traversal attempt detected."); | ||
} | ||
} catch (InvalidPathException e) { | ||
throw new IOException("Invalid output path: " + output.toString(), e); | ||
} | ||
} | ||
|
||
private static final class ProgressInputStream extends InputStream { | ||
|
||
private InputStream is; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot limit the url
DownloadUtils
do anything they want toIt's up to caller to ensure the URL is validate, not the library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it's understandable that DownloadUtils needs to support a variety of URLs,
that doesn't mean it should allow any URL without restriction.
As a shared utility used by many developers, the library should enforce at least minimal security validation to reduce the risk of misuse or accidental security issues.
Placing the entire burden of validation on the caller can lead to security vulnerabilities and may compromise trust in the library itself. Instead, it's more robust for the library to offer safe defaults, while still allowing flexibility when explicitly needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a feature we provide to our developers. It's a security vulnerability if our user misused this class, not the class itself. Which is the same for
FileUtils
or evenjava.nio.Files
class, You cannot askjava.nio.Files
to do the validation on the file path.It's application's responsibility to have a whitelisted restriction, we don't know that.
The reason we introduce this class is for Jupyter notebook user to download any files to any location (with progress bar). Anding restriction defeat the purpose of this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The claim that DownloadUtils cannot restrict extensions, size, or path appears to overlook the importance of security. I also believe the comparison to basic classes like FileUtils or java.nio.Files is inappropriate.
java.nio.Files is a general-purpose API that provides fundamental file system access. It's designed to give developers complete control for a wide range of environments and purposes, so it's reasonable that it doesn't include specific logic like path validation. In contrast, DownloadUtils is a function created for a specific purpose (model downloading). Therefore, I believe this class has a responsibility to proactively prevent potential security threats.
I believe implementing secure coding with the following methods would not cause any issues for developers using this functionality.
1. Extension Restriction: I understand that model files use specific extensions like .pt, .h5, and .onnx. A whitelist approach that allows only these few extensions would fully serve the purpose of DownloadUtils while effectively blocking the download of malicious files, such as potential webshells like .jsp or .php, or even .exe and .sh files depending on the context.
2. Download Path Restriction: Allowing download paths to be vulnerable to directory traversal attacks with ../ is extremely risky. If DownloadUtils restricted the download path to a specific root directory, it could prevent attacks that attempt to overwrite system files or access sensitive data. No developer would want unintended files saved in critical system folders like C:\Windows or /etc/.
These restrictions maintain the core functionality of DownloadUtils, while serving as essential measures that help developers use the class securely and with confidence.
Finally, the argument that "we don't impose restrictions because we don't know how the application will be implemented" is not much different from arguing that all libraries should ignore security. I believe library developers have a responsibility to provide secure by default functionality, ensuring users can use their tools with confidence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is not meant for download model, DJL download model using
Criteria
andRepository
class.This class is designed to download arbitrary files for Jupyter notebook usage (per jupyter notebook user's request). It is designed for general purpose file download for Jupyter notebook users.
It just happened to be used in unit test code to download some testing data. There is no usage of this class in DJL own production code at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears that the assumption is that DownloadUtils will be used directly by developers in Jupyter Notebook.
However, how a library is actually used is unpredictable. Some developers may integrate it into a server application accessible to many users, while others may deploy it in an internal-only environment.
Therefore, assuming that all developers will only use it directly in Jupyter Notebook is not realistic,
and secure default settings should be considered to ensure safety across different environments.
java.nio.Files and Apache FileUtils are general-purpose APIs designed for local file system operations such as copying, moving, or deleting files. Their design inherently allows working with “any file” and does not involve handling external network input.
In contrast, DownloadUtils accepts external input and downloads files from remote servers to the local file system. This introduces a much larger attack surface, including risks such as Remote File Download (RFD), malicious file placement, and path traversal attacks.
Therefore, comparing DownloadUtils to Apache FileUtils or java.nio.Files—and justifying the lack of security checks on that basis—is not appropriate.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DownloadUtils
or not, that's their responsibility.curl
like command for Jupyter notebook (Java kernel) user. We can remove this file, but Jupyter notebook users (and all our demos) will be suffered.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The discussion seems to be going in circles. My opinion remains the same: if a feature is needed, I believe it is the library developer’s responsibility to provide at least a minimum level of security to the developer.
I think it is right to implement at least some basic security measures (secure coding). As I mentioned earlier, no developer would intentionally save a model in critical system folders like /shadow/ or /drivers/ for model training, and no developer would want files with potentially malicious extensions like .jsp or .exe to be downloaded. In that case, couldn’t at least a whitelist be provided for these?
The developer keeps insisting that the person calling this function is fully responsible for input validation. But if we follow that logic, then there wouldn’t have been any reason to release a security update for the 2021 Log4j vulnerability, would there?
Why do so many forum-style libraries restrict major scripts with a whitelist to prevent XSS, while still providing the source code to developers? Shouldn’t they allow all scripts since developers might need them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, I don't think this is a security vulnerability. If DJL use this class internally and didn't do input validation, and can cause unexpected behavior (log4j case), then it's a vulnerability, but clearly this is NOT the case. If the side effect is not obvious and can cause damage (XSS case), then we should consider to improve it. But
DownloadUtils
api is very obvious.If we release a jar that only contains
DownloadUtils
, do you still think this is a security vulnerability?We have a lot of powerful features, like load a shared library from arbitrary location, execute arbitrary code (model is code), even reboot GPU. It's up to developer to ensure a trusted environment. If we follow your logic, all those features itself is security vulnerability.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you believe that the use of external input values is the responsibility of the caller or developer, and that this is the intended purpose of the design, I don't think further discussion will convince you that this is a vulnerability.
Although the official documentation (https://docs.djl.ai/master/docs/demos/jupyter/load_pytorch_model.html) does not state that DownloadUtils is intended exclusively for Jupyter environments or that it is a part of test code, I can only hope that all developers or callers using DownloadUtils will use that code in a Jupyter environment as you've intended, and that they are safe and have no malicious intent.