-
Notifications
You must be signed in to change notification settings - Fork 238
Fix loading jars with encoded paths #2759
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
milesziemer
wants to merge
4
commits into
smithy-lang:main
Choose a base branch
from
milesziemer:jar-path-with-special-chars
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Fix loading jars with encoded paths #2759
milesziemer
wants to merge
4
commits into
smithy-lang:main
from
milesziemer:jar-path-with-special-chars
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Fixes smithy-lang#2576 and smithy-lang#2103, which are both that the model assembler can't load jars if the jar's path has percent-encoded characters. This may happen for artifacts resolved by coursier (see linked issues). The model assembler can load Smithy files from a jar - model discovery. This works by looking for a 'META-INF/smithy/manifest' file within the jar, which contains the paths of smithy model files within the jar. You can give the model assembler a class loader to use for model discovery, but you can also add a jar to discover models from using a regular `addImport`. The problem is with the latter case. Because jars are like zip files, you can't read files within the jar using regular file system operations. Instead, you can either use the JarFile api to open the jar, and read specific entries (like a zip file), or you can directly read specific entries via a special URL with the format `jar:file:/path/to/jar.jar!/path/in/jar`. The latter is the way that our model discovery apis work. The problem was that URL doesn't perform any encoding or decoding itself, so callers/consumers have to take care of it, which we were not doing. For example, if the jar file has a path of `/foo/bar%45baz.jar`, we would create a URL like: `jar:file:/foo/bar%45baz.jar!/META-INF/smithy/manifest`, and when the JDK tried to use this URL to read from the file system, it would decode it and attempt to read `/foo/[email protected]`. Since we weren't encoding the URL to begin with, this was only a problem when the path contained `%`. I tried to preserve the existing model discovery api, and just make it work with these paths. The URL class has a section in its javadoc about this footgun, and in more recent jdk versions, URL constructors are deprecated in favor of `URI::toURL` to avoid it. So that's what I did. This technically changes the `SourceLocation::filename` for jars that are `addImport`ed, when the jar path has reserved characters (like spaces). Such jars were importable before because URL doesn't validate encoding, and the JDK decoding the URL would be a no-op since it wasn't encoded to begin with. So we could _just_ encode `%`, meaning only the newly-allowed paths would have any encoding. Disregarding the fact that this side-steps the assumed pre-conditions of URL, it would actually exacerbate an existing inconsistency between `SourceLocation::filename` of models discovered on the classpath, and models discovered from imports. The JDK encodes the URLs of resources on the classpath, so their models' source locations will be encoded. Encoding URLs of imported jars means there's no difference. You can assume that a source location with `jar:file:` is a well-formed URL/URI.
I wasn't accounting for the fact that windows paths have a different string representation than unix paths. This commit fixes that by normalizing the path to a URI path (using '/') before building the manifest URI. While debugging this, I found another difference between the source location filename of classpath and import discovered models. On windows, the source location of import discovered models will look like this: ``` jar:file:C:\foo\bar.jar!/META-INF/smithy/foo.smithy ``` which looks pretty cursed to me, but it _still works_. As in, you can do a `new URL(thatThing)`, and successfully read `foo.smithy`. I'm guessing the reason this works is similar to why you can read files when the URL isn't properly encoded, coincidental, or the JDK is doing some extra work to normalize it. Especially considering that you can do the same with the filename of classpath discovered models, which look like a normal URL: ``` jar:file:/C:/foo/bar.jar!/META-INF/smithy/foo.smithy ```
PluginContext was relying on being able to compare the source location filename with a stringified version of each configured source path, in order to figure out which shapes/metadata are part of sources. This was used by the sources plugin to generate the projected version of the source model, for non-source projections. In my last commit, I discovered that models in imported jars have a source location filename like: ``` jar:file:\path\to\... ``` on windows. So they could be compared to source paths by stripping the 'jar:file:' part. Since my changes changed the filename to be a proper url, this comparison no longer worked. I updated PluginContext to compare filenames like 'jar:file:' to the URI paths of sources, instead of the regular paths. These URI paths are computed eagerly, so we don't have to do it on every comparison. Usually there are a very low number of sources (they're usually a single directory). The fact that the specific format of the filename was being relied on is mildly concerning, but considering the fact that models discovered on the classpath already had a different format, I think this is ok. I would also be surprised if there's a lot of code out there manually importing jars, instead of providing a classloader.
Junit (or something, maybe not junit directly) seems to be failing to delete a temp dir I was creating with the TempDir annotation. Maybe just manually creating/deleting it will work.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Background
The model assembler can load Smithy files from a jar - model discovery. This works by looking for a 'META-INF/smithy/manifest' file within the jar, which contains the paths of smithy model files within the jar. You can give the model assembler a class loader to use for model discovery, but you can also add a jar to discover models from using a regular
addImport
. The problem is with the latter case. Because jars are like zip files, you can't read files within the jar using regular file system operations. Instead, you can either use the JarFile api to open the jar, and read specific entries (like a zip file), or you can directly read specific entries via a special URL with the formatjar:file:/path/to/jar.jar!/path/in/jar
. The latter is the way that our model discovery apis work. The problem was that URL doesn't perform any encoding or decoding itself, so callers/consumers have to take care of it, which we were not doing. For example, if the jar file has a path of/foo/bar%45baz.jar
, we would create a URL like:jar:file:/foo/bar%45baz.jar!/META-INF/smithy/manifest
, and when the JDK tried to use this URL to read from the file system, it would decode it and attempt to read/foo/[email protected]
. Since we weren't encoding the URL to begin with, this was only a problem when the path contained%
.I tried to preserve the existing model discovery api, and just make it work with these paths. The URL class has a section in its javadoc about this footgun, and in more recent jdk versions, URL constructors are deprecated in favor of
URI::toURL
to avoid it. So that's what I did.This technically changes the
SourceLocation::filename
for jars that areaddImport
ed, when the jar path has reserved characters (like spaces). Such jars were importable before because URL doesn't validate encoding, and the JDK decoding the URL would be a no-op since it wasn't encoded to begin with. So we could just encode%
, meaning only the newly-allowed paths would have any encoding. Disregarding the fact that this side-steps the assumed pre-conditions of URL, it would actually exacerbate an existing inconsistency betweenSourceLocation::filename
of models discovered on the classpath, and models discovered from imports. The JDK encodes the URLs of resources on the classpath, so their models' source locations will be encoded. Encoding URLs of imported jars means there's no difference. You can assume that a source location withjar:file:
is a well-formed URL/URI.Testing
Added new ModelAssembler tests for loading a jar via
addImport
when the jar's path has encoded characters.I added one for adding the import via path, and via URL, in case the way we normalize import paths ever changes.
I also added a ModelDiscovery test to assert the new encoding behavior.
Links
Fixes #2576 , Fixes #2103
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.