Python Package Version Extraction from URIs in pyproject.toml (PEP508) #1525
base: master
Pull Request Overview
This PR implements parsing for PEP 508-compliant URI dependencies in pyproject.toml files to extract both name and version information from direct references, including wheel files, archives, and VCS repositories.
- Adds regex-based version extraction from URLs in PythonDependencyTransformer
- Enhances pyproject.toml parsing to handle direct URI dependencies alongside traditional version constraints
- Includes comprehensive test coverage for various dependency formats
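For context, a PEP 508 direct reference has the form name [extras] @ url, optionally followed by a ; marker section. The sketch below is a hypothetical helper, not the PR's code: it shows one simplified way to split such a line into a name and a URL before any version is pulled out of the URL. It assumes the marker section starts at whitespace or ";" and does not implement the full PEP 508 grammar.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from the PR): splits "name[extras] @ url ; marker"
// into a name and a URL. Simplified; not a full PEP 508 parser.
final class DirectReferenceSketch {
    // Group 1: PEP 508-style name; optional [extras]; "@"; group 2: URL up to
    // the first whitespace or ";" (where an environment marker would begin).
    private static final Pattern DIRECT_REFERENCE = Pattern.compile(
        "^\\s*([A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)\\s*(?:\\[[^\\]]*\\])?\\s*@\\s*([^\\s;]+)");

    static final class Reference {
        final String name;
        final String url;

        Reference(String name, String url) {
            this.name = name;
            this.url = url;
        }
    }

    // Returns empty for ordinary specifiers such as "requests>=2.0".
    static Optional<Reference> parse(String requirement) {
        Matcher m = DIRECT_REFERENCE.matcher(requirement);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Reference(m.group(1), m.group(2)));
    }

    public static void main(String[] args) {
        // Prints: requests -> https://example.com/pkgs/requests-2.25.1.tar.gz
        parse("requests[security] @ https://example.com/pkgs/requests-2.25.1.tar.gz ; python_version >= '3.8'")
            .ifPresent(r -> System.out.println(r.name + " -> " + r.url));
    }
}
```

The URL returned here is what a version-extraction step (such as the regexes discussed below) would then inspect.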
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
File | Description |
---|---|
PythonDependencyTransformer.java | Adds PEP 508 URI parsing logic with regex patterns for extracting versions from URLs |
PythonDependencyTransformerTest.java | New test file covering normal dependencies and PEP 508 URI formats |
PyprojectTomlParserTest.java | New test file validating complex pyproject.toml parsing with mixed dependency types |
private static final List<String> TOKEN_IGNORE_AFTER_CHARS = Arrays.asList(",", "[", "==", ">=", "~=", "<=", ">", "<");
private static final Pattern URI_VERSION_PATTERN = Pattern.compile(".*/([A-Za-z0-9_.-]+)-([0-9]+(?:\\.[0-9A-Za-z_-]+)*).*\\.(whl|zip|tar\\.gz|tar\\.bz2|tar)$");
private static final Pattern VCS_VERSION_PATTERN = Pattern.compile(".*@([0-9]+(?:\\.[0-9]+)*(?:[A-Za-z0-9._-]*)?).*");
These regex patterns are complex and lack documentation. Consider adding inline comments explaining what each pattern matches and providing examples of URLs they're designed to parse.
private static final List<String> TOKEN_IGNORE_AFTER_CHARS = Arrays.asList(",", "[", "==", ">=", "~=", "<=", ">", "<");
// Matches package filenames in URIs, extracting the package name and version.
// Example: https://files.pythonhosted.org/packages/.../requests-2.25.1.tar.gz
// Captures: "requests" as name, "2.25.1" as version
private static final Pattern URI_VERSION_PATTERN = Pattern.compile(".*/([A-Za-z0-9_.-]+)-([0-9]+(?:\\.[0-9A-Za-z_-]+)*).*\\.(whl|zip|tar\\.gz|tar\\.bz2|tar)$");
// Matches VCS (Version Control System) URIs with an @version suffix that starts with a digit.
// Example: git+https://github.com/psf/requests.git@2.25.1
// Captures: "2.25.1" as version (a "v"-prefixed ref such as "@v2.25.1" does not match)
private static final Pattern VCS_VERSION_PATTERN = Pattern.compile(".*@([0-9]+(?:\\.[0-9]+)*(?:[A-Za-z0-9._-]*)?).*");
// Matches archive or release URLs, extracting the version from the path.
// Example: https://github.com/psf/requests/archive/2.25.1.zip
// Captures: "2.25.1" as version
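Running the two patterns visible in the diff against sample URLs is a cheap way to verify example comments like the ones suggested above. The harness below copies URI_VERSION_PATTERN and VCS_VERSION_PATTERN verbatim; the sample URLs and the group helper are illustrative. As written, the version class [0-9A-Za-z_-] includes "-", so for a wheel filename group 2 also absorbs the tag suffix. VCS_VERSION_PATTERN needs a digit immediately after "@", so a "v"-prefixed tag returns no match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Exercises the two patterns exactly as they appear in the diff.
final class PatternCheck {
    static final Pattern URI_VERSION_PATTERN = Pattern.compile(
        ".*/([A-Za-z0-9_.-]+)-([0-9]+(?:\\.[0-9A-Za-z_-]+)*).*\\.(whl|zip|tar\\.gz|tar\\.bz2|tar)$");
    static final Pattern VCS_VERSION_PATTERN = Pattern.compile(
        ".*@([0-9]+(?:\\.[0-9]+)*(?:[A-Za-z0-9._-]*)?).*");

    // Returns the requested capture group, or null when the pattern does not match.
    static String group(Pattern pattern, String uri, int index) {
        Matcher matcher = pattern.matcher(uri);
        return matcher.find() ? matcher.group(index) : null;
    }

    public static void main(String[] args) {
        String sdist = "https://example.com/pkgs/requests-2.25.1.tar.gz";
        String wheel = "https://example.com/pkgs/requests-2.25.1-py2.py3-none-any.whl";
        System.out.println(group(URI_VERSION_PATTERN, sdist, 2)); // 2.25.1
        System.out.println(group(URI_VERSION_PATTERN, wheel, 2)); // 2.25.1-py2.py3-none-any
        System.out.println(group(VCS_VERSION_PATTERN, "git+https://github.com/psf/requests.git@2.25.1", 1)); // 2.25.1
        System.out.println(group(VCS_VERSION_PATTERN, "git+https://github.com/psf/requests.git@v2.25.1", 1)); // null
    }
}
```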
Matcher matcher = URI_VERSION_PATTERN.matcher(uri);
if (matcher.find()) {
    return matcher.group(2);
}
The magic number 2 refers to the second capture group. Consider using a named constant like VERSION_GROUP_INDEX = 2 to make the code more self-documenting.
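A minimal sketch of the named-constant suggestion, reusing the index names proposed in these review comments. extractVersion and UriVersionExtractor are hypothetical names. One deliberate change is assumed: "-" is removed from the version character class so that a wheel's -py2.py3-none-any tag stays out of the captured version. Returning Optional.empty() leaves room for the name-only fallback described in the PR.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: named group indices instead of bare 1 and 2.
final class UriVersionExtractor {
    // Same shape as URI_VERSION_PATTERN, but without "-" in the version class
    // (assumption of this sketch), so "2.25.1-py2.py3-none-any" is not captured.
    private static final Pattern URI_VERSION_PATTERN = Pattern.compile(
        ".*/([A-Za-z0-9_.-]+)-([0-9]+(?:\\.[0-9A-Za-z_]+)*).*\\.(whl|zip|tar\\.gz|tar\\.bz2|tar)$");
    private static final Pattern VCS_VERSION_PATTERN = Pattern.compile(
        ".*@([0-9]+(?:\\.[0-9]+)*(?:[A-Za-z0-9._-]*)?).*");

    private static final int VERSION_GROUP_INDEX = 2; // group 1 is the package name
    private static final int VCS_VERSION_GROUP_INDEX = 1;

    static Optional<String> extractVersion(String uri) {
        Matcher matcher = URI_VERSION_PATTERN.matcher(uri);
        if (matcher.find()) {
            return Optional.of(matcher.group(VERSION_GROUP_INDEX));
        }
        Matcher vcsMatcher = VCS_VERSION_PATTERN.matcher(uri);
        if (vcsMatcher.find()) {
            return Optional.of(vcsMatcher.group(VCS_VERSION_GROUP_INDEX));
        }
        return Optional.empty(); // caller can fall back to reporting the name only
    }
}
```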
Matcher vcsMatcher = VCS_VERSION_PATTERN.matcher(uri);
if (vcsMatcher.find()) {
    return vcsMatcher.group(1);
}
The magic number 1 refers to the first capture group. Consider using a named constant like VCS_VERSION_GROUP_INDEX = 1 to make the code more self-documenting.
Matcher archiveMatcher = ARCHIVE_VERSION_PATTERN.matcher(uri);
if (archiveMatcher.find()) {
    return archiveMatcher.group(1);
}
The magic number 1 refers to the first capture group. Consider using a named constant like ARCHIVE_VERSION_GROUP_INDEX = 1 to make the code more self-documenting.
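Another option, not raised in the review, is Java's named capturing groups, which remove the index entirely: matcher.group("version") keeps working even if groups are later added or reordered. The definition of ARCHIVE_VERSION_PATTERN is not shown in the diff, so the pattern below is a hypothetical stand-in that matches GitHub archive URLs like the one in the suggested comment.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: named capturing groups instead of numeric indices.
// ARCHIVE_PATTERN is hypothetical; it is not the PR's ARCHIVE_VERSION_PATTERN.
final class NamedGroupSketch {
    // Matches e.g. .../archive/2.25.1.zip or .../archive/refs/tags/v2.25.1.tar.gz
    private static final Pattern ARCHIVE_PATTERN = Pattern.compile(
        "/archive/(?:refs/tags/)?v?(?<version>[0-9]+(?:\\.[0-9]+)*)\\.(?:zip|tar\\.gz)$");

    static Optional<String> archiveVersion(String uri) {
        Matcher matcher = ARCHIVE_PATTERN.matcher(uri);
        // group("version") reads by name, so no index constant is needed
        return matcher.find() ? Optional.of(matcher.group("version")) : Optional.empty();
    }
}
```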
class PythonDependencyTransformerTest {

    @Test
    void testTransformLine() {
Great to see unit tests for this complex parsing.
There are many cases being tested by this test. It would be a bit nicer if it was implemented using @ParameterizedTest. Otherwise, if there's ever a problem and the test starts to fail, only one failing assertion will be reported. With a parameterized test it becomes more clear what other types of cases might be broken.
That's a great suggestion. I've updated the test with @ParameterizedTest.
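The reviewer's point about failure reporting can be illustrated without a test framework. The sketch below is plain Java, not @ParameterizedTest itself (that annotation ships in the junit-jupiter-params artifact): it runs every case in a table and collects all mismatches, so one broken case does not hide the others. The cases and the vcsVersion helper are hypothetical; the second row is expected to fail with VCS_VERSION_PATTERN as written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java illustration (not JUnit): every case runs, every failure is collected.
final class TableDrivenSketch {
    private static final Pattern VCS_VERSION_PATTERN = Pattern.compile(
        ".*@([0-9]+(?:\\.[0-9]+)*(?:[A-Za-z0-9._-]*)?).*");

    // Each row: input URI, expected version (null means "no version").
    static final String[][] CASES = {
        {"git+https://github.com/psf/requests.git@2.25.1", "2.25.1"},
        {"git+https://github.com/psf/requests.git@v2.25.1", "2.25.1"}, // fails as written
        {"git+https://github.com/psf/requests.git@main", null},
    };

    static String vcsVersion(String uri) {
        Matcher m = VCS_VERSION_PATTERN.matcher(uri);
        return m.find() ? m.group(1) : null;
    }

    // Unlike a single test method with many assertions, this keeps going after a failure.
    static List<String> runAll() {
        List<String> failures = new ArrayList<>();
        for (String[] row : CASES) {
            String actual = vcsVersion(row[0]);
            if (!Objects.equals(row[1], actual)) {
                failures.add(row[0] + ": expected " + row[1] + " but got " + actual);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        runAll().forEach(System.out::println);
    }
}
```

With JUnit 5, the same table would typically become @CsvSource or @MethodSource rows feeding one @ParameterizedTest method, and each row is then reported as its own test invocation.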
JIRA Ticket
IDETECT-4845
Description
This is a merge request that implements parsing for dependencies declared via PEP 508-compliant URIs in pyproject.toml, ensuring versions from wheel files, archives, and VCS refs can be extracted.
Example of supported PEP 508 URI dependency:
Impact Areas
- … pip install using pyproject.toml, with graceful fallback to the next detector if the version is missing or malformed, or the URI is invalid.
- … pyproject.toml. Reports name-only when the version is unavailable.
- … requirements.txt is present and contains direct references. Falls back to name-only if the version is missing. This detector doesn't parse dependency information from pyproject.toml.
- … pyproject.toml and versions from poetry.lock.
- … uv tree command, which is used in the current implementation.
Notes
pip show and uv tree are confirmed read-only and safe (no reverse shell injection risk).