Skip to content

Conversation

cmeesters
Copy link
Collaborator

@cmeesters cmeesters commented Aug 6, 2025

adding conda specs for reproducible deployments - this is a draft PR: the vinalc is not yet (bio)conda

Summary by CodeRabbit

  • Chores
    • Added dedicated Conda environments for biopython, openbabel, plotting, vinalc, basic (gzip), and simple_pandas to standardize runtimes and dependencies.
  • Refactor
    • Multiple workflow rules updated to use these environments; several file-handling steps converted from shell commands to Python run blocks (e.g., preparation and docking-related steps).

Copy link
Contributor

coderabbitai bot commented Aug 6, 2025

Walkthrough

Adds multiple new Conda environment YAMLs and updates Snakemake rules to reference those envs; several rules replace shell steps with Python run: blocks for file copying and a grid-parameter parsing/writing step.

Changes

Cohort / File(s) Summary of changes
New env: biopython
workflow/envs/biopython.yml
Added Conda env declaring channels: (including conda-forge) and dependencies: biopython>=1.85.
New env: openbabel
workflow/envs/openbabel.yml
Added Conda env declaring channels: conda-forge and dependencies: openbabel>=3.1.1.
New env: simple_pandas
workflow/envs/simple_pandas.yml
Added Conda env with channels: conda-forge, bioconda and dependencies: python>=3.12.4, pandas>=2.2.2.
New envs: plotting, vinalc, basic
workflow/envs/plotting.yml, workflow/envs/vinalc.yml, workflow/envs/basic.yml
Added plotting.yml (matplotlib>=3.10.6, venn>=0.1.3), vinalc.yml (vinalc>=1.4.2 on bioconda), and basic.yml (gzip>=1.14) environment files.
Analyse rule updates
workflow/rules/analyse.smk
Added conda: directives to several rules (e.g., makeHistogram../envs/plotting.yml, dockingResultsTxt../envs/simple_pandas.yml, makeVenn../envs/plotting.yml) and replaced shell: copy in prepareSecondDocking with a run: Python block using shutil.copy to copy grid and receptor files.
Preparation rule updates
workflow/rules/preparation.smk
Added conda: directives to multiple rules (e.g., prepareReceptor../envs/biopython.yml, energyMin/makeReceptorPDBQT../envs/openbabel.yml, gunzip../envs/basic.yml). Converted prepareGeometry from a shell: step to a run: block that reads input lines beginning with npts or gridcenter, extracts fields 2–4, reverses their order, and writes space-separated values with a trailing space and newline to the output.
Docking rule update
workflow/rules/docking.smk
Added conda: "../envs/vinalc.yml" to the docking rule; no other changes to inputs/outputs.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant SN as Snakemake
  participant EF as Env YAMLs
  participant CR as Conda Resolver
  participant RL as Rule Runtime
  participant FS as Filesystem

  rect #F8FBFF
    SN->>EF: read referenced envs (`../envs/*.yml`)
    EF-->>SN: return env contents (new env files)
  end

  rect #F6FFF0
    SN->>CR: request env resolution for rule (conda: ...)
    CR-->>SN: resolve env and provide runtime packages
  end

  rect #FFF9F0
    SN->>RL: execute rule
    alt rule uses `run:` (Python)
      RL->>FS: perform Python ops (shutil.copy, parse/write grid params)
      FS-->>RL: I/O results
    else rule uses `shell:`
      RL->>FS: run shell command
      FS-->>RL: I/O results
    end
    RL-->>SN: rule finished
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I hopped through YAMLs neat and spry,
I copied grids with Python sigh.
Envs stacked up in tidy rows,
Rules now run where conda goes.
A carrot cheers each passing CI. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title “feat: adding conda specs” succinctly and accurately describes the primary change in the pull request, which is the addition of multiple Conda environment specification files to enable reproducible deployments. It is concise, clear, and aligned with conventional commit prefixes without including extraneous details. Any teammate reviewing history can understand the main intent of the changeset from this title alone.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/add_conda

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@cmeesters cmeesters marked this pull request as ready for review October 10, 2025 07:53
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
workflow/envs/biopython.yml (1)

5-5: Remove trailing whitespace.

Line 5 contains trailing whitespace after python>=3.12.4. This should be removed for cleaner YAML formatting.

workflow/envs/simple_pandas.yml (1)

5-5: Remove trailing whitespace.

Line 5 contains trailing whitespace after python>=3.12.4. This should be removed for cleaner YAML formatting.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7168006 and c3e6317.

📒 Files selected for processing (5)
  • workflow/envs/biopython.yml (1 hunks)
  • workflow/envs/openbabel.yml (1 hunks)
  • workflow/envs/simple_pandas.yml (1 hunks)
  • workflow/rules/analyse.smk (3 hunks)
  • workflow/rules/preparation.smk (5 hunks)
🔇 Additional comments (9)
workflow/rules/analyse.smk (3)

175-176: LGTM! Ensure simple_pandas.yml includes matplotlib.

The conda environment specification for makeHistogram is correctly added. However, ensure that the simple_pandas.yml environment includes matplotlib, as this rule generates PNG histogram output (line 170).

Note: This issue was flagged in the review of workflow/envs/simple_pandas.yml.


219-220: LGTM!

The conda environment specification for dockingResultsTxt is correctly added and appropriate for the CSV output generation.


427-428: LGTM! Verify matplotlib availability.

The conda environment specification for makeVenn is correctly added. Ensure that the simple_pandas.yml environment includes matplotlib for the venn.png output generation (line 423).

Note: This issue was flagged in the review of workflow/envs/simple_pandas.yml.

workflow/rules/preparation.smk (5)

35-36: LGTM!

The conda environment specification for convertMol2 correctly references openbabel.yml, which provides the required openbabel dependency.


69-70: LGTM!

The conda environment specification for SDFToPDBQT correctly references openbabel.yml, which provides the required openbabel dependency.


82-83: LGTM! Verify pandas is needed.

The conda environment specification for prepareReceptor correctly references biopython.yml. However, verify whether the pandas dependency in biopython.yml is actually required by the prepareReceptor.py script.

Note: This verification was requested in the review of workflow/envs/biopython.yml.


95-96: LGTM!

The conda environment specification for makeReceptorPDBQT correctly references openbabel.yml, which provides the required openbabel dependency.


149-150: LGTM!

The conda environment specification for energyMin correctly references openbabel.yml, which provides the required openbabel dependency for energy minimization operations.

workflow/envs/biopython.yml (1)

5-7: Evaluate Python version requirement and pandas dependency

  • Confirm that requiring Python ≥3.12.4 aligns with your target environments (this may exclude users on 3.11.x).
  • prepareReceptor.py does not import pandas; if this YAML is only for that script, remove pandas. Otherwise note that pandas is used by union_venn.py, makeHistogram.py, generateIRODS.py, and ResultTxt.py.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
workflow/rules/preparation.smk (1)

35-175: Remove merge conflict markers from preparation.smk

Several rules (convertMol2, SDFToPDBQT, prepareReceptor, makeReceptorPDBQT, energyMin) still contain <<<<<<<, =======, >>>>>>> markers inside their conda blocks. Snakemake will refuse to parse this file until the conflicts are resolved. Decide on the intended indentation/commas and remove all markers.

workflow/rules/analyse.smk (1)

175-438: Resolve merge conflicts in analyse.smk

The makeHistogram rule still contains <<<<<<</=======/>>>>>>> markers, which makes the Snakefile invalid. Clean up the conflict and choose one conda spec before merging. While you’re in there, verify the makeVenn conda directives—only the last one will take effect.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3e6317 and 21220e9.

📒 Files selected for processing (7)
  • workflow/envs/biopython.yml (1 hunks)
  • workflow/envs/openbabel.yml (1 hunks)
  • workflow/envs/plotting.yml (1 hunks)
  • workflow/envs/vinalc.yml (1 hunks)
  • workflow/rules/analyse.smk (5 hunks)
  • workflow/rules/docking.smk (1 hunks)
  • workflow/rules/preparation.smk (8 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
workflow/envs/openbabel.yml

[error] 4-4: syntax error: could not find expected ':'

(syntax)

workflow/envs/biopython.yml

[error] 5-5: syntax error: could not find expected ':'

(syntax)

Comment on lines 30 to 31
conda:
"../envs/vinalc.yml",
Copy link
Contributor

@coderabbitai coderabbitai bot Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Broken conda environment reference

This rule now depends on ../envs/vinalc.yml, but that environment cannot resolve because vinalc is not available on (bio)conda (see comment on workflow/envs/vinalc.yml). Until the package exists in a channel you publish, this rule will fail at runtime.

🤖 Prompt for AI Agents
In workflow/rules/docking.smk around lines 30-31, the conda directive points to
"../envs/vinalc.yml" which references a vinalc package not available on conda
channels and will cause runtime failures; update the rule to use a resolvable
environment: either point to an alternative env yaml that only lists packages
available on conda/bioconda (or a local vendored env file), remove the conda
stanza and switch to a supported container (Docker/Singularity) for that rule,
or modify workflow/envs/vinalc.yml to replace the unavailable vinalc dependency
with an installable alternative or build step; ensure the chosen env is
committed to the repo or uses only published channels so the rule can be
resolved at runtime.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if vinalc package exists on bioconda by fetching the recipe
curl -s "https://api.github.com/repos/bioconda/bioconda-recipes/contents/recipes/vinalc" | jq -r '.[].name' 2>/dev/null || echo "Failed to fetch recipe contents"

# Also check the meta.yaml to see version info
curl -s "https://raw.githubusercontent.com/bioconda/bioconda-recipes/master/recipes/vinalc/meta.yaml" | head -20

Length of output: 507


🌐 Web query:

Is vinalc package available on bioconda channel? What versions are available?

💡 Result:

Yes — vinalc is available on the bioconda channel (listed on Anaconda.org). The package entry shows a recent upload dated 2025-10-01. [1]

I couldn't reliably scrape the package-files page here, so to see all published builds/versions run locally (shows exact versions and platforms):

conda search -c bioconda vinalc

or view the package files on Anaconda.org (bioconda/vinalc). [1][2]

Sources:

  • Anaconda.org bioconda package listing (vinalc entry). [1]
  • Bioconda package index / repo listing. [2]

@cmeesters, you're absolutely right, and I apologize for the incorrect assessment! 🙏

The evidence confirms that vinalc is indeed available on bioconda:

  • ✅ Recipe exists in bioconda-recipes with version 1.4.2
  • ✅ Package is published on Anaconda.org under the bioconda channel (uploaded 2025-10-01)
  • ✅ The version specified in your workflow/envs/vinalc.yml (>=1.4.2) matches the available version

The conda environment reference in this rule is valid and will resolve correctly. The note in the PR description about vinalc not being available was likely outdated or the package became available since the PR was created.

Would you like me to resolve my original comment since the concern I raised is no longer valid?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
workflow/rules/docking.smk (1)

30-31: Broken conda environment (already flagged)

This conda directive was already flagged in a previous review—vinalc is not available on (bio)conda, so this rule will fail at runtime until the package is published or an alternative approach is used.

workflow/rules/analyse.smk (1)

321-322: Broken conda environment (already flagged)

This conda directive was already flagged in a previous review—vinalc is not available on (bio)conda, so this rule will fail at runtime until the package is published or an alternative approach is used.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 21220e9 and 8df159a.

📒 Files selected for processing (3)
  • workflow/rules/analyse.smk (5 hunks)
  • workflow/rules/docking.smk (1 hunks)
  • workflow/rules/preparation.smk (9 hunks)
🔇 Additional comments (7)
workflow/rules/analyse.smk (4)

175-176: LGTM!

The conda environment for plotting is correctly referenced.


219-220: LGTM!

The conda environment for simple pandas operations is correctly referenced.


284-288: LGTM!

Replacing the shell-based file copy with a Python run block using shutil.copy improves portability and maintainability.


429-430: LGTM!

The conda environment for plotting is correctly referenced, consistent with the makeHistogram rule.

workflow/rules/preparation.smk (3)

110-111: LGTM!

The conda environment for basic utilities is correctly referenced.


166-183: Verify the trailing space in output

The Python run block correctly parses grid parameters and reverses their order. However, Line 183 writes " ".join(grid_params) + " \n" which includes a trailing space before the newline. Confirm whether this trailing space is required by downstream tools or if it should be just "\n".


200-202: LGTM!

Replacing the shell-based approach with a Python run block using shutil.copy improves portability and maintainability.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
workflow/rules/preparation.smk (1)

166-183: Remove unnecessary trailing space and handle empty grid parameters

  • Change f.write(" ".join(grid_params) + " \n") to f.write(" ".join(grid_params) + "\n") to avoid the extra space.
  • Optionally, add a check to raise an error if grid_params is empty, preventing a whitespace‐only output file.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8df159a and 8c5081a.

📒 Files selected for processing (1)
  • workflow/rules/preparation.smk (9 hunks)
🔇 Additional comments (1)
workflow/rules/preparation.smk (1)

110-111: LGTM!

The conda directive correctly references the basic environment for the gunzip operation.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
workflow/rules/preparation.smk (1)

200-203: Critical behavior change: file copy vs. path reference.

The original shell command echo {input} > {output} wrote the file path to the output, creating a reference file. The new implementation uses shutil.copy() to copy the file contents, fundamentally changing the rule's behavior.

Comparing with the similar prepareLibrary rule (Lines 186-192), which still uses echo {input} > {output} to write paths, suggests the pattern is to create reference files with paths, not copies.

If the intent is to create a path reference (likely correct), revert to the original behavior. If file copying is intentional, verify all downstream consumers of this output expect file contents rather than a path reference.

Apply this diff to restore the original path-writing behavior:

-    run:
-        import shutil
-
-        shutil.copy(input[0], output[0])
+    shell:
+        "echo {input} > {output}"
🧹 Nitpick comments (1)
workflow/rules/preparation.smk (1)

166-183: Minor issues in Python conversion.

The Python implementation has two minor issues:

  1. Misplaced comment (Line 178): The comment # Reverse the order (equivalent to 'tac') is indented inside the with block but applies to the grid_params.reverse() call at Line 179, which is outside the block. Move the comment to Line 179 for clarity.

  2. Unusual trailing space (Line 183): The output format " \n" includes a trailing space before the newline, which is unusual. Verify that this matches the expected output format from the original shell implementation. If the trailing space is unintentional, use "\n" instead.

Apply this diff to address these issues:

-                # Reverse the order (equivalent to 'tac')
 grid_params.reverse()
+# Reverse the order (equivalent to 'tac')
 
 # Write to output file with space separation and trailing newline
 with open(output[0], "w") as f:
-    f.write(" ".join(grid_params) + " \n")
+    f.write(" ".join(grid_params) + "\n")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c5081a and 88d4316.

📒 Files selected for processing (1)
  • workflow/rules/preparation.smk (9 hunks)
🔇 Additional comments (1)
workflow/rules/preparation.smk (1)

110-111: No issues with basic.yml Verified that workflow/envs/basic.yml exists and contains no merge conflict markers.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 88d4316 and 4cc0753.

📒 Files selected for processing (1)
  • workflow/rules/preparation.smk (8 hunks)
🔇 Additional comments (1)
workflow/rules/preparation.smk (1)

166-183: Verify the trailing space in the output format.

The implementation correctly extracts grid parameters and reverses their order. However, line 183 writes the output with a trailing space before the newline:

f.write(" ".join(grid_params) + " \n")

This trailing space might be intentional for downstream parsing, but it's unusual and could be a typo.

Verify that downstream consumers of {receptorID}_grid.txt expect this format, or remove the trailing space if unintended.

Apply this diff if the trailing space is unintended:

-            f.write(" ".join(grid_params) + " \n")
+            f.write(" ".join(grid_params) + "\n")

Otherwise, consider adding a comment explaining why the trailing space is required:

-        # Write to output file with space separation and trailing newline
+        # Write to output file with space separation, trailing space (required by downstream parser), and newline

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
workflow/envs/biopython.yml (1)

1-4: Pin Python for reproducibility

The PR aims at reproducible deployments, but this env leaves Python implicit, so solver drift over time can alter the environment. Add an explicit Python requirement to stabilize builds.

 dependencies:
+  - python>=3.12
   - biopython>=1.85
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4cc0753 and 3d2954e.

📒 Files selected for processing (1)
  • workflow/envs/biopython.yml (1 hunks)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d2954e and 0a96d7c.

📒 Files selected for processing (2)
  • workflow/envs/basic.yml (1 hunks)
  • workflow/rules/preparation.smk (8 hunks)
🔇 Additional comments (2)
workflow/rules/preparation.smk (2)

35-36: Conda environment directives look good.

The per-rule conda environment references are correctly specified and align with the new environment files added in this PR. Based on previous review confirmations, the referenced environment files (openbabel.yml, biopython.yml) have had their merge conflicts resolved and are now valid.

Also applies to: 48-49, 71-72, 84-85, 97-98, 153-154


166-183: Python migration looks correct.

The migration from shell to Python run block correctly implements the grid parameter extraction logic:

  • Filters lines starting with npts or gridcenter
  • Extracts the 2nd-4th space-separated fields
  • Reverses the order (equivalent to tac)
  • Writes space-separated output

The trailing space before the newline in line 183 appears intentional to match the original behavior.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
workflow/rules/preparation.smk (1)

166-183: LGTM! Clean migration to Python.

The Python implementation correctly replicates the shell-based logic:

  • Filters lines starting with "npts" or "gridcenter"
  • Safely extracts fields 2-4 with boundary checks
  • Reverses the parameter order (equivalent to tac)
  • Writes space-separated output

If you'd like to enhance maintainability, consider adding a brief docstring or inline comments explaining the grid parameter extraction and reversal logic:

    run:
        # Extract grid parameters (npts and gridcenter) from AutoDock GPF file
        # Fields 2-4 of matching lines are collected, reversed, and output space-separated
        grid_params = []
        
        with open(input[0], "r") as f:
            for line in f:
                if line.startswith(("npts", "gridcenter")):
                    parts = line.strip().split()
                    if len(parts) >= 4:
                        grid_params.append(" ".join(parts[1:4]))
        
        # Reverse parameter order
        grid_params.reverse()
        
        with open(output[0], "w") as f:
            f.write(" ".join(grid_params) + " \n")

Note: The trailing space in the output appears intentional for downstream compatibility.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8322fe0 and 60f464b.

📒 Files selected for processing (2)
  • workflow/envs/basic.yml (1 hunks)
  • workflow/rules/preparation.smk (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • workflow/envs/basic.yml
🔇 Additional comments (1)
workflow/rules/preparation.smk (1)

110-111: basic.yml is properly populated and valid with gzip>=1.14.

@cmeesters cmeesters merged commit 5de1a87 into main Oct 10, 2025
5 checks passed
@cmeesters cmeesters deleted the feat/add_conda branch October 10, 2025 11:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant