|
| 1 | +--- |
| 2 | +title: clermontyping |
| 3 | +description: A Bactopia Tool which uses ClermonTyping to conduct _in silico_ phylotyping of _Escherichia_ genomes. |
| 4 | +--- |
| 5 | +# Bactopia Tool - `clermontyping` |
| 6 | +The `clermontyping` module uses [ClermonTyping](https://github.com/happykhan/ClermonTyping) |
| 7 | +to conduct _in silico_ prediction of phylotype for _Escherichia_ genomes. It takes |
| 8 | +genome assemblies and assigns them to _E. albertii_, _E. fergusonii_, _Escherichia_ |
| 9 | +clades I–V, or _E. coli sensu stricto_, as well as to the main _E. coli_ phylogroups. |
| 10 | + |
| 11 | + |
| 12 | +## Example Usage |
| 13 | +``` |
| 14 | +bactopia --wf clermontyping \ |
| 15 | + --bactopia /path/to/your/bactopia/results |
| 16 | +``` |
| 17 | + |
| 18 | +## Output Overview |
| 19 | + |
| 20 | +Below is the default output structure for the `clermontyping` tool. Where possible the |
| 21 | +file descriptions below were modified from a tool's description. |
| 22 | + |
| 23 | +```{bash} |
| 24 | +<BACTOPIA_DIR> |
| 25 | +├── <SAMPLE_NAME> |
| 26 | +│ └── tools |
| 27 | +│ └── clermontyping |
| 28 | +│ ├── <SAMPLE_NAME>.blast.xml |
| 29 | +│ ├── <SAMPLE_NAME>.html |
| 30 | +│ ├── <SAMPLE_NAME>.mash.tsv |
| 31 | +│ ├── <SAMPLE_NAME>.phylogroups.txt |
| 32 | +│ └── logs |
| 33 | +│ ├── nf-clermontyping.{begin,err,log,out,run,sh,trace} |
| 34 | +│ └── versions.yml |
| 35 | +└── bactopia-runs |
| 36 | + └── clermontyping |
| 37 | + ├── merged-results |
| 38 | + │ ├── clermontyping.tsv |
| 39 | + │ └── logs |
| 40 | + │ └── clermontyping-concat |
| 41 | + │ ├── nf-merged-results.{begin,err,log,out,run,sh,trace} |
| 42 | + │ └── versions.yml |
| 43 | + └── nf-reports |
| 44 | + ├── clermontyping-dag.dot |
| 45 | + ├── clermontyping-report.html |
| 46 | + ├── clermontyping-timeline.html |
| 47 | + └── clermontyping-trace.txt |
| 48 | + |
| 49 | +``` |
| 50 | + |
| 51 | +!!! info "Directory structure might be different" |
| 52 | + |
| 53 | + `clermontyping` is available as a standalone Bactopia Tool, as well as from |
| 54 | + the main Bactopia workflow (e.g. through Staphopia or Merlin). If executed |
| 55 | + from Bactopia, the `clermontyping` directory structure might be different, but the |
| 56 | + output descriptions below still apply. |
| 57 | + |
| 58 | + |
| 59 | + |
| 60 | +### Results |
| 61 | + |
| 62 | +#### Merged Results |
| 63 | + |
| 64 | +Below are results that are concatenated into a single file. |
| 65 | + |
| 66 | + |
| 67 | +| Filename | Description | |
| 68 | +|-------------------------------|-------------| |
| 69 | +| clermontyping.tsv | A merged TSV file with `ClermonTyping` results from all samples | |
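
The merged file can be summarised with standard command-line tools. Below is a minimal sketch that counts samples per phylogroup. It assumes the merged TSV has a header row and a column named `phylogroup`; that column name, and the mock file built here, are hypothetical and only serve to make the example self-contained. Check the header of your own `clermontyping.tsv` and adjust `col` accordingly.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Mock merged file: column names and values are hypothetical, for illustration only.
merged="$(mktemp)"
printf 'sample\tphylogroup\n' > "${merged}"
printf 'sample01\tA\nsample02\tB2\nsample03\tA\n' >> "${merged}"

# Locate the (assumed) phylogroup column by its header name, then count each value.
counts="$(awk -F'\t' -v col='phylogroup' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
    c { n[$c]++ }
    END { for (g in n) printf "%s\t%d\n", g, n[g] }
' "${merged}" | sort)"

printf '%s\n' "${counts}"
```

Looking the column up by name rather than by position keeps the sketch working if the column order differs from what is assumed here.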
| 70 | + |
| 71 | + |
| 72 | +#### ClermonTyping |
| 73 | + |
| 74 | +Below is a description of the _per-sample_ results from [ClermonTyping](https://github.com/happykhan/ClermonTyping). |
| 75 | + |
| 76 | + |
| 77 | +| Filename | Description | |
| 78 | +|-------------------------------|-------------| |
| 79 | +| <SAMPLE_NAME>.blast.xml | A BLAST XML file with the results of the ClermonTyping analysis | |
| 80 | +| <SAMPLE_NAME>.html | An HTML file with the results of the ClermonTyping analysis | |
| 81 | +| <SAMPLE_NAME>.mash.tsv | A TSV file with the Mash distances | |
| 82 | +| <SAMPLE_NAME>.phylogroups.txt | A TSV file with the final phylogroup assignments | |
| 83 | + |
| 84 | + |
| 85 | + |
| 86 | + |
| 87 | + |
| 88 | +### Audit Trail |
| 89 | + |
| 90 | +Below are files that can assist you in understanding which parameters and program versions were used. |
| 91 | + |
| 92 | +#### Logs |
| 93 | + |
| 94 | +Each process that is executed will have a folder named `logs`. In this folder are helpful |
| 95 | +files for you to review if the need ever arises. |
| 96 | + |
| 97 | +| Extension | Description | |
| 98 | +|--------------|-------------| |
| 99 | +| .begin | An empty file used to designate the process started | |
| 100 | +| .err | Contains STDERR outputs from the process | |
| 101 | +| .log | Contains both STDERR and STDOUT outputs from the process | |
| 102 | +| .out | Contains STDOUT outputs from the process | |
| 103 | +| .run | The script Nextflow uses to stage/unstage files and queue processes based on given profile | |
| 104 | +| .sh | The script executed by bash for the process | |
| 105 | +| .trace | The Nextflow [Trace](https://www.nextflow.io/docs/latest/tracing.html#trace-report) report for the process | |
| 106 | +| versions.yml | A YAML formatted file with program versions | |
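
When reviewing many samples, it can help to list only the `.err` files that actually contain output. The sketch below is one possible way to do that with `find`, assuming the `logs` layout shown in the output tree above. The results directory, sample name, and log contents are mock values created only so the example runs on its own.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Mock results directory (hypothetical sample name and log contents).
results_dir="$(mktemp -d)"
log_dir="${results_dir}/sample01/tools/clermontyping/logs"
mkdir -p "${log_dir}"
: > "${log_dir}/nf-clermontyping.begin"
: > "${log_dir}/nf-clermontyping.out"
printf 'example warning\n' > "${log_dir}/nf-clermontyping.err"

# List .err files inside any logs folder that are not empty.
nonempty_err="$(find "${results_dir}" -type f -path '*/logs/*' -name '*.err' -size +0 | sort)"

printf '%s\n' "${nonempty_err}"
```

A non-empty `.err` file does not necessarily mean the process failed, since some programs write progress messages to STDERR.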
| 107 | + |
| 108 | +#### Nextflow Reports |
| 109 | + |
| 110 | +These Nextflow reports provide a great summary of your run. These can be used to optimize |
| 111 | +resource usage and estimate expected costs if using cloud platforms. |
| 112 | + |
| 113 | +| Filename | Description | |
| 114 | +|----------|-------------| |
| 115 | +| clermontyping-dag.dot | The Nextflow [DAG visualisation](https://www.nextflow.io/docs/latest/tracing.html#dag-visualisation) | |
| 116 | +| clermontyping-report.html | The Nextflow [Execution Report](https://www.nextflow.io/docs/latest/tracing.html#execution-report) | |
| 117 | +| clermontyping-timeline.html | The Nextflow [Timeline Report](https://www.nextflow.io/docs/latest/tracing.html#timeline-report) | |
| 118 | +| clermontyping-trace.txt | The Nextflow [Trace](https://www.nextflow.io/docs/latest/tracing.html#trace-report) report | |
| 119 | + |
| 120 | + |
| 121 | +#### Program Versions |
| 122 | + |
| 123 | +At the end of each run, all of the `versions.yml` files are merged into the files below. |
| 124 | + |
| 125 | +| Filename | Description | |
| 126 | +|---------------------------|-------------| |
| 127 | +| software_versions.yml | A complete list of programs and versions used by each process | |
| 128 | +| software_versions_mqc.yml | A complete list of programs and versions formatted for [MultiQC](https://multiqc.info/) | |
| 129 | + |
| 130 | +## Parameters |
| 131 | + |
| 132 | + |
| 133 | +### <i class="fa-xl fas fa-terminal"></i> Required Parameters |
| 134 | +Define where the pipeline should find input data and save output data. |
| 135 | + |
| 136 | +| Parameter | Description | |
| 137 | +|:---|---| |
| 138 | +| <i class="fa-lg fas fa-bacterium"></i>` --bactopia` | The path to bactopia results to use as inputs <br/>**Type:** `string` | |
| 139 | + |
| 140 | +### <i class="fa-xl fa-solid fa-filter"></i> Filtering Parameters |
| 141 | +Use these parameters to specify which samples to include or exclude. |
| 142 | + |
| 143 | +| Parameter | Description | |
| 144 | +|:---|---| |
| 145 | +| <i class="fa-lg far fa-square-plus"></i>` --include` | A text file containing sample names (one per line) to include in the analysis <br/>**Type:** `string` | |
| 146 | +| <i class="fa-lg far fa-square-minus"></i>` --exclude` | A text file containing sample names (one per line) to exclude from the analysis <br/>**Type:** `string` | |
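
As an illustration, a sample list for `--include` or `--exclude` could be generated from the sample folders in a Bactopia results directory. This is a minimal sketch, assuming sample folders sit directly under the results directory next to `bactopia-runs`, as in the output tree above. The directory, sample names, and the `include.txt` filename are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Mock results directory with two hypothetical samples, created here only
# so the sketch is self-contained.
results_dir="$(mktemp -d)"
mkdir -p "${results_dir}/sample01/tools" "${results_dir}/sample02/tools" "${results_dir}/bactopia-runs"

# Write one sample name per line, skipping the bactopia-runs folder.
include_file="${results_dir}/include.txt"
find "${results_dir}" -mindepth 1 -maxdepth 1 -type d ! -name 'bactopia-runs' \
    -exec basename {} \; | sort > "${include_file}"

cat "${include_file}"

# The file could then be passed to the tool, for example:
# bactopia --wf clermontyping --bactopia "${results_dir}" --include "${include_file}"
```

In practice the list would usually be trimmed by hand or filtered further before use, since listing every sample is equivalent to not filtering at all.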
| 147 | + |
| 148 | + |
| 149 | +### <i class="fa-xl fas fa-exclamation-circle"></i> ClermonTyping Parameters |
| 150 | + |
| 151 | + |
| 152 | +| Parameter | Description | |
| 153 | +|:---|---| |
| 154 | +| <i class="fa-lg fas fa-file-alt"></i>` --clermon_threshold` | Do not use contigs under this size <br/>**Type:** `number` | |
| 155 | + |
| 156 | + |
| 157 | +### <i class="fa-xl fa-solid fa-gears"></i> Optional Parameters |
| 158 | +These optional parameters can be useful in certain settings. |
| 159 | + |
| 160 | +| Parameter | Description | |
| 161 | +|:---|---| |
| 162 | +| <i class="fa-lg fas fa-folder"></i>` --outdir` | Base directory to write results to <br/>**Type:** `string`, **Default:** `bactopia` | |
| 163 | +| <i class="fa-lg fas fa-expand-arrows-alt"></i>` --skip_compression` | Output files will not be compressed <br/>**Type:** `boolean` | |
| 164 | +| <i class="fa-lg fas fa-folder"></i>` --datasets` | The path to cache datasets to <br/>**Type:** `string` | |
| 165 | +| <i class="fa-lg fas fa-trash-restore"></i>` --keep_all_files` | Keeps all analysis files created <br/>**Type:** `boolean` | |
| 166 | + |
| 167 | +### <i class="fa-xl fa-solid fa-arrow-up-right-dots"></i> Max Job Request Parameters |
| 168 | +Set the top limit for requested resources for any single job. |
| 169 | + |
| 170 | +| Parameter | Description | |
| 171 | +|:---|---| |
| 172 | +| <i class="fa-lg fas fa-redo"></i>` --max_retry` | Maximum times to retry a process before allowing it to fail. <br/>**Type:** `integer`, **Default:** `3` | |
| 173 | +| <i class="fa-lg fas fa-microchip"></i>` --max_cpus` | Maximum number of CPUs that can be requested for any single job. <br/>**Type:** `integer`, **Default:** `4` | |
| 174 | +| <i class="fa-lg fas fa-memory"></i>` --max_memory` | Maximum amount of memory that can be requested for any single job. <br/>**Type:** `string`, **Default:** `128.GB` | |
| 175 | +| <i class="fa-lg far fa-clock"></i>` --max_time` | Maximum amount of time that can be requested for any single job. <br/>**Type:** `string`, **Default:** `240.h` | |
| 176 | +| <i class="fa-lg fas fa-angle-double-up"></i>` --max_downloads` | Maximum number of samples to download at a time <br/>**Type:** `integer`, **Default:** `3` | |
| 177 | + |
| 178 | +### <i class="fa-xl fa-solid fa-screwdriver-wrench"></i> Nextflow Configuration Parameters |
| 179 | +Parameters to fine-tune your Nextflow setup. |
| 180 | + |
| 181 | +| Parameter | Description | |
| 182 | +|:---|---| |
| 183 | +| <i class="fa-lg fas fa-cog"></i>` --nfconfig` | A Nextflow-compatible config file for custom profiles. It is loaded last and will overwrite existing variables if set. <br/>**Type:** `string` | |
| 184 | +| <i class="fa-lg fas fa-copy"></i>` --publish_dir_mode` | Method used to save pipeline results to output directory. <br/>**Type:** `string`, **Default:** `copy` | |
| 185 | +| <i class="fa-lg fas fa-cogs"></i>` --infodir` | Directory to keep pipeline Nextflow logs and reports. <br/>**Type:** `string`, **Default:** `${params.outdir}/pipeline_info` | |
| 186 | +| <i class="fa-lg fas fa-recycle"></i>` --force` | Nextflow will overwrite existing output files. <br/>**Type:** `boolean` | |
| 187 | +| <i class="fa-lg fas fa-trash-alt"></i>` --cleanup_workdir` | After Bactopia is successfully executed, the `work` directory will be deleted. <br/>**Type:** `boolean` | |
| 188 | + |
| 189 | +### <i class="fa-xl fas fa-university"></i> Institutional config options |
| 190 | +Parameters used to describe centralized config profiles. These should not be edited. |
| 191 | + |
| 192 | +| Parameter | Description | |
| 193 | +|:---|---| |
| 194 | +| <i class="fa-lg fas fa-users-cog"></i>` --custom_config_version` | Git commit id for Institutional configs. <br/>**Type:** `string`, **Default:** `master` | |
| 195 | +| <i class="fa-lg fas fa-users-cog"></i>` --custom_config_base` | Base directory for Institutional configs. <br/>**Type:** `string`, **Default:** `https://raw.githubusercontent.com/nf-core/configs/master` | |
| 196 | +| <i class="fa-lg fas fa-users-cog"></i>` --config_profile_name` | Institutional config name. <br/>**Type:** `string` | |
| 197 | +| <i class="fa-lg fas fa-users-cog"></i>` --config_profile_description` | Institutional config description. <br/>**Type:** `string` | |
| 198 | +| <i class="fa-lg fas fa-users-cog"></i>` --config_profile_contact` | Institutional config contact information. <br/>**Type:** `string` | |
| 199 | +| <i class="fa-lg fas fa-users-cog"></i>` --config_profile_url` | Institutional config URL link. <br/>**Type:** `string` | |
| 200 | + |
| 201 | +### <i class="fa-xl fa-regular fa-address-card"></i> Nextflow Profile Parameters |
| 202 | +Parameters to fine-tune your Nextflow setup. |
| 203 | + |
| 204 | +| Parameter | Description | |
| 205 | +|:---|---| |
| 206 | +| <i class="fa-lg fas fa-folder"></i>` --condadir` | Directory Nextflow should use for Conda environments <br/>**Type:** `string` | |
| 207 | +| <i class="fa-lg fas fa-box"></i>` --registry` | Docker registry to pull containers from. <br/>**Type:** `string`, **Default:** `dockerhub` | |
| 208 | +| <i class="fa-lg fas fa-folder"></i>` --datasets_cache` | Directory where downloaded datasets should be stored. <br/>**Type:** `string`, **Default:** `<BACTOPIA_DIR>/data/datasets` | |
| 209 | +| <i class="fa-lg fas fa-folder"></i>` --singularity_cache_dir` | Directory where remote Singularity images are stored. <br/>**Type:** `string` | |
| 210 | +| <i class="fa-lg fas fa-toolbox"></i>` --singularity_pull_docker_container` | Instead of directly downloading Singularity images for use with Singularity, force the workflow to pull and convert Docker containers instead. <br/>**Type:** `boolean` | |
| 211 | +| <i class="fa-lg fas fa-recycle"></i>` --force_rebuild` | Force overwrite of existing pre-built environments. <br/>**Type:** `boolean` | |
| 212 | +| <i class="fa-lg fas fa-clipboard-list"></i>` --queue` | Comma-separated name of the queue(s) to be used by a job scheduler (e.g. AWS Batch or SLURM) <br/>**Type:** `string`, **Default:** `general,high-memory` | |
| 213 | +| <i class="fa-lg fas fa-clipboard-list"></i>` --cluster_opts` | Additional options to pass to the executor (e.g. SLURM: '--account=my_acct_name') <br/>**Type:** `string` | |
| 214 | +| <i class="fa-lg fas fa-clipboard-list"></i>` --container_opts` | Additional options to pass to Apptainer, Docker, or Singularity (e.g. Singularity: '-D `pwd`') <br/>**Type:** `string` | |
| 215 | +| <i class="fa-lg fas fa-toggle-off"></i>` --disable_scratch` | All intermediate files created on worker nodes will be transferred to the head node. <br/>**Type:** `boolean` | |
| 216 | + |
| 217 | +### <i class="fa-xl fa-solid fa-reply-all"></i> Helpful Parameters |
| 218 | +Uncommonly used parameters that might be useful. |
| 219 | + |
| 220 | +| Parameter | Description | |
| 221 | +|:---|---| |
| 222 | +| <i class="fa-lg fas fa-palette"></i>` --monochrome_logs` | Do not use coloured log outputs. <br/>**Type:** `boolean` | |
| 223 | +| <i class="fa-lg fas fa-remove-format"></i>` --nfdir` | Print directory Nextflow has pulled Bactopia to <br/>**Type:** `boolean` | |
| 224 | +| <i class="fa-lg far fa-clock"></i>` --sleep_time` | The amount of time (seconds) Nextflow will wait after setting up datasets before execution. <br/>**Type:** `integer`, **Default:** `5` | |
| 225 | +| <i class="fa-lg fas fa-tasks"></i>` --validate_params` | Boolean whether to validate parameters against the schema at runtime <br/>**Type:** `boolean`, **Default:** `True` | |
| 226 | +| <i class="fa-lg fas fa-question-circle"></i>` --help` | Display help text. <br/>**Type:** `boolean` | |
| 227 | +| <i class="fa-lg fas fa-bacteria"></i>` --wf` | Specify which workflow or Bactopia Tool to execute <br/>**Type:** `string`, **Default:** `bactopia` | |
| 228 | +| <i class="fa-lg fas fa-list"></i>` --list_wfs` | List the available workflows and Bactopia Tools to use with '--wf' <br/>**Type:** `boolean` | |
| 229 | +| <i class="fa-lg far fa-eye"></i>` --show_hidden_params` | Show all params when using `--help` <br/>**Type:** `boolean` | |
| 230 | +| <i class="fa-lg fas fa-question-circle"></i>` --help_all` | An alias for --help --show_hidden_params <br/>**Type:** `boolean` | |
| 231 | +| <i class="fa-lg fas fa-info"></i>` --version` | Display version text. <br/>**Type:** `boolean` | |
| 232 | + |
| 233 | +## Citations |
| 234 | +If you use Bactopia and `clermontyping` in your analysis, please cite the following. |
| 235 | + |
| 236 | +- [Bactopia](https://bactopia.github.io/) |
| 237 | + Petit III RA, Read TD [Bactopia - a flexible pipeline for complete analysis of bacterial genomes.](https://doi.org/10.1128/mSystems.00190-20) _mSystems_ 5 (2020) |
| 238 | + |
| 239 | + |
| 240 | +- [ClermonTyping](https://github.com/happykhan/ClermonTyping) |
| 241 | + Beghain J, Bridier-Nahmias A, Le Nagard H, Denamur E, Clermont O. [ClermonTyping: an easy-to-use and accurate in silico method for Escherichia genus strain phylotyping.](https://doi.org/10.1099/mgen.0.000192) Microbial Genomics, 4(7), e000192. (2018) |
| 242 | + |
| 243 | +- [csvtk](https://bioinf.shenwei.me/csvtk/) |
| 244 | + Shen, W [csvtk: A cross-platform, efficient and practical CSV/TSV toolkit in Golang.](https://github.com/shenwei356/csvtk/) (GitHub) |
| 245 | + |