
Bioinfo Tools

A suite of command-line bioinformatics utilities for sequence analysis and manipulation.

Developed and maintained by Davi Marcon
Email: [email protected]

About

Bioinfo Tools is a command-line application that provides a unified interface for common bioinformatics tasks. It was developed to help researchers and collaborators work more efficiently with biological sequence data.

The tool has a modular architecture that makes new functionality easy to add, and it is suitable for distribution through package managers such as Bioconda and Pixi.

Features

  • Extract CDS: Filter coding sequences from GenBank files based on gene lists
  • Extract Proteins: Extract amino acid sequences from GenBank annotations
  • Extract Genes: Extract nucleotide sequences of genes from GenBank files
  • BLAST: Run BLAST searches across multiple queries and databases, with automatic database formatting
  • Convert AB1: Convert AB1 sequencing files to FASTQ format
  • Process BLAST Results: Create hit matrices from BLAST output files
  • Rename FASTA: Rename FASTA files based on header information
  • Compare Proteins: Compare two protein sequences and identify mutations
  • Download PDB: Download PDB structure files with metadata
  • Generate PGAP Files: Create .pep, .nuc, and .function files for NCBI PGAP
  • Extract rRNA: Extract 16S, 23S, and 5S ribosomal RNA from GenBank files

Installation

From PyPI (when available)

pip install bioinfo-tools

From Source

git clone https://github.com/Mxrcon/Bioinfo-python-scripts.git
cd Bioinfo-python-scripts
pip install .

For Development

git clone https://github.com/Mxrcon/Bioinfo-python-scripts.git
cd Bioinfo-python-scripts
pip install -e .

Requirements

  • Python >= 3.6
  • Biopython >= 1.70

Usage

The tool provides a unified command-line interface with multiple subcommands:

bioinfo-tools <command> [options]
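One common way to build a single command with multiple subcommands in Python is `argparse` subparsers, where each subcommand registers its own options and handler function. The sketch below is an illustration under that assumption, not the project's actual code. Only the short flags shown in the usage examples are used, and the handler functions (`run_compare`, `run_convert`) are hypothetical placeholders.

```python
import argparse
from typing import List, Optional


def run_compare(args: argparse.Namespace) -> str:
    # Hypothetical handler: a real module would compare the sequences here
    return f"comparing {args.original} with {args.mutant}"


def run_convert(args: argparse.Namespace) -> str:
    # Hypothetical handler: a real module would convert the AB1 file here
    return f"converting {args.input} to {args.output}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="bioinfo-tools")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.required = True  # also works on Python 3.6, unlike required=True

    # Each subcommand declares its own options and the function that handles it
    compare = subparsers.add_parser("compare-proteins", help="compare two protein sequences")
    compare.add_argument("-o", dest="original", required=True, help="original sequence")
    compare.add_argument("-m", dest="mutant", required=True, help="mutant sequence")
    compare.set_defaults(func=run_compare)

    convert = subparsers.add_parser("convert-ab1", help="convert AB1 to FASTQ")
    convert.add_argument("-i", dest="input", required=True, help="input AB1 file")
    convert.add_argument("-o", dest="output", required=True, help="output FASTQ file")
    convert.set_defaults(func=run_convert)
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    # Parse the command line and dispatch to the selected subcommand's handler
    args = build_parser().parse_args(argv)
    return args.func(args)
```

With this layout, adding a command means adding one `add_parser` block and one handler, which is the kind of modularity the About section describes.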

Available Commands

Extract CDS

Filter CDS features from GenBank files:

bioinfo-tools extract-cds -i genbank_folder/ -g genes.txt -o output_folder/

Extract Proteins

Extract amino acid sequences:

bioinfo-tools extract-proteins -i genbank_folder/ -g genes.txt -o proteins_output/

Extract Genes

Extract nucleotide sequences:

bioinfo-tools extract-genes -i genbank_folder/ -g genes.txt -o genes_output/

BLAST

Run BLAST searches:

bioinfo-tools blast -q queries/ -d databases/ -t nucl -b blastn -e 1e-5
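The "automatic database formatting" step corresponds to running NCBI BLAST+ `makeblastdb` on each database FASTA before searching. The sketch below only builds the command lines (it does not run them). It assumes that `-t` maps to `makeblastdb -dbtype`, that `-b` picks the search program, and that results are written as tabular `-outfmt 6`. The `<query>_vs_<db>.tsv` output naming is hypothetical.

```python
from pathlib import Path
from typing import List


def makeblastdb_cmd(fasta: Path, dbtype: str) -> List[str]:
    # BLAST+ accepts "nucl" or "prot" for -dbtype
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    return ["makeblastdb", "-in", str(fasta), "-dbtype", dbtype,
            "-out", str(fasta.with_suffix(""))]


def blast_cmd(program: str, query: Path, db: Path, evalue: str, out: Path) -> List[str]:
    # -outfmt 6 produces the 12-column tabular format
    return [program, "-query", str(query), "-db", str(db),
            "-evalue", evalue, "-outfmt", "6", "-out", str(out)]


def plan_searches(queries: List[Path], databases: List[Path],
                  dbtype: str, program: str, evalue: str) -> List[List[str]]:
    # Format every database once, then search every query against every database
    commands = [makeblastdb_cmd(db, dbtype) for db in databases]
    for query in queries:
        for db in databases:
            out = Path(f"{query.stem}_vs_{db.stem}.tsv")  # hypothetical naming scheme
            commands.append(blast_cmd(program, query, db.with_suffix(""), evalue, out))
    return commands
```

Each planned command could then be executed with `subprocess.run(cmd, check=True)`, which raises an error if BLAST+ exits with a failure.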

Convert AB1

Convert AB1 sequencing files to FASTQ:

bioinfo-tools convert-ab1 -i sequence.ab1 -o output.fastq
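A FASTQ record stores each base's Phred quality score as one ASCII character. In the common Sanger encoding that character is `chr(q + 33)`. The sketch below shows only that final formatting step. It assumes the base calls and per-base Phred scores have already been read from the AB1 trace, and the function name `to_fastq` is hypothetical.

```python
from typing import Sequence


def to_fastq(name: str, sequence: str, qualities: Sequence[int]) -> str:
    """Format one FASTQ record using Sanger (Phred+33) quality encoding."""
    if len(sequence) != len(qualities):
        raise ValueError("sequence and quality lists must have the same length")
    if any(q < 0 or q > 93 for q in qualities):
        raise ValueError("Phred+33 can only encode scores from 0 to 93")
    # Each Phred score q becomes the ASCII character with code q + 33
    encoded = "".join(chr(q + 33) for q in qualities)
    return f"@{name}\n{sequence}\n+\n{encoded}\n"


print(to_fastq("read1", "ACGT", [40, 30, 20, 10]), end="")
# @read1
# ACGT
# +
# I?5+
```

With Biopython, an AB1 file can be read using `SeqIO.read(path, "abi")`, which should expose the scores in `record.letter_annotations["phred_quality"]`.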

Process BLAST Results

Create a hit matrix from BLAST results:

bioinfo-tools process-blast-results -i blast_results/ -o matrix.txt
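A hit matrix summarises which queries matched which subjects. The sketch below builds a presence/absence matrix from tabular BLAST output (`-outfmt 6`, where column 1 is the query ID, column 2 the subject ID and column 11 the e-value). It assumes that a "hit" means an e-value at or below a cutoff. The project's real matrix layout and threshold may differ, and the function name `hit_matrix` is hypothetical.

```python
from typing import Iterable


def hit_matrix(lines: Iterable[str], max_evalue: float = 1e-5) -> str:
    """Return a tab-separated query x subject matrix of 1 (hit) / 0 (no hit)."""
    hits = set()
    queries, subjects = set(), set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        queries.add(query)
        subjects.add(subject)
        if evalue <= max_evalue:
            hits.add((query, subject))
    columns = sorted(subjects)
    rows = ["\t".join(["query"] + columns)]
    for query in sorted(queries):
        cells = ["1" if (query, s) in hits else "0" for s in columns]
        rows.append("\t".join([query] + cells))
    return "\n".join(rows)
```

Rows are listed for every query seen in the input, so a query whose only alignments fall above the cutoff still appears as a row of zeros.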

Rename FASTA

Rename FASTA files based on headers:

bioinfo-tools rename-fasta -i sequence.fasta
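The exact renaming rule is not described here. The sketch below assumes one plausible rule: use the first whitespace-separated token of the first header line as the new file name, replacing characters that are awkward in file names (such as `|`) with underscores. The function name `name_from_header` and the `.fasta` extension are hypothetical choices.

```python
import re


def name_from_header(fasta_text: str) -> str:
    """Derive a file name from the first FASTA header (hypothetical rule)."""
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("empty FASTA header")
            # Keep letters, digits, dot, underscore and hyphen; replace the rest
            safe = re.sub(r"[^A-Za-z0-9._-]", "_", fields[0])
            return safe + ".fasta"
    raise ValueError("no FASTA header found")
```

The result could then be applied with `Path(old).rename(Path(old).with_name(new_name))`.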

Compare Proteins

Identify mutations between protein sequences:

bioinfo-tools compare-proteins -o MKTAYIA -m MKTARIA
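For two sequences of equal length, mutations can be found by walking both sequences position by position and reporting each difference in the usual `<original><position><mutant>` notation, with 1-based positions. The sketch below assumes the sequences are already aligned and ungapped. The tool's own output format may differ, and `find_mutations` is a hypothetical name.

```python
from typing import List


def find_mutations(original: str, mutant: str) -> List[str]:
    """List substitutions such as 'Y5R' between two aligned protein sequences."""
    if len(original) != len(mutant):
        raise ValueError("sequences must be the same length; align them first")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(original, mutant), start=1)
        if ref != alt
    ]


print(find_mutations("MKTAYIA", "MKTARIA"))  # ['Y5R']
```

Sequences of different lengths (insertions or deletions) would need a pairwise alignment first, which this sketch deliberately rejects.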

Download PDB

Download PDB structure files:

bioinfo-tools download-pdb -i 1A00 -o structures/
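RCSB serves coordinate files at `https://files.rcsb.org/download/<ID>.pdb`. The sketch below assumes the tool fetches from that RCSB URL using only the standard library; the real command may use a different mirror or format. Metadata retrieval is not shown. Very large structures may be available only as mmCIF rather than legacy PDB format. The function names are hypothetical.

```python
import urllib.request
from pathlib import Path

RCSB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_url(pdb_id: str) -> str:
    # Normalise the ID and check it looks like a classic 4-character PDB code
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a classic 4-character PDB ID: {pdb_id!r}")
    return RCSB_DOWNLOAD.format(pdb_id=pdb_id)


def download_pdb(pdb_id: str, out_dir: Path) -> Path:
    # Fetch the file over HTTPS and save it as <ID>.pdb inside out_dir
    url = pdb_url(pdb_id)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / url.rsplit("/", 1)[-1]
    with urllib.request.urlopen(url) as response:
        target.write_bytes(response.read())
    return target
```

For the usage example above, `download_pdb("1A00", Path("structures"))` would write `structures/1A00.pdb`.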

Generate PGAP Files

Create files for NCBI PGAP:

bioinfo-tools generate-pgap-files -i genbank/ -o pgap_output/

Extract rRNA

Extract ribosomal RNA sequences:

bioinfo-tools extract-rrna -i genbank/ -o rrna_output/
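In GenBank files, rRNA features usually carry a `/product` qualifier such as "16S ribosomal RNA". The sketch below assumes extraction is keyed on that qualifier. It classifies a product string as 16S, 23S or 5S and returns `None` for anything else; eukaryotic 5.8S is therefore not matched. The function name `classify_rrna` is hypothetical.

```python
import re
from typing import Optional

# Match 16S, 23S or 5S as a whole word, case-insensitively
_RRNA_PATTERN = re.compile(r"\b(16S|23S|5S)\b", re.IGNORECASE)


def classify_rrna(product: str) -> Optional[str]:
    """Return '16S', '23S' or '5S' for a product description, else None."""
    match = _RRNA_PATTERN.search(product)
    return match.group(1).upper() if match else None
```

Features whose product returns `None` would simply be skipped.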

Getting Help

Get general help:

bioinfo-tools --help

Get help for a specific command:

bioinfo-tools extract-cds --help
bioinfo-tools blast --help

Input Files

Gene List Format

A plain text file with one gene name per line:

dnaA
rpoB
recA
gyrA
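A gene list in this format can be read by stripping each line and keeping the non-empty names in order. The sketch below also drops duplicate names and surrounding whitespace. Both behaviours are assumptions for illustration, and the function names are hypothetical.

```python
from pathlib import Path
from typing import List


def parse_gene_list(text: str) -> List[str]:
    """Return gene names in file order, skipping blanks and repeats (assumed)."""
    genes: List[str] = []
    seen = set()
    for line in text.splitlines():
        name = line.strip()
        if not name or name in seen:
            continue
        seen.add(name)
        genes.append(name)
    return genes


def load_gene_list(path: Path) -> List[str]:
    # Thin wrapper that reads the file and delegates to the parser
    return parse_gene_list(path.read_text())
```

Separating parsing from file reading keeps the parser easy to test without touching the file system.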

GenBank Files

Standard GenBank format files (.gbk, .gb, .genbank) with CDS annotations.

FASTA Files

Standard FASTA format for queries and databases in BLAST searches.

Output

  • Extract CDS: Filtered GenBank files
  • Extract Proteins: FASTA files organized by gene name
  • Extract Genes: FASTA files organized by gene name
  • BLAST: Tabular BLAST results (BLAST+ -outfmt 6 by default)
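"FASTA files organized by gene name" means one FASTA file per gene, containing that gene's sequence from every input genome. The sketch below groups `(genome_id, gene_name, sequence)` tuples into per-gene FASTA text and wraps sequences to a fixed width. The `>genome_gene` header style and the tuple input are hypothetical, not the tool's documented format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_gene(records: Iterable[Tuple[str, str, str]],
                  line_width: int = 60) -> Dict[str, str]:
    """Map each gene name to FASTA text holding that gene from every genome."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for genome_id, gene, seq in records:
        grouped[gene].append(f">{genome_id}_{gene}")  # hypothetical header style
        # Wrap the sequence into lines of at most line_width characters
        for start in range(0, len(seq), line_width):
            grouped[gene].append(seq[start:start + line_width])
    return {gene: "\n".join(lines) + "\n" for gene, lines in grouped.items()}
```

Each value could then be written to `output_folder/<gene>.fasta`, giving one file per gene in the gene list.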

Contributing

Contributions are welcome! If you'd like to discuss changes or report issues, please:

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this tool in your research, please cite:

Marcon, D.J. (2024). Bioinfo Tools: A suite of bioinformatics utilities.
GitHub: https://github.com/Mxrcon/Bioinfo-python-scripts
