A professional suite of bioinformatics utilities for sequence analysis and manipulation.
Developed and maintained by Davi Marcon
Email: [email protected]
Bioinfo Tools is a command-line application that provides a unified interface for common bioinformatics tasks. It was developed to help researchers and collaborators work more efficiently with biological sequence data.
The tool has a modular architecture, which makes it easy to add new functionality. It is also suitable for distribution through package managers such as Bioconda and Pixi.
- Extract CDS: Filter coding sequences from GenBank files based on gene lists
- Extract Proteins: Extract amino acid sequences from GenBank annotations
- Extract Genes: Extract nucleotide sequences of genes from GenBank files
- BLAST: Run BLAST searches across multiple queries and databases, with automatic database formatting
- Convert AB1: Convert AB1 sequencing files to FASTQ format
- Process BLAST Results: Create hit matrices from BLAST output files
- Rename FASTA: Rename FASTA files based on header information
- Compare Proteins: Compare two protein sequences and identify mutations
- Download PDB: Download PDB structure files with metadata
- Generate PGAP Files: Create .pep, .nuc, and .function files for NCBI PGAP
- Extract rRNA: Extract 16S, 23S, and 5S ribosomal RNA from GenBank files
Install from PyPI:
pip install bioinfo-tools
Install from source:
git clone https://github.com/Mxrcon/Bioinfo-python-scripts.git
cd Bioinfo-python-scripts
pip install .
Install in editable mode for development:
git clone https://github.com/Mxrcon/Bioinfo-python-scripts.git
cd Bioinfo-python-scripts
pip install -e .
- Python >= 3.6
- Biopython >= 1.70
The tool provides a unified command-line interface with multiple subcommands:
bioinfo-tools <command> [options]
Filter CDS features from GenBank files:
bioinfo-tools extract-cds -i genbank_folder/ -g genes.txt -o output_folder/
Extract amino acid sequences:
bioinfo-tools extract-proteins -i genbank_folder/ -g genes.txt -o proteins_output/
Extract nucleotide sequences:
bioinfo-tools extract-genes -i genbank_folder/ -g genes.txt -o genes_output/
Run BLAST searches:
bioinfo-tools blast -q queries/ -d databases/ -t nucl -b blastn -e 1e-5
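As a rough illustration of what a run like this involves, the sketch below builds the BLAST+ command lines for formatting a FASTA file as a database and then searching it. The helper names (`build_makeblastdb_cmd`, `build_blast_cmd`) and the file paths are hypothetical and are not taken from Bioinfo Tools; only the `makeblastdb` and `blastn` options are standard BLAST+ flags. How the tool itself invokes BLAST may differ.

```python
import os


def build_makeblastdb_cmd(fasta, dbtype="nucl"):
    """Return the makeblastdb command that formats a FASTA file as a database."""
    # Hypothetical convention: name the database after the FASTA file, minus extension.
    db_name = os.path.splitext(fasta)[0]
    return ["makeblastdb", "-in", fasta, "-dbtype", dbtype, "-out", db_name]


def build_blast_cmd(query, db, out, program="blastn", evalue="1e-5"):
    """Return a BLAST search command that writes tabular (format 6) output."""
    return [
        program,
        "-query", query,
        "-db", db,
        "-evalue", evalue,
        "-outfmt", "6",
        "-out", out,
    ]


format_cmd = build_makeblastdb_cmd("databases/genome.fasta")
search_cmd = build_blast_cmd("queries/dnaA.fasta", "databases/genome", "dnaA_vs_genome.tsv")
print(" ".join(format_cmd))
print(" ".join(search_cmd))
```

With BLAST+ installed, each list could be passed to `subprocess.run(cmd, check=True)`. Building the commands as lists rather than shell strings avoids quoting problems with file names that contain spaces.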
Convert sequencing files:
bioinfo-tools convert-ab1 -i sequence.ab1 -o output.fastq
Create hit matrix from BLAST results:
bioinfo-tools process-blast-results -i blast_results/ -o matrix.txt
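To make the idea of a hit matrix concrete, here is a minimal sketch that turns tabular BLAST results into a presence/absence table. It assumes each results file is tab-separated with the query ID in the first column, that rows are queries and columns are result files, and that a cell is 1 when the query has at least one hit. The function name `hit_matrix` and the sample data are hypothetical, and the layout of the `matrix.txt` written by the tool may differ.

```python
def hit_matrix(results):
    """Build a presence/absence matrix from tabular BLAST results.

    `results` maps a label (for example a database name) to the lines of
    one results file. Returns (labels, rows), where rows maps each query
    ID to a list of 0/1 flags, one per label.
    """
    labels = sorted(results)
    hits = {}  # query ID -> set of labels with at least one hit
    for label, lines in results.items():
        for line in lines:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comment lines
            query_id = line.split("\t")[0]
            hits.setdefault(query_id, set()).add(label)
    rows = {
        query: [1 if label in found else 0 for label in labels]
        for query, found in sorted(hits.items())
    }
    return labels, rows


# Made-up example data for two result files.
example = {
    "genomeA": ["dnaA\tcontig1\t99.1\t1400\t12\t0\t1\t1400\t5\t1404\t0.0\t2500"],
    "genomeB": [
        "dnaA\tcontig7\t95.0\t1400\t70\t0\t1\t1400\t9\t1408\t0.0\t2100",
        "recA\tcontig2\t98.0\t1000\t20\t0\t1\t1000\t3\t1002\t0.0\t1800",
    ],
}
labels, rows = hit_matrix(example)
print("query", *labels, sep="\t")
for query, flags in rows.items():
    print(query, *flags, sep="\t")
```

Note that a query with no hits in any file never appears in the output of this sketch, because it only sees queries that occur in the result lines.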
Rename FASTA files based on headers:
bioinfo-tools rename-fasta -i sequence.fasta
Identify mutations between protein sequences:
bioinfo-tools compare-proteins -o MKTAYIA -m MKTARIA
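The comparison can be pictured with a short position-by-position sketch. It assumes that `-o` is the original sequence and `-m` the mutated one, that both have the same length, and that mutations are reported in the common `<original><position><mutated>` notation. The function name `find_substitutions` is hypothetical, and the tool may well handle insertions and deletions differently, for instance through alignment.

```python
def find_substitutions(original, mutated):
    """List point substitutions between two equal-length protein sequences."""
    if len(original) != len(mutated):
        raise ValueError("sequences must have the same length for a position-wise comparison")
    return [
        f"{a}{position}{b}"
        for position, (a, b) in enumerate(zip(original, mutated), start=1)
        if a != b
    ]


print(find_substitutions("MKTAYIA", "MKTARIA"))  # ['Y5R']
```

Positions are 1-based, which matches how residues are usually numbered in the literature.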
Download PDB structure files:
bioinfo-tools download-pdb -i 1A00 -o structures/
Create files for NCBI PGAP:
bioinfo-tools generate-pgap-files -i genbank/ -o pgap_output/
Extract ribosomal RNA sequences:
bioinfo-tools extract-rrna -i genbank/ -o rrna_output/
Get general help:
bioinfo-tools --help
Get help for a specific command:
bioinfo-tools extract-cds --help
bioinfo-tools blast --help
A plain text file with one gene name per line:
dnaA
rpoB
recA
gyrA
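A file in this format is simple to read back in Python. The sketch below is one way to do it, with a hypothetical function name `read_gene_list`; it skips blank lines and strips surrounding whitespace. Whether Bioinfo Tools also ignores case or removes duplicates is not stated here, so the sketch does neither.

```python
import os
import tempfile


def read_gene_list(path):
    """Return gene names from a one-per-line text file, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Write a small example file to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "genes.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("dnaA\nrpoB\n\nrecA\ngyrA\n")
    genes = read_gene_list(path)

print(genes)  # ['dnaA', 'rpoB', 'recA', 'gyrA']
```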
Standard GenBank format files (.gbk, .gb, .genbank) with CDS annotations.
Standard FASTA format for queries and databases in BLAST searches.
- Extract CDS: Filtered GenBank files
- Extract Proteins: FASTA files organized by gene name
- Extract Genes: FASTA files organized by gene name
- BLAST: Tabular BLAST results (output format 6 by default)
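For downstream scripting, the tabular BLAST output can be parsed with the standard library alone. This sketch assumes that "format 6" means the BLAST+ `-outfmt 6` layout with its twelve default columns and no custom column selection. The names `BLAST6_COLUMNS` and `parse_blast6_line`, and the sample line, are illustrative only.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)
INT_COLUMNS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def parse_blast6_line(line):
    """Convert one tab-separated result line into a dict with typed values."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(BLAST6_COLUMNS):
        raise ValueError(f"expected {len(BLAST6_COLUMNS)} columns, got {len(values)}")
    record = {}
    for name, value in zip(BLAST6_COLUMNS, values):
        if name in INT_COLUMNS:
            record[name] = int(value)
        elif name in FLOAT_COLUMNS:
            record[name] = float(value)
        else:
            record[name] = value
    return record


# Made-up result line for illustration.
hit = parse_blast6_line("dnaA\tcontig1\t99.1\t1400\t12\t0\t1\t1400\t5\t1404\t1e-50\t2500")
print(hit["sseqid"], hit["pident"], hit["evalue"])  # contig1 99.1 1e-50
```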
Contributions are welcome! If you'd like to discuss changes or report issues, please:
- Open an issue on GitHub
- Send an email to [email protected]
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this tool in your research, please cite:
Marcon, D.J. (2024). Bioinfo Tools: A suite of bioinformatics utilities.
GitHub: https://github.com/Mxrcon/Bioinfo-python-scripts