- FastQ PE read merging: bfqmerge
- FastQ trimming for single-end reads: bfqtrimse
- FastQ statistics: bfqstats
Requires gcc or clang and GNU Make; tested on macOS and Linux. bfqutils is written in C and comes bundled with the Cloudflare fork of zlib.
git clone https://github.com/noborilab/bfqutils
cd bfqutils
make libz
make release
make test # optional
To dynamically link to a system Zlib instead of building a static library:
make z_dyn=1 release
This tool is meant to be used as a drop-in replacement for fastp --merge. The only differences from fastp are a smaller default minimum merge length (since this tool is primarily intended for libraries with small inserts) and more aggressive base correction. Apart from that, bfqmerge is intended to be a lightweight replacement with extremely low memory usage (<2 MB) and very good single-thread performance (<15 min on 2x100M PE150 reads, ~10 min without the -z flag). Remember to change the default values in src/bfqmerge.c if that would better suit your primary use case.
bfqmerge v1.0 Copyright (C) 2025 Benjamin Jean-Marie Tremblay
Usage: bfqmerge [options] R1.fq[.gz] R2.fq[.gz] > merged.fq
-o <int> Required overlap for a merge to occur. Default: 15
-d <int> Maximum number of mismatches between alignments. Default: 5
-p <dbl> Maximum fraction of mismatches between alignments. Default: 0.2
-Q <int> Minimum PHRED+33 quality to consider a base high quality. Default: 15
-u <dbl> Maximum fraction of bases allowed to be low quality. Default: 0.4
-n <int> Maximum number of Ns allowed. Default: 5
-g <int> Number of Gs to trigger polyG tail trimming. Default: 10
-t <int> Mean window quality threshold for trimming 3-prime bases. Default: 20.
-w <int> Window size of 3-prime base trimming. Default: 5
-m <int> Max merged read length.
-z Compress the output as gzip.
-q Make the program quiet.
-v Print the version and exit.
-h Print this help message and exit.
bfqmerge is a simple program, which performs the following operations:
1. Detect and trim low quality bases from the 3' end of the reads (-t, -w). This helps reduce the possibility of false positive alignments between incorrect read segments, especially when expecting short inserts.
2. Detect and trim polyG tails (-g). Failure to remove these can lead to bfqmerge thinking that sufficiently long stretches of Gs found in both reads of a pair are the overlapping parts of the reads. Due to the way some machines work, sequencing past the end of a read can lead to long stretches of Gs with high quality scores (which won't be trimmed in step 1).
3. Find the best overlap, depending on user settings (-o, -d, -p).
4. If no overlap is found, discard the reads. Otherwise, create a new merged read. For each overlapping position, the base and quality score are taken from whichever of the two reads has a better score in that position.
5. Check if the merged read passes quality filters (-Q, -u, -n). If yes, then compress the output if desired (-z), then write to stdout.
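The per-read steps in the list above (3' quality trimming, polyG trimming, and the final quality filter) can be illustrated with a short Python sketch. This is not the code in src/bfqmerge.c: the function names are hypothetical, the window is assumed to slide inward from the 3' end one base at a time, and the polyG check is assumed to look for an uninterrupted run of Gs. The default values are copied from the usage text.

```python
def phred33(qual):
    """Convert a PHRED+33 quality string to a list of integer scores."""
    return [ord(c) - 33 for c in qual]


def trim_low_quality_tail(seq, qual, threshold=20, window=5):
    """Assumed behaviour of -t/-w: drop one 3' base at a time while the
    mean quality of the last `window` bases is below `threshold`.
    Reads shorter than `window` are left untouched in this sketch."""
    scores = phred33(qual)
    end = len(seq)
    while end >= window and sum(scores[end - window:end]) / window < threshold:
        end -= 1
    return seq[:end], qual[:end]


def trim_polyg_tail(seq, qual, min_run=10):
    """Assumed behaviour of -g: remove a trailing run of Gs if it is at
    least `min_run` bases long (no mismatches tolerated here)."""
    run = len(seq) - len(seq.rstrip("G"))
    if run >= min_run:
        keep = len(seq) - run
        return seq[:keep], qual[:keep]
    return seq, qual


def passes_filters(seq, qual, min_q=15, max_low_frac=0.4, max_n=5):
    """Assumed behaviour of -Q/-u/-n: reject a read if too large a
    fraction of its bases is below `min_q`, or if it has too many Ns."""
    if not seq:
        return False
    low = sum(1 for q in phred33(qual) if q < min_q)
    return low / len(seq) <= max_low_frac and seq.count("N") <= max_n
```

Note that a windowed mean can leave a few low quality bases at the new 3' end, since trimming stops as soon as the window average recovers.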
With the exception of -o, all default values are identical to fastp. These generally work quite well, though be aware that it is impossible to get read merging right 100% of the time. False positive merging events and false negative read discards will almost always occur for any combination of settings given a sufficiently diverse set of reads.
A simple FastQ trimming and quality filtering tool for single-end reads. The order of operations is similar to bfqmerge, replacing overlap analysis with adapter sequence matching. Use '-' for stdin.
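As a rough illustration of what adapter sequence matching at the 3' end can look like, the following Python sketch scans a read for the leftmost position where the remainder of the read matches a prefix of the adapter. The function names, the `min_match` and `max_mismatch_frac` parameters, and the matching rule itself are assumptions made for this example; they are not bfqtrimse options and do not describe its actual algorithm. Only the default adapter is taken from the usage text.

```python
DEFAULT_ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"


def find_adapter(seq, adapter=DEFAULT_ADAPTER, min_match=4, max_mismatch_frac=0.1):
    """Return the leftmost read position where the rest of the read matches
    a prefix of the adapter, or None if no position qualifies.
    `min_match` and `max_mismatch_frac` are hypothetical parameters."""
    for start in range(len(seq) - min_match + 1):
        n = min(len(seq) - start, len(adapter))
        mm = sum(1 for a, b in zip(seq[start:start + n], adapter[:n]) if a != b)
        if mm <= n * max_mismatch_frac:
            return start
    return None


def trim_adapter(seq, qual, adapter=DEFAULT_ADAPTER):
    """Cut the read (and its qualities) at the detected adapter start."""
    start = find_adapter(seq, adapter)
    if start is None:
        return seq, qual
    return seq[:start], qual[:start]
```

A partial adapter at the very end of a read is matched against the corresponding adapter prefix, which is why a minimum match length is needed to avoid trimming on one or two chance bases.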
bfqtrimse v1.0 Copyright (C) 2025 Benjamin Jean-Marie Tremblay
Usage: bfqtrimse [options] reads.fq[.gz] > trimmed.fq
-a <str> Adapter sequence. Default: AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
-Q <int> Minimum PHRED+33 quality to consider a base high quality. Default: 15
-u <dbl> Maximum fraction of bases allowed to be low quality. Default: 0.4
-n <int> Maximum number of Ns allowed. Default: 5
-g <int> Number of Gs to trigger polyG tail trimming. Default: 10
-t <int> Mean window quality threshold for trimming 3-prime bases. Default: 20.
-w <int> Window size of 3-prime base trimming. Default: 5
-M <int> Min trimmed read length. Default: 15
-m <int> Max trimmed read length.
-z Compress the output as gzip.
-q Make the program quiet.
-v Print the version and exit.
-h Print this help message and exit.
Meant to be used in a pipe with bfqmerge or bfqtrimse. The default top enriched K-mer summary can help spot bad trimming/merging. Use '-' for stdin.
bfqstats v1.0 Copyright (C) 2025 Benjamin Jean-Marie Tremblay
Usage: bfqstats [options] reads.fq[.gz]
-l <file> Read Length histogram.
-g <file> Read GC content histogram.
-q <file> Mean read quality histogram.
-Q <file> Per-position mean quality histogram.
-b <file> Per-position base content histogram.
-k <file> K-mer counts and obs/exp ratios.
-o <file> Send summary stats to a file instead of stderr.
-K <int> K-mer size for -k. Default: 6
-n <int> Only examine this number of reads.
-O Send the reads to stdout.
-z If -O, compress as gzip.
-N Do not print summary stats to stderr.
-v Print the version and exit.
-h Print this help message and exit.
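For readers unfamiliar with K-mer observed/expected ratios, the following Python sketch shows one common way to compute them. It assumes the expected count of a K-mer is the product of its single-base frequencies times the total number of K-mer windows; whether bfqstats uses this background model is not stated in this document, and the function name is hypothetical.

```python
from collections import Counter


def kmer_obs_exp(reads, k=6):
    """Map each observed K-mer to (count, obs/exp ratio), where the expected
    count assumes independent bases at their overall frequencies."""
    counts = Counter()
    bases = Counter()
    for seq in reads:
        bases.update(seq)
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total_kmers = sum(counts.values())
    total_bases = sum(bases.values())
    result = {}
    for kmer, obs in counts.items():
        p = 1.0
        for base in kmer:
            p *= bases[base] / total_bases
        result[kmer] = (obs, obs / (p * total_kmers))
    return result
```

A ratio well above 1 for a K-mer that resembles part of an adapter, or a long run of a single base, is the kind of signal that can point to incomplete trimming.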