Commit 1dd221a

a bit more on short read mapping
The tech note still needs improvement. Will do that after the release of v2.3.
1 parent c6b6392 commit 1dd221a

File tree

1 file changed

+19
-9
lines changed

tex/minimap2.tex

Lines changed: 19 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -68,7 +68,7 @@ \section{Introduction}
6868
generating base-level alignment, which in turn inspired us to develop minimap2
6969
towards higher accuracy and more practical functionality.
7070

71-
Both SMRT and ONT have been applied to sequence spliced mRNAs (RNA-seq). While
71+
Both SMRT and ONT have been applied to the sequencing of spliced mRNAs (RNA-seq). While
7272
traditional mRNA aligners work~\citep{Wu:2005vn,Iwata:2012aa}, they are not
7373
optimized for long noisy sequence reads and are tens of times slower than
7474
dedicated long-read aligners. When developing minimap2 initially for aligning
@@ -111,8 +111,11 @@ \subsubsection{Chaining}
111111
\begin{equation}\label{eq:chain-gap}
112112
\beta(j,i)=\gamma_c\big((y_i-y_j)-(x_i-x_j)\big)
113113
\end{equation}
114-
In implementation, a gap of length $l$ costs $\gamma_c(l)=0.01\cdot \bar{w}\cdot
115-
|l|+0.5\log_2|l|$, where $\bar{w}$ is the average seed length. For $m$ anchors, directly computing all $f(\cdot)$ with
114+
In implementation, a gap of length $l$ costs
115+
\[
116+
\gamma_c(l)=0.01\cdot \bar{w}\cdot|l|+0.5\log_2|l|
117+
\]
118+
where $\bar{w}$ is the average seed length. For $m$ anchors, directly computing all $f(\cdot)$ with
116119
Eq.~(\ref{eq:chain}) takes $O(m^2)$ time. Although theoretically faster
117120
chaining algorithms exist~\citep{Abouelhoda:2005aa}, they
118121
are inapplicable to generic gap cost, complex to implement and usually
@@ -363,12 +366,19 @@ \subsection{Aligning spliced sequences}
363366
\subsection{Aligning short paired-end reads}
364367

365368
During chaining, minimap2 takes a pair of reads as one read with a gap of
366-
unknown length in the middle. It does not break a chain if there is a long
367-
reference gap between seeds on different reads. After identifying primary
368-
chains (Section~\ref{sec:primary}), we split each fragment chain into two read
369-
chains and perform alignment for each read as in Section~\ref{sec:genomic}.
370-
Finally, we pair hits of each read end to find consistent paired-end
371-
alignments.
369+
unknown length in the middle. It applies a normal gap cost between seeds on the
370+
same read but a more permissive gap cost between seeds on different reads.
371+
More precisely, the gap cost during chaining is:
372+
\[
373+
\gamma_c(l)=\left\{\begin{array}{ll}
374+
0.01\cdot\bar{w}\cdot|l|+0.5\log_2|l| & \mbox{if the two seeds are on the same read} \\
375+
\min\{0.01\cdot\bar{w}\cdot|l|,\log_2|l|\} & \mbox{otherwise}
376+
\end{array}\right.
377+
\]
378+
After identifying primary chains (Section~\ref{sec:primary}), we split each
379+
fragment chain into two read chains and perform alignment for each read as in
380+
Section~\ref{sec:genomic}. Finally, we pair hits of each read end to find
381+
consistent paired-end alignments.
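As a rough illustration of how the two cases differ, assume a hypothetical
average seed length $\bar{w}=15$ (a value chosen here for arithmetic
convenience, not taken from the implementation). For a gap of $|l|=1024$,
\[
\gamma_c(l)=\left\{\begin{array}{ll}
0.01\cdot 15\cdot 1024+0.5\log_2 1024=153.6+5=158.6 & \mbox{if the two seeds are on the same read} \\
\min\{153.6,\,10\}=10 & \mbox{otherwise}
\end{array}\right.
\]
so under this assumption a long gap between the two reads of a pair is
penalized far less than an equally long gap within a read. For a short gap of
$|l|=16$, the two costs are $2.4+2=4.4$ and $\min\{2.4,\,4\}=2.4$, respectively.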
372382

373383
\end{methods}
374384
