@@ -68,7 +68,7 @@ \section{Introduction}
 generating base-level alignment, which in turn inspired us to develop minimap2
 towards higher accuracy and more practical functionality.
 
-Both SMRT and ONT have been applied to sequence spliced mRNAs (RNA-seq). While
+Both SMRT and ONT have been applied to the sequencing of spliced mRNAs (RNA-seq). While
 traditional mRNA aligners work~\citep{Wu:2005vn,Iwata:2012aa}, they are not
 optimized for long noisy sequence reads and are tens of times slower than
 dedicated long-read aligners. When developing minimap2 initially for aligning
@@ -111,8 +111,11 @@ \subsubsection{Chaining}
 \begin{equation}\label{eq:chain-gap}
 \beta(j,i)=\gamma_c\big((y_i-y_j)-(x_i-x_j)\big)
 \end{equation}
-In implementation, a gap of length $l$ costs $\gamma_c(l)=0.01\cdot\bar{w}\cdot
-|l|+0.5\log_2|l|$, where $\bar{w}$ is the average seed length. For $m$ anchors, directly computing all $f(\cdot)$ with
+In implementation, a gap of length $l$ costs
+\[
+\gamma_c(l)=0.01\cdot\bar{w}\cdot|l|+0.5\log_2|l|
+\]
+where $\bar{w}$ is the average seed length. For $m$ anchors, directly computing all $f(\cdot)$ with
 Eq.~(\ref{eq:chain}) takes $O(m^2)$ time. Although theoretically faster
 chaining algorithms exist~\citep{Abouelhoda:2005aa}, they
 are inapplicable to generic gap cost, complex to implement and usually
@@ -363,12 +366,19 @@ \subsection{Aligning spliced sequences}
 \subsection{Aligning short paired-end reads}
 
 During chaining, minimap2 takes a pair of reads as one read with a gap of
-unknown length in the middle. It does not break a chain if there is a long
-reference gap between seeds on different reads. After identifying primary
-chains (Section~\ref{sec:primary}), we split each fragment chain into two read
-chains and perform alignment for each read as in Section~\ref{sec:genomic}.
-Finally, we pair hits of each read end to find consistent paired-end
-alignments.
+unknown length in the middle. It applies a normal gap cost between seeds on the
+same read but a more permissive gap cost between seeds on different reads.
+More precisely, the gap cost during chaining is:
+\[
+\gamma_c(l)=\left\{\begin{array}{ll}
+0.01\cdot\bar{w}\cdot|l|+0.5\log_2|l| & \mbox{if two seeds on the same read} \\
+\min\{0.01\cdot\bar{w}\cdot|l|,\log_2|l|\} & \mbox{otherwise}
+\end{array}\right.
+\]
+After identifying primary chains (Section~\ref{sec:primary}), we split each
+fragment chain into two read chains and perform alignment for each read as in
+Section~\ref{sec:genomic}. Finally, we pair hits of each read end to find
+consistent paired-end alignments.
 
 \end{methods}
 
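To illustrate how the two branches of the paired-end gap cost in the last hunk behave, here is a small worked example. The value $\bar{w}=15$ is an assumption chosen only for illustration (it is not stated in this diff), as are the two gap lengths $l=16$ and $l=1024$.

```latex
% Illustrative values only: \bar{w}=15 is an assumed average seed length.
\[
\begin{array}{lll}
 & l=16 & l=1024 \\
\mbox{same read:} & 0.01\cdot 15\cdot 16+0.5\log_2 16=4.4
                  & 0.01\cdot 15\cdot 1024+0.5\log_2 1024=158.6 \\
\mbox{different reads:} & \min\{2.4,\,4\}=2.4
                        & \min\{153.6,\,10\}=10
\end{array}
\]
```

Under these assumed values, the cross-read cost follows the linear term for short gaps and the logarithmic term for long gaps, so a long gap between the two reads of a pair is penalized far less than the same gap within a single read.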