Semantic Description of obialign.ReadAlign
The ReadAlign function performs paired-end read alignment with quality-aware scoring, optimized for overlapping consensus construction in NGS data processing.
Core Functionality
- Input: Two biological sequences (`seqA`, `seqB`) as `BioSequence` objects, plus alignment parameters:
  - `gap`: gap penalty (linear)
  - `scale`: scaling factor for quality scores
  - `delta`: extension buffer around the initial overlap estimate
  - `fastScoreRel`: use relative vs. absolute k-mer matching score
Algorithm Overview
- Preprocessing & Initialization
  - Ensures the DNA scoring matrix is initialized (`_InitDNAScoreMatrix`).
- Fast Overlap Estimation via 4-mer Indexing
  - Builds a k-mer index of `seqA` using `obikmer.Index4mer`.
  - Computes the optimal shift via `_FastShiftFourMer` in both forward and reverse-complement orientations.
  - Selects the orientation (direct or reversed) yielding the highest k-mer match count (`fastCount`) and score (`fastScore`).
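- Overlap Computation
  - Determines the overlap length `over` from the shift:

    $$
    \mathrm{over} =
    \begin{cases}
    |\mathrm{seqA}| - \mathrm{shift} & \text{if } \mathrm{shift} > 0 \\
    |\mathrm{seqB}| + \mathrm{shift} & \text{if } \mathrm{shift} < 0 \\
    \min(|\mathrm{seqA}|, |\mathrm{seqB}|) & \text{otherwise}
    \end{cases}
    $$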
- Dynamic Programming Alignment
  - If the overlap is not a near-perfect match (`fastCount + 3 < over`):
    - Extracts subregions with `delta`-buffered boundaries.
    - Calls either `_FillMatrixPeLeftAlign` (left-aligned case) or `_FillMatrixPeRightAlign` (right-aligned case).
    - Backtracks via `_Backtracking` to produce the alignment path.
  - Else (near-perfect overlap):
    - Skips DP; computes the score directly from quality scores using `_NucScorePartMatchMatch`.
    - Returns the trivial path `[extra5, partLen]`.
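The overlap rule and the DP decision above can be sketched in a few lines of Go. This is a minimal illustration, not the obitools4 code: `overlapLength` and `needsDP` are hypothetical names, and sequences are reduced to their lengths. One plausible reading of the `+ 3` is that a window of `n` bases contains `n - 3` overlapping 4-mers, so `fastCount + 3 == over` would mean every 4-mer in the overlap matched.

```go
package main

import "fmt"

// overlapLength applies the piecewise rule from the text.
// lenA and lenB stand for |seqA| and |seqB| (hypothetical helper).
func overlapLength(lenA, lenB, shift int) int {
	switch {
	case shift > 0:
		return lenA - shift
	case shift < 0:
		return lenB + shift
	default:
		if lenA < lenB {
			return lenA
		}
		return lenB
	}
}

// needsDP reproduces the test `fastCount + 3 < over` (hypothetical helper).
func needsDP(fastCount, over int) bool {
	return fastCount+3 < over
}

func main() {
	over := overlapLength(100, 100, 20)
	// 77 matching 4-mers cover an 80-base overlap completely; 70 do not.
	fmt.Println(over, needsDP(77, over), needsDP(70, over)) // prints "80 false true"
}
```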
Output
Returns:
| Index | Type | Meaning |
|---|---|---|
| 0️⃣ | `int` | Final alignment score (weighted by quality) |
| 1️⃣ | `[]int` | Alignment path (list of positions: `[startA, endA, startB, endB]` or similar) |
| 2️⃣ | `int` | K-mer match count (`fastCount`) |
| 3️⃣ | `int` | Overlap length (`over`) |
| 4️⃣ | `float64` | K-mer-based score (`fastScore`) |
| 5️⃣ | `bool` | Whether alignment was performed in direct orientation (`true`) or on the reverse-complement of `seqB` |
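
Six positional return values are easy to mix up at the call site, so a caller may want to name them. The sketch below assumes only the return types listed in the table. `alignResult`, `collect`, and `stubReadAlign` are hypothetical; the stub stands in for the real call, whose full argument list is not documented here, and its values are placeholders rather than real output.

```go
package main

import "fmt"

// alignResult names the six values from the table (hypothetical type).
type alignResult struct {
	Score     int
	Path      []int
	FastCount int
	Over      int
	FastScore float64
	Direct    bool
}

// stubReadAlign is a stand-in with the same return shape as described
// above. The values are placeholders, not results from obitools4.
func stubReadAlign() (int, []int, int, int, float64, bool) {
	return 152, []int{0, 80}, 77, 80, 77.0, true
}

// collect packs the positional results into a named struct.
func collect(score int, path []int, fastCount, over int, fastScore float64, direct bool) alignResult {
	return alignResult{score, path, fastCount, over, fastScore, direct}
}

func main() {
	// Go allows passing a multi-value result directly as the argument list.
	res := collect(stubReadAlign())
	fmt.Printf("score=%d over=%d direct=%t\n", res.Score, res.Over, res.Direct)
}
```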
Key Design Highlights
- Efficient pre-filtering using 4-mers avoids full DP for nearly identical reads.
- Quality-aware scoring, leveraging Phred scores via `_NucScorePartMatchMatch`.
- Supports asymmetric overlaps (left/right alignment) with boundary padding (`delta`).
- Uses preallocated memory arenas to minimize GC pressure in high-throughput pipelines.
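
To make the 4-mer pre-filter concrete, here is a minimal sketch of shift estimation by k-mer voting. It is not the `obikmer.Index4mer` or `_FastShiftFourMer` implementation: `encode4mer`, `index4mers`, and `bestShift` are hypothetical names, sequences are plain strings, and the sketch omits quality scores, the relative/absolute score, and the reverse-complement pass. The sign convention (`shift` = position in `seqA` minus position in `seqB`) is an assumption chosen to agree with the overlap formula.

```go
package main

import "fmt"

// encode4mer packs four nucleotides into a code in 0..255.
// It reports false when the window contains a non-ACGT symbol.
func encode4mer(s string) (int, bool) {
	code := 0
	for i := 0; i < 4; i++ {
		var v int
		switch s[i] {
		case 'A', 'a':
			v = 0
		case 'C', 'c':
			v = 1
		case 'G', 'g':
			v = 2
		case 'T', 't':
			v = 3
		default:
			return 0, false
		}
		code = code<<2 | v
	}
	return code, true
}

// index4mers lists, for every 4-mer code, its start positions in seq.
func index4mers(seq string) [256][]int {
	var idx [256][]int
	for i := 0; i+4 <= len(seq); i++ {
		if code, ok := encode4mer(seq[i : i+4]); ok {
			idx[code] = append(idx[code], i)
		}
	}
	return idx
}

// bestShift lets every shared 4-mer vote for the shift posA - posB and
// returns the most voted shift with its vote count. Ties go to the
// smallest shift so the result does not depend on map iteration order.
func bestShift(seqA, seqB string) (shift, count int) {
	idx := index4mers(seqA)
	votes := make(map[int]int)
	for j := 0; j+4 <= len(seqB); j++ {
		code, ok := encode4mer(seqB[j : j+4])
		if !ok {
			continue
		}
		for _, i := range idx[code] {
			votes[i-j]++
		}
	}
	for s, c := range votes {
		if c > count || (c == count && s < shift) {
			shift, count = s, c
		}
	}
	return shift, count
}

func main() {
	// seqB starts with the last 8 bases of seqA, then continues.
	shift, count := bestShift("TTGACCGTAGGCATCA", "AGGCATCAGTTC")
	fmt.Println(shift, count) // prints "8 5"
}
```

In this example the overlap is 16 - 8 = 8 bases and contains 5 overlapping 4-mers, all of which match, so under the rule described earlier the DP step would be skipped.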