mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 03:50:39 +00:00
⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5" - Update version.txt from 4.29 → .30 (automated by Makefile)
This commit is contained in:
@@ -0,0 +1,58 @@
|
||||
# Semantic Description of `obialign.ReadAlign`
|
||||
|
||||
The `ReadAlign` function performs **paired-end read alignment** with quality-aware scoring, optimized for overlapping consensus construction in NGS data processing.
|
||||
|
||||
## Core Functionality
|
||||
|
||||
- **Input**: Two biological sequences (`seqA`, `seqB`) as `BioSequence` objects, plus alignment parameters:
|
||||
- `gap`: gap penalty (linear)
|
||||
- `scale`: scaling factor for quality scores
|
||||
- `delta`: extension buffer around initial overlap estimate
|
||||
- `fastScoreRel`: use relative vs absolute k-mer matching score
|
||||
|
||||
## Algorithm Overview
|
||||
|
||||
1. **Preprocessing & Initialization**
|
||||
- Ensures DNA scoring matrix is initialized (`_InitDNAScoreMatrix`).
|
||||
|
||||
2. **Fast Overlap Estimation via 4-mer Indexing**
|
||||
- Builds a k-mer index of `seqA` using `obikmer.Index4mer`.
|
||||
- Computes optimal shift via `_FastShiftFourMer` in both forward and reverse-complement orientations.
|
||||
- Selects orientation (direct or reversed) yielding highest k-mer match count (`fastCount`) and score (`fastScore`).
|
||||
|
||||
3. **Overlap Computation**
|
||||
- Determines overlap length `over` based on shift:
|
||||
```text
|
||||
over = |seqA| - shift if shift > 0
|
||||
|seqB| + shift if shift < 0
|
||||
min(|seqA|,|seqB)| otherwise
|
||||
```
|
||||
|
||||
4. **Dynamic Programming Alignment**
|
||||
- If overlap is *not* identical (`fastCount + 3 < over`):
|
||||
- Extracts subregions with `delta`-buffered boundaries.
|
||||
- Calls either `_FillMatrixPeLeftAlign` (left-aligned case) or `_FillMatrixPERightAlign`.
|
||||
- Backtracks via `_Backtracking` to produce alignment path.
|
||||
- Else (near-perfect overlap):
|
||||
- Skips DP; computes score directly from quality scores using `_NucScorePartMatchMatch`.
|
||||
- Returns trivial path `[extra5, partLen]`.
|
||||
|
||||
## Output
|
||||
|
||||
Returns:
|
||||
|
||||
| Index | Type | Meaning |
|
||||
|-------|----------|---------|
|
||||
| 0️⃣ | `int` | Final alignment score (weighted by quality) |
|
||||
| 1️⃣ | `[]int` | Alignment path (list of positions: `[startA, endA, startB, endB]` or similar) |
|
||||
| 2️⃣ | `int` | K-mer match count (`fastCount`) |
|
||||
| 3️⃣ | `int` | Overlap length (`over`) |
|
||||
| 4️⃣ | `float64` | K-mer-based score (`fastScore`) |
|
||||
| 5️⃣ | `bool` | Whether alignment was performed in direct orientation (`true`) or on reverse-complement of `seqB` |
|
||||
|
||||
## Key Design Highlights
|
||||
|
||||
- **Efficient pre-filtering** using 4-mers avoids full DP for nearly identical reads.
|
||||
- **Quality-aware scoring**, leveraging Phred scores via `_NucScorePartMatchMatch`.
|
||||
- Supports **asymmetric overlaps** (left/right alignment) with boundary padding (`delta`).
|
||||
- Uses preallocated memory arenas to minimize GC pressure in high-throughput pipelines.
|
||||
Reference in New Issue
Block a user