obitools4/autodoc/docmd/pkg/obialign/readalign.md
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00


# Semantic Description of `obialign.ReadAlign`
The `ReadAlign` function performs **paired-end read alignment** with quality-aware scoring, optimized for overlapping consensus construction in NGS data processing.
## Core Functionality
- **Input**: Two biological sequences (`seqA`, `seqB`) as `BioSequence` objects, plus alignment parameters:
  - `gap`: linear gap penalty
  - `scale`: scaling factor applied to quality scores
  - `delta`: extension buffer around the initial overlap estimate
  - `fastScoreRel`: whether the k-mer matching score is relative rather than absolute
## Algorithm Overview
1. **Preprocessing & Initialization**
   - Ensures the DNA scoring matrix is initialized (`_InitDNAScoreMatrix`).
2. **Fast Overlap Estimation via 4-mer Indexing**
   - Builds a 4-mer index of `seqA` using `obikmer.Index4mer`.
   - Computes the optimal shift via `_FastShiftFourMer` in both the forward and reverse-complement orientations.
   - Selects the orientation (direct or reverse-complement) with the higher k-mer match count (`fastCount`) and score (`fastScore`).
3. **Overlap Computation**
   - Determines the overlap length `over` from the shift:
     ```text
     over = |seqA| - shift        if shift > 0
            |seqB| + shift        if shift < 0
            min(|seqA|, |seqB|)   otherwise
     ```
4. **Dynamic Programming Alignment**
   - If the overlap is *not* near-identical (`fastCount + 3 < over`, i.e. fewer shared 4-mers than the `over - 3` that a perfect overlap of length `over` contains):
     - Extracts subregions with `delta`-buffered boundaries.
     - Calls either `_FillMatrixPeLeftAlign` (left-aligned case) or `_FillMatrixPeRightAlign` (right-aligned case).
     - Backtracks via `_Backtracking` to produce the alignment path.
   - Otherwise (near-perfect overlap):
     - Skips DP and computes the score directly from quality scores using `_NucScorePartMatchMatch`.
     - Returns the trivial path `[extra5, partLen]`.
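
The fast path of the algorithm (4-mer shift estimation, overlap length, and the decision to skip DP) can be sketched as follows. This is a minimal illustration, not the package's code: `kmerCode`, `fastShift`, `overlapLength`, and `needsDP` are hypothetical names, the shift is assumed to be "position in `seqA` minus position in `seqB`", ties are broken toward the smallest shift, and the reverse-complement pass is left out.

```go
package main

// kmerCode packs the 4-mer starting at s[i] into 8 bits (2 bits per base).
// ok is false when the window contains a character other than A, C, G, T.
func kmerCode(s string, i int) (code int, ok bool) {
	for j := 0; j < 4; j++ {
		var b int
		switch s[i+j] {
		case 'a', 'A':
			b = 0
		case 'c', 'C':
			b = 1
		case 'g', 'G':
			b = 2
		case 't', 'T':
			b = 3
		default:
			return 0, false
		}
		code = code<<2 | b
	}
	return code, true
}

// fastShift returns the shift (position in seqA minus position in seqB,
// an assumed convention) supported by the most shared 4-mers, and that count.
func fastShift(seqA, seqB string) (shift, count int) {
	index := make(map[int][]int) // 4-mer code -> start positions in seqA
	for i := 0; i+4 <= len(seqA); i++ {
		if c, ok := kmerCode(seqA, i); ok {
			index[c] = append(index[c], i)
		}
	}
	votes := make(map[int]int) // shift -> number of supporting 4-mers
	for j := 0; j+4 <= len(seqB); j++ {
		c, ok := kmerCode(seqB, j)
		if !ok {
			continue
		}
		for _, i := range index[c] {
			votes[i-j]++
		}
	}
	for s, n := range votes {
		// Highest count wins; ties go to the smallest shift so the
		// result does not depend on map iteration order.
		if n > count || (n == count && s < shift) {
			shift, count = s, n
		}
	}
	return shift, count
}

// overlapLength applies the three-case rule given for `over`.
func overlapLength(lenA, lenB, shift int) int {
	switch {
	case shift > 0:
		return lenA - shift
	case shift < 0:
		return lenB + shift
	case lenA < lenB:
		return lenA
	default:
		return lenB
	}
}

// needsDP mirrors the documented test: a perfect overlap of length over
// contains over-3 4-mers, so fewer shared 4-mers means DP is required.
func needsDP(fastCount, over int) bool {
	return fastCount+3 < over
}
```

With `seqA = "TTTTACGGATCCAGT"` and `seqB = "ACGGATCCAGTCCCC"`, the 11-base suffix of `seqA` matches the prefix of `seqB` exactly, so all 8 possible 4-mers vote for the same shift and `needsDP` is false. Introducing a single mismatch in the overlap removes four of those 4-mers and triggers the DP branch.
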
## Output
Returns:
| Index | Type | Meaning |
|-------|----------|---------|
| 0️⃣ | `int` | Final alignment score (weighted by quality) |
| 1️⃣ | `[]int` | Alignment path as a list of integers (the trivial path `[extra5, partLen]` when DP is skipped) |
| 2️⃣ | `int` | K-mer match count (`fastCount`) |
| 3️⃣ | `int` | Overlap length (`over`) |
| 4️⃣ | `float64` | K-mer-based score (`fastScore`) |
| 5️⃣ | `bool` | Whether alignment was performed in direct orientation (`true`) or on reverse-complement of `seqB` |
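
Six positional return values are easy to mix up at call sites, so callers may find it convenient to name them. The sketch below assumes a hypothetical wrapper type `alignResult` (not part of `obialign`) whose fields follow the table order. The `kmerSupport` helper is also illustrative: it relates `fastCount` to the `over - 3` 4-mers that a perfect overlap of length `over` contains.

```go
package main

// alignResult is a hypothetical wrapper naming the six values listed in
// the output table, in the same order.
type alignResult struct {
	Score     int     // final quality-weighted alignment score
	Path      []int   // alignment path
	FastCount int     // shared 4-mer count
	Over      int     // overlap length
	FastScore float64 // k-mer-based score
	Direct    bool    // true if seqB was aligned in direct orientation
}

// kmerSupport returns the fraction of the 4-mers expected in a perfect
// overlap that were actually shared (illustrative helper, not in obialign).
func (r alignResult) kmerSupport() float64 {
	if r.Over < 4 {
		return 0 // an overlap shorter than 4 bases contains no 4-mer
	}
	return float64(r.FastCount) / float64(r.Over-3)
}
```

A value of 1.0 from `kmerSupport` corresponds to the near-perfect case in which DP is skipped.
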
## Key Design Highlights
- **Efficient pre-filtering** using 4-mers avoids full DP for nearly identical reads.
- **Quality-aware scoring**, leveraging Phred scores via `_NucScorePartMatchMatch`.
- Supports **asymmetric overlaps** (left/right alignment) with boundary padding (`delta`).
- Uses preallocated memory arenas to minimize GC pressure in high-throughput pipelines.
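
To make the idea of quality-aware scoring concrete, the sketch below uses the standard Phred definition (error probability `10^(-Q/10)`) and a textbook model for the probability that two observed, identical bases reflect the same true base. This is an assumption made for illustration: the actual formula in `_NucScorePartMatchMatch` is not given here and may differ. The names `phredToErr` and `matchProb` are hypothetical.

```go
package main

import "math"

// phredToErr converts a Phred quality score Q into the probability that
// the base call is wrong: p = 10^(-Q/10).
func phredToErr(q int) float64 {
	return math.Pow(10, -float64(q)/10)
}

// matchProb returns the probability that two identical observed bases with
// qualities qa and qb come from the same true base, assuming independent
// errors spread uniformly over the three other nucleotides (textbook
// model, not necessarily the one used by obialign).
func matchProb(qa, qb int) float64 {
	pa, pb := phredToErr(qa), phredToErr(qb)
	// Both calls are correct, or both are wrong in the same way.
	return (1-pa)*(1-pb) + pa*pb/3
}
```

Under this model a match between two Q40 bases is worth far more than a match between two Q10 bases, which is the behaviour a quality-weighted alignment score is meant to capture.
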