2026-04-07 08:36:50 +02:00
# Semantic Description of `obialign` Package
The `obialign` package provides high-performance, memory-efficient tools for **pairwise alignment of paired-end biological sequences**, optimized specifically for Next-Generation Sequencing (NGS) data.
## Core Functionalities
### 1. **Memory Arena Management**
- `PEAlignArena` is a reusable memory buffer to avoid repeated allocations during multiple alignments.
- Preallocates matrices (`scoreMatrix`, `pathMatrix`), alignment buffers, and auxiliary structures based on expected max sequence lengths.
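The reuse pattern described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the `arena` type and the `newArena`, `resize`, and `prepare` names are hypothetical, and only two of the buffers mentioned above are modeled.

```go
package main

import "fmt"

// arena is a hypothetical stand-in for PEAlignArena: it owns the buffers
// one alignment needs so that the next alignment can reuse them.
type arena struct {
	scoreMatrix []int
	pathMatrix  []int
}

// newArena reserves capacity for sequences of up to maxLen bases.
func newArena(maxLen int) *arena {
	n := (maxLen + 1) * (maxLen + 1)
	return &arena{
		scoreMatrix: make([]int, 0, n),
		pathMatrix:  make([]int, 0, n),
	}
}

// resize returns buf with length n. It reuses the backing array when the
// capacity is sufficient and allocates a new one only when it is not.
// Reused cells keep stale values, so the caller must overwrite them.
func resize(buf []int, n int) []int {
	if cap(buf) < n {
		return make([]int, n)
	}
	return buf[:n]
}

// prepare sizes both matrices for an alignment of lenA by lenB bases.
func (a *arena) prepare(lenA, lenB int) {
	n := (lenA + 1) * (lenB + 1)
	a.scoreMatrix = resize(a.scoreMatrix, n)
	a.pathMatrix = resize(a.pathMatrix, n)
}

func main() {
	a := newArena(300)
	a.prepare(150, 150) // first alignment
	fmt.Println(len(a.scoreMatrix), cap(a.scoreMatrix))
	a.prepare(100, 120) // second alignment reuses the same backing array
	fmt.Println(len(a.scoreMatrix), cap(a.scoreMatrix))
}
```

Because the capacity never shrinks, a worker that aligns many read pairs of similar length allocates once and then only reslices.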
### 2. **Dynamic Programming Alignment Functions**
Implements three specialized variants of Needleman-Wunsch global alignment with affine gap penalties (scaled per mismatch). The variants differ in which end gaps are free:
- **`PELeftAlign`**: Free gaps at the *start* of `seqB` and the *end* of `seqA`. Ideal for aligning overlapping reads where the first read starts before or within the second.
- **`PERightAlign`**: Free gaps at the *start* of `seqA` and the *end* of `seqB`. Suited to cases where the second read extends beyond the first.
- **`PECenterAlign`**: Free gaps at both ends of *both* sequences; requires `seqA` to be at least as long as `seqB`. Designed for full overlap scenarios (e.g., merging paired-end reads).
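To make the "free end gaps" idea concrete, here is a minimal sketch of the left-overlap case. It is a simplification under stated assumptions: it uses a linear gap penalty rather than the affine one named above, ignores quality scores, returns only the score, and the `leftOverlapScore` name and its parameters are hypothetical.

```go
package main

import "fmt"

// leftOverlapScore scores the best alignment in which gaps at the start of
// b and gaps at the end of a cost nothing (a starts first, b ends last).
// Gaps inside the overlap cost `gap` each (linear, not affine).
func leftOverlapScore(a, b string, match, mismatch, gap int) int {
	prev := make([]int, len(b)+1) // DP row i-1
	cur := make([]int, len(b)+1)  // DP row i
	for j := 1; j <= len(b); j++ {
		prev[j] = j * gap // skipping a prefix of b is penalized
	}
	for i := 1; i <= len(a); i++ {
		cur[0] = 0 // skipping a prefix of a is free
		for j := 1; j <= len(b); j++ {
			s := mismatch
			if a[i-1] == b[j-1] {
				s = match
			}
			best := prev[j-1] + s // diagonal step
			if v := prev[j] + gap; v > best {
				best = v // gap in b
			}
			if v := cur[j-1] + gap; v > best {
				best = v // gap in a
			}
			cur[j] = best
		}
		prev, cur = cur, prev
	}
	// prev now holds the last row. Leaving a suffix of b unaligned is
	// free, so the result is the maximum over that row.
	best := prev[0]
	for j := 1; j <= len(b); j++ {
		if prev[j] > best {
			best = prev[j]
		}
	}
	return best
}

func main() {
	// The last 8 bases of a overlap the first 8 bases of b.
	fmt.Println(leftOverlapScore("TTTTACGTACGT", "ACGTACGTGGGG", 1, -1, -2)) // 8
}
```

Swapping which border is initialized to zero and which border is scanned for the maximum gives the right-overlap case.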
All use column-major matrix storage and efficient index arithmetic via helper functions `_GetMatrix`, `_SetMatrices`, etc.
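As an illustration of column-major index arithmetic, the sketch below stores a matrix in one flat slice. The `matrix` type and its methods are hypothetical and do not reproduce the signatures of `_GetMatrix` or `_SetMatrices`.

```go
package main

import "fmt"

// matrix is a hypothetical flat, column-major matrix: the cells of one
// column are contiguous in memory.
type matrix struct {
	nrows, ncols int
	cells        []int
}

func newMatrix(nrows, ncols int) *matrix {
	return &matrix{nrows: nrows, ncols: ncols, cells: make([]int, nrows*ncols)}
}

// index maps (row i, column j) to an offset in the flat slice.
func (m *matrix) index(i, j int) int { return j*m.nrows + i }

func (m *matrix) get(i, j int) int    { return m.cells[m.index(i, j)] }
func (m *matrix) set(i, j, value int) { m.cells[m.index(i, j)] = value }

func main() {
	m := newMatrix(3, 4)
	m.set(2, 1, 42)
	fmt.Println(m.index(2, 1), m.get(2, 1)) // offset 1*3+2 = 5, value 42
}
```

A single flat slice keeps the whole matrix in one allocation, which is what lets an arena hand the same buffer to successive alignments.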
### 3. **Scoring & Quality Integration**
- Pairwise base/quality scores are computed by `_PairingScorePeAlign`, combining:
  - Nucleotide compatibility (via precomputed `_NucPartMatch`)
  - Phred quality scores (`_NucScorePartMatchMatch`, `_NucScorePartMatchMismatch`)
  - A user-defined `scale` factor to modulate mismatch penalties.
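The sketch below shows one simple way quality scores can weight a match or mismatch. It is an illustrative assumption, not the formula used by `_PairingScorePeAlign`: the weight is the probability that both base calls are correct, derived from the standard Phred relation p = 10^(-Q/10), and `errProb` and `pairScore` are hypothetical names. Ambiguous (IUPAC) nucleotides are not handled.

```go
package main

import (
	"fmt"
	"math"
)

// errProb converts a Phred quality score Q into the probability that the
// base call is wrong: p = 10^(-Q/10).
func errProb(q int) float64 {
	return math.Pow(10, -float64(q)/10)
}

// pairScore is a toy quality-aware score. The weight is the probability
// that both calls are correct, so confident matches are rewarded most and
// confident mismatches are penalized most. scale multiplies the mismatch
// penalty only.
func pairScore(baseA, baseB byte, qualA, qualB int, scale float64) float64 {
	weight := (1 - errProb(qualA)) * (1 - errProb(qualB))
	if baseA == baseB {
		return weight
	}
	return -scale * weight
}

func main() {
	fmt.Println(pairScore('A', 'A', 40, 40, 2.0)) // confident match
	fmt.Println(pairScore('A', 'C', 40, 40, 2.0)) // confident mismatch
	fmt.Println(pairScore('A', 'C', 2, 2, 2.0))   // low-quality mismatch
}
```

With this weighting, a mismatch between two low-quality calls costs much less than a mismatch between two high-quality calls, which is the behavior a quality-integrated aligner generally wants.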
### 4. **Fast Heuristic Pre-Alignment**
The main `PEAlign` function integrates a fast k-mer-based pre-screening step:
- Uses 4-mer indexing (`obikmer.Index4mer`) and shift estimation via `FastShiftFourMer`.
- If overlap is significant (`fastCount + 3 < over`), performs localized DP only on the predicted overlapping region (using `PELeftAlign` or `PERightAlign`) to save time.
- Otherwise, computes full alignment over entire sequences (both left and right variants), selecting the best score.
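The shift estimation step can be pictured as a voting scheme over shared 4-mers. The sketch below is an assumption about the general technique, not the code of `obikmer.Index4mer` or `FastShiftFourMer`; all function names are hypothetical. Note that an overlap of `over` bases contains `over - 3` 4-mers, which is presumably where the `+ 3` in the condition above comes from.

```go
package main

import "fmt"

// code maps a nucleotide to 2 bits; ok is false for anything but A, C, G, T.
func code(b byte) (c int, ok bool) {
	switch b {
	case 'A', 'a':
		return 0, true
	case 'C', 'c':
		return 1, true
	case 'G', 'g':
		return 2, true
	case 'T', 't':
		return 3, true
	}
	return 0, false
}

// fourMers returns, for each start position, the 8-bit code of the 4-mer
// starting there, or -1 if the window contains a non-ACGT symbol.
func fourMers(s string) []int {
	if len(s) < 4 {
		return nil
	}
	out := make([]int, len(s)-3)
	for i := range out {
		v := 0
		for k := 0; k < 4; k++ {
			c, ok := code(s[i+k])
			if !ok {
				v = -1
				break
			}
			v = v<<2 | c
		}
		out[i] = v
	}
	return out
}

// bestShift indexes the 4-mers of a, lets every 4-mer of b vote for the
// shift posA - posB, and returns the shift with the most votes.
// Ties are broken in favor of the smaller shift.
func bestShift(a, b string) (shift, count int) {
	var index [256][]int
	for pos, km := range fourMers(a) {
		if km >= 0 {
			index[km] = append(index[km], pos)
		}
	}
	votes := map[int]int{}
	for pos, km := range fourMers(b) {
		if km < 0 {
			continue
		}
		for _, posA := range index[km] {
			votes[posA-pos]++
		}
	}
	for s, c := range votes {
		if c > count || (c == count && s < shift) {
			shift, count = s, c
		}
	}
	return shift, count
}

// overlapLen is the number of bases shared when b starts shift bases
// into a (shift >= 0) or a starts -shift bases into b (shift < 0).
func overlapLen(lenA, lenB, shift int) int {
	over := lenA - shift
	if shift < 0 {
		over = lenB + shift
	}
	if over > lenA {
		over = lenA
	}
	if over > lenB {
		over = lenB
	}
	return over
}

func main() {
	a, b := "GGCATTCAGGACTA", "TCAGGACTACCGT"
	shift, count := bestShift(a, b)
	fmt.Println(shift, count, overlapLen(len(a), len(b), shift)) // 5 6 9
}
```

In this example the 9-base overlap yields exactly 9 - 3 = 6 shared 4-mers, so the count indicates an overlap with no mismatches.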
### 5. **Backtracking & Path Output**
- `_Backtracking` reconstructs the optimal alignment path from `pathMatrix`.
- Paths are encoded as alternating `(offset, length)` pairs: aligned (diagonal) segments use offset `0`, and gaps are encoded as `-1`/`+1`.
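One way to read this encoding is as a run-length compression of individual DP steps. The sketch below follows that reading and is an assumption: the actual layout produced by `_Backtracking` may differ, and the `compress` name is hypothetical.

```go
package main

import "fmt"

// compress turns a list of single DP steps (0 = diagonal, -1 or +1 = gap
// in one sequence or the other) into a flat slice of alternating
// (step, length) pairs, merging consecutive identical steps.
func compress(steps []int) []int {
	var path []int
	for _, s := range steps {
		n := len(path)
		if n >= 2 && path[n-2] == s {
			path[n-1]++ // extend the current run
		} else {
			path = append(path, s, 1) // start a new run
		}
	}
	return path
}

func main() {
	// Two gap steps, three aligned positions, then one gap the other way.
	fmt.Println(compress([]int{1, 1, 0, 0, 0, -1})) // [1 2 0 3 -1 1]
}
```

A run-length path stays short for typical read overlaps, which are mostly one long diagonal segment with few gaps.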
### Use Case
Designed for **paired-end read merging**, overlap detection, and consensus building in metagenomic pipelines (e.g., the OBITools4 ecosystem). Arena reuse keeps it efficient and scalable for large batch processing.