Files
obitools4/autodoc/docmd/pkg/obialign/pairedendalign.md
T
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

38 lines
2.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Semantic Description of `obialign` Package
The `obialign` package provides high-performance, memory-efficient tools for **pairwise alignment of paired-end biological sequences**, optimized specifically for Next-Generation Sequencing (NGS) data.
## Core Functionalities
### 1. **Memory Arena Management**
- `PEAlignArena` is a reusable memory buffer to avoid repeated allocations during multiple alignments.
- Preallocates matrices (`scoreMatrix`, `pathMatrix`), alignment buffers, and auxiliary structures based on expected max sequence lengths.
### 2. **Dynamic Programming Alignment Functions**
Implements three specialized global alignment variants using NeedlemanWunsch with affine gap penalties (scaled per mismatch):
- **`PELeftAlign`**: Free gaps at the *start* of `seqB` and end of `seqA`. Ideal for aligning overlapping reads where the first read starts before or within the second.
- **`PERightAlign`**: Free gaps at start of `seqA` and end of `seqB`. Suited when the second read extends beyond the first.
- **`PECenterAlign`**: Free gaps at both ends of *both* sequences; requires `seqA ≥ seqB`. Designed for full overlap scenarios (e.g., merging paired-end reads).
All use column-major matrix storage and efficient index arithmetic via helper functions `_GetMatrix`, `_SetMatrices`, etc.
### 3. **Scoring & Quality Integration**
- Pairwise base/quality scores computed by `_PairingScorePeAlign`, combining:
- Nucleotide compatibility (via precomputed `_NucPartMatch`)
- Phred quality scores (`_NucScorePartMatchMatch`, `_NucScorePartMatchMismatch`)
- A user-defined `scale` factor to modulate mismatch penalties.
### 4. **Fast Heuristic Pre-Alignment**
The main `PEAlign` function integrates a kmer-based fast pre-screening:
- Uses 4-mer indexing (`obikmer.Index4mer`) and shift estimation via `FastShiftFourMer`.
- If overlap is significant (`fastCount + 3 < over`), performs localized DP only on the predicted overlapping region (using `PELeftAlign` or `PERightAlign`) to save time.
- Otherwise, computes full alignment over entire sequences (both left and right variants), selecting the best score.
### 5. **Backtracking & Path Output**
- `_Backtracking` reconstructs the optimal alignment path from `pathMatrix`.
- Paths encoded as alternating `(offset, length)` pairs for aligned segments (diagonal = 0), with gaps encoded as `-1`/`+1`.
### Use Case
Designed for **paired-end read merging**, overlap detection, and consensus building in metagenomic pipelines (e.g., OBITOOLS4 ecosystem). Efficient, scalable for large batch processing via arena reuse.