mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
Semantic Description of obialign Package
The obialign package provides high-performance, memory-efficient tools for pairwise alignment of paired-end biological sequences, optimized specifically for Next-Generation Sequencing (NGS) data.
Core Functionalities
1. Memory Arena Management
- PEAlignArena is a reusable memory buffer that avoids repeated allocations across multiple alignments.
- Preallocates matrices (scoreMatrix, pathMatrix), alignment buffers, and auxiliary structures based on expected maximum sequence lengths.
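The arena idea can be sketched as follows. This is a minimal illustration, not the package's actual struct: the type alignArena, its constructor, and the prepare and resize helpers are hypothetical names, and only the two matrix fields named above are modeled.

```go
package main

import "fmt"

// alignArena is a hypothetical stand-in for a reusable alignment arena.
// The field names follow the description above, not the real struct.
type alignArena struct {
	scoreMatrix []int
	pathMatrix  []int
}

// newAlignArena preallocates room for two sequences of up to maxLen bases.
func newAlignArena(maxLen int) *alignArena {
	n := (maxLen + 1) * (maxLen + 1)
	return &alignArena{
		scoreMatrix: make([]int, 0, n),
		pathMatrix:  make([]int, 0, n),
	}
}

// prepare sizes both matrices for a lenA x lenB alignment, reusing the
// existing backing arrays whenever their capacity is sufficient.
func (a *alignArena) prepare(lenA, lenB int) {
	n := (lenA + 1) * (lenB + 1)
	a.scoreMatrix = resize(a.scoreMatrix, n)
	a.pathMatrix = resize(a.pathMatrix, n)
}

// resize returns a zeroed slice of length n, allocating only when needed.
func resize(buf []int, n int) []int {
	if cap(buf) < n {
		return make([]int, n)
	}
	buf = buf[:n]
	for i := range buf {
		buf[i] = 0
	}
	return buf
}

func main() {
	arena := newAlignArena(300)
	before := cap(arena.scoreMatrix)
	for _, l := range []int{100, 250, 150} {
		arena.prepare(l, l) // no new allocation: all fit in the arena
	}
	fmt.Println(cap(arena.scoreMatrix) == before) // prints "true"
}
```

In this sketch an allocation happens only when a pair of sequences exceeds the capacity chosen up front, which is why sizing the arena from the expected maximum read length matters.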
2. Dynamic Programming Alignment Functions
Implements three specialized global alignment variants using Needleman–Wunsch with affine gap penalties (scaled per mismatch):
- PELeftAlign: Free gaps at the start of seqB and the end of seqA. Ideal for aligning overlapping reads where the first read starts before or within the second.
- PERightAlign: Free gaps at the start of seqA and the end of seqB. Suited to cases where the second read extends beyond the first.
- PECenterAlign: Free gaps at both ends of both sequences; requires seqA ≥ seqB. Designed for full overlap scenarios (e.g., merging paired-end reads).
All use column-major matrix storage and efficient index arithmetic via helper functions _GetMatrix, _SetMatrices, etc.
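To make the free-end-gap idea and the column-major layout concrete, here is a simplified sketch of the "left" variant. It is not the package's code: it uses a linear gap penalty instead of affine gaps, fixed toy scores instead of quality-aware ones, and the names at and leftOverlapScore are hypothetical.

```go
package main

import "fmt"

// at maps (row, col) to a flat index in a column-major matrix with nRows rows.
func at(row, col, nRows int) int {
	return col*nRows + row
}

// leftOverlapScore scores the best alignment of a suffix of a against a
// prefix of b. Gaps at the start of b and at the end of a cost nothing.
func leftOverlapScore(a, b string) int {
	const match, mismatch, gap = 1, -1, -2 // toy scores, assumed for the sketch
	nRows, nCols := len(a)+1, len(b)+1
	m := make([]int, nRows*nCols)
	// Column 0 stays at zero: skipping a prefix of a is free
	// (these are the free gaps at the start of b).
	for j := 1; j < nCols; j++ {
		m[at(0, j, nRows)] = j * gap // gaps at the start of a are penalized
		for i := 1; i < nRows; i++ {
			s := mismatch
			if a[i-1] == b[j-1] {
				s = match
			}
			best := m[at(i-1, j-1, nRows)] + s
			if v := m[at(i-1, j, nRows)] + gap; v > best {
				best = v
			}
			if v := m[at(i, j-1, nRows)] + gap; v > best {
				best = v
			}
			m[at(i, j, nRows)] = best
		}
	}
	// Free gaps at the end of a: take the best cell of the last row,
	// leaving the remainder of b unaligned at no cost.
	best := m[at(nRows-1, 0, nRows)]
	for j := 1; j < nCols; j++ {
		if v := m[at(nRows-1, j, nRows)]; v > best {
			best = v
		}
	}
	return best
}

func main() {
	// The last four bases of the first read match the first four of the second.
	fmt.Println(leftOverlapScore("AAAACGTA", "CGTATTTT")) // prints 4
}
```

Because the inner loop walks down one column at a time, consecutive cells are adjacent in memory, which is the usual motivation for pairing column-major storage with this loop order. Swapping the roles of the two free ends gives the "right" variant.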
3. Scoring & Quality Integration
- Pairwise base/quality scores are computed by _PairingScorePeAlign, combining:
  - Nucleotide compatibility (via precomputed _NucPartMatch)
  - Phred quality scores (_NucScorePartMatchMatch, _NucScorePartMatchMismatch)
  - A user-defined scale factor to modulate mismatch penalties.
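One common way to fold Phred qualities into a pairing score is sketched below. It is an assumption about the general approach, not a transcription of _PairingScorePeAlign: the functions errorProb, sameBaseProb, and pairScore are hypothetical, the log-odds form is one of several possible choices, and the package is described as using precomputed tables rather than computing values on the fly.

```go
package main

import (
	"fmt"
	"math"
)

// errorProb converts a Phred quality score into an error probability.
func errorProb(q int) float64 {
	return math.Pow(10, -float64(q)/10)
}

// sameBaseProb estimates the probability that two observed bases come from
// the same true nucleotide, assuming independent reads and errors spread
// uniformly over the three other bases.
func sameBaseProb(a, b byte, qA, qB int) float64 {
	pA, pB := errorProb(qA), errorProb(qB)
	if a == b {
		// Both correct, or both wrong in the same way.
		return (1-pA)*(1-pB) + pA*pB/3
	}
	// Exactly one read wrong, or both wrong and agreeing on a third base.
	return (1-pA)*pB/3 + pA*(1-pB)/3 + 2*pA*pB/9
}

// pairScore is a log-odds score against a uniform 1/4 background.
// The scale argument (assumed semantics) multiplies mismatch penalties only.
func pairScore(a, b byte, qA, qB int, scale float64) float64 {
	s := math.Log2(sameBaseProb(a, b, qA, qB) / 0.25)
	if a != b {
		s *= scale
	}
	return s
}

func main() {
	fmt.Printf("match    Q40/Q40: %.2f\n", pairScore('A', 'A', 40, 40, 1))
	fmt.Printf("mismatch Q40/Q40: %.2f\n", pairScore('A', 'C', 40, 40, 1))
	fmt.Printf("mismatch Q10/Q10: %.2f\n", pairScore('A', 'C', 10, 10, 1))
}
```

Under this model a mismatch between two low-quality bases is penalized far less than one between two high-quality bases, which is the behavior a quality-aware aligner wants when merging noisy read ends.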
4. Fast Heuristic Pre-Alignment
The main PEAlign function integrates a fast k-mer-based pre-screening step:
- Uses 4-mer indexing (obikmer.Index4mer) and shift estimation via FastShiftFourMer.
- If overlap is significant (fastCount + 3 < over), performs localized DP only on the predicted overlapping region (using PELeftAlign or PERightAlign) to save time.
- Otherwise, computes full alignment over entire sequences (both left and right variants), selecting the best score.
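The shift-estimation step can be illustrated with a small voting scheme: index the 4-mers of one read, then let every shared 4-mer in the other read vote for the offset between its two positions. This is a sketch of the general technique only. The functions encode4mer, index4mers, and bestShift are hypothetical and do not reproduce the signatures of obikmer.Index4mer or FastShiftFourMer.

```go
package main

import "fmt"

// encode4mer packs four upper-case nucleotides into a code in 0..255.
// ok is false when the window contains anything other than A, C, G, T.
func encode4mer(s string) (code int, ok bool) {
	for i := 0; i < 4; i++ {
		var v int
		switch s[i] {
		case 'A':
			v = 0
		case 'C':
			v = 1
		case 'G':
			v = 2
		case 'T':
			v = 3
		default:
			return 0, false
		}
		code = code<<2 | v
	}
	return code, true
}

// index4mers lists, for each 4-mer code, its start positions in seq.
func index4mers(seq string) [256][]int {
	var idx [256][]int
	for i := 0; i+4 <= len(seq); i++ {
		if c, ok := encode4mer(seq[i : i+4]); ok {
			idx[c] = append(idx[c], i)
		}
	}
	return idx
}

// bestShift returns the most frequent offset (position in the indexed
// sequence minus position in seqB) and the number of 4-mers supporting it.
// Ties are broken toward the smaller shift to keep the result deterministic.
func bestShift(idx [256][]int, seqB string) (shift, count int) {
	votes := map[int]int{}
	for j := 0; j+4 <= len(seqB); j++ {
		c, ok := encode4mer(seqB[j : j+4])
		if !ok {
			continue
		}
		for _, i := range idx[c] {
			votes[i-j]++
		}
	}
	for s, n := range votes {
		if n > count || (n == count && s < shift) {
			shift, count = s, n
		}
	}
	return shift, count
}

func main() {
	idx := index4mers("TTGACCGATAGC")
	shift, count := bestShift(idx, "CGATAGCTTA")
	fmt.Println(shift, count) // prints "5 4"
}
```

The supporting count plays the role of a confidence measure: when enough 4-mers agree on one shift, dynamic programming can be restricted to the overlap that shift predicts.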
5. Backtracking & Path Output
- _Backtracking reconstructs the optimal alignment path from pathMatrix.
- Paths are encoded as alternating (offset, length) pairs for aligned segments (diagonal = 0), with gaps encoded as -1/+1.
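A run-length path of this kind can be expanded into two gapped strings, as in the sketch below. The exact layout is an assumption made for illustration: the path is read here as a flat list of (kind, length) pairs where kind 0 is a diagonal run, -1 a run of gaps in seqB, and +1 a run of gaps in seqA. The package's real encoding may differ in detail, and renderPath is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// renderPath expands a flat list of (kind, length) pairs into two gapped
// strings. Assumed encoding: kind 0 consumes both sequences, -1 consumes
// seqA only (gap in seqB), +1 consumes seqB only (gap in seqA).
func renderPath(seqA, seqB string, path []int) (string, string, error) {
	if len(path)%2 != 0 {
		return "", "", fmt.Errorf("path has odd length %d", len(path))
	}
	var outA, outB strings.Builder
	posA, posB := 0, 0
	for k := 0; k < len(path); k += 2 {
		kind, n := path[k], path[k+1]
		stepA, stepB := 0, 0
		switch kind {
		case 0:
			stepA, stepB = n, n
		case -1:
			stepA = n
		case 1:
			stepB = n
		default:
			return "", "", fmt.Errorf("unknown step kind %d", kind)
		}
		if n < 0 || posA+stepA > len(seqA) || posB+stepB > len(seqB) {
			return "", "", fmt.Errorf("step %d runs past a sequence end", k/2)
		}
		// Emit the consumed bases, or gap characters where nothing is consumed.
		if stepA > 0 {
			outA.WriteString(seqA[posA : posA+stepA])
		} else {
			outA.WriteString(strings.Repeat("-", n))
		}
		if stepB > 0 {
			outB.WriteString(seqB[posB : posB+stepB])
		} else {
			outB.WriteString(strings.Repeat("-", n))
		}
		posA += stepA
		posB += stepB
	}
	return outA.String(), outB.String(), nil
}

func main() {
	a, b, err := renderPath("AAACGTAC", "CGTACGG", []int{-1, 3, 0, 5, 1, 2})
	if err != nil {
		panic(err)
	}
	fmt.Println(a) // prints "AAACGTAC--"
	fmt.Println(b) // prints "---CGTACGG"
}
```

Storing runs instead of one step per column keeps paths short for reads that overlap cleanly, since a long ungapped overlap collapses into a single pair.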
Use Case
Designed for paired-end read merging, overlap detection, and consensus building in metagenomic pipelines (e.g., the OBITools4 ecosystem). Reusing the arena across alignments keeps large batch processing efficient.