- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5" - Update version.txt from 4.29 → .30 (automated by Makefile)
3.5 KiB
obipairing Package — Functional Overview
The obipairing package enables robust merging of paired-end next-generation sequencing (NGS) reads within the OBITools4 ecosystem. It bridges input parsing, alignment configuration, and consensus assembly—supporting both high-accuracy overlap-based merging and lightweight fallback concatenation when overlaps are unreliable.
Public API Summary
CLI Interface (obipairing/cli.go)
- Input specification:
--forward-reads(-F) and--reverse-reads(-R) flags accept FASTQ/FASTA file paths. - Alignment tuning:
_Delta(--delta, default5) — buffer for refining initial overlap detection._MinOverlap(--min-overlap, default20) — minimum overlap length._MinIdentity(--min-identity, default90) — minimum % identity for valid alignment._GapPenalty(--gap-penalty, default2) — gap cost multiplier vs mismatches._PenaltyScale(--scale, default1) — global scoring scaling factor.
- Alignment mode control:
- Fast heuristic enabled by default;
--exact-modedisables it. - Absolute scoring in fast mode via
--fast-absolute.
- Fast heuristic enabled by default;
- Output customization:
--without-statomits alignment statistics from consensus headers. - Extends generic I/O options inherited from
obiconvertfor pipeline compatibility.
Core Assembly Functions (obipairing/assemble.go)
-
JoinPairedSequence(seqA, seqB *obiseq.BioSequence, inplace bool) (consensus *obiseq.BioSequence)
Concatenates forward and reverse reads with a..........(10-dot) separator.- Quality scores for dots set to Phred
Q=0if both inputs are quality-tracked. - Supports in-place recycling (
inplace=true) to reduce allocations.
- Quality scores for dots set to Phred
-
AssemblePESequences(...)
Performs high-fidelity paired-end assembly:- Uses
obialign.PEAlignwith a two-stage process:- Fast heuristic (
FAST) to locate candidate overlap region. - Dynamic programming refinement, extended by
_Delta.
- Fast heuristic (
- Validates alignment against thresholds (
minOverlap,minIdentity).
Falls back to join if criteria unmet. - Optionally annotates output with alignment metadata:
"mode" → "alignment" or "join" "ali_length" → overlap length "score_norm" → normalized alignment score "identity" → % identity over overlap "directionality"→ orientation (e.g., FR) - Supports in-place reuse (
inplace) and absolute/relative scoring viafastModeRel.
- Uses
-
IAssemblePESequencesBatch(...)
Parallelizes assembly over batches of read pairs:- Consumes iterators from
PairWith(e.g., viaobiiter). - Launches configurable workers (
nworkers) and channel buffer size. - Internally reverse-complements the second read before alignment (
seqB.ReverseComplement()). - Yields assembled consensus sequences via an iterator.
- Consumes iterators from
Configuration & Parameter Access
- Getter functions (
CLI*) expose parsed CLI parameters (e.g.,CLIMinOverlap(),CLIGapPenalty()), enabling downstream alignment modules to reuse CLI-defined settings.
Annotation Semantics
Each assembled sequence carries annotations describing the assembly mode and, when applicable:
- Alignment scores (
ali_score,score_norm) - Overlap metrics (
ali_length,identity) - Fast-mode metadata (e.g.,
"pairing_fast_score") when heuristic alignment is used.
Designed for scalability, low memory footprint, and integration with obiseq, obiiter, and alignment backends in OBITools4.