obitools4/autodoc/docmd/pkg_obitools_obipairing.md
2026-04-07 08:36:50 +02:00
# `obipairing` Package — Functional Overview
The `obipairing` package merges paired-end next-generation sequencing (NGS) reads within the OBITools4 ecosystem. It covers input parsing, alignment configuration, and consensus assembly, and supports both overlap-based merging and a lightweight fallback that concatenates the two reads when the overlap fails the configured thresholds.
## Public API Summary
### CLI Interface (`obipairing/cli.go`)
- **Input specification**:
  `--forward-reads` (`-F`) and `--reverse-reads` (`-R`) accept FASTQ/FASTA file paths.
- **Alignment tuning**:
  - `_Delta` (`--delta`, default `5`): margin added around the initially detected overlap before refinement.
  - `_MinOverlap` (`--min-overlap`, default `20`): minimum overlap length.
  - `_MinIdentity` (`--min-identity`, default `90`): minimum % identity for a valid alignment.
  - `_GapPenalty` (`--gap-penalty`, default `2`): gap cost, expressed as a multiple of the mismatch cost.
  - `_PenaltyScale` (`--scale`, default `1`): global scaling factor applied to scores.
- **Alignment mode control**:
  - The fast heuristic is enabled by default; `--exact-mode` disables it.
  - `--fast-absolute` selects absolute scoring in fast mode.
- **Output customization**:
  `--without-stat` omits alignment statistics from consensus headers.
- The CLI also extends the generic I/O options inherited from `obiconvert` for pipeline compatibility.
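
As a rough illustration of how the flags above fit together, the following sketch declares them with Go's standard `flag` package and parses a sample argument list. The `pairingFlags` type, the `parsePairingFlags` helper, and the choice of `int` versus `float64` for each option are assumptions made for this example; the option parser actually used by `obipairing` is not described in this document.

```go
package main

import (
	"flag"
	"fmt"
)

// pairingFlags is a hypothetical holder for the documented options.
// Field types are assumptions; only names and defaults come from the text.
type pairingFlags struct {
	Forward, Reverse                 string
	Delta, MinOverlap                int
	MinIdentity, GapPenalty, Scale   float64
	Exact, FastAbsolute, WithoutStat bool
}

// parsePairingFlags parses args using the documented flag names and defaults.
func parsePairingFlags(args []string) (pairingFlags, error) {
	var p pairingFlags
	fs := flag.NewFlagSet("obipairing-sketch", flag.ContinueOnError)
	fs.StringVar(&p.Forward, "forward-reads", "", "forward reads (FASTQ/FASTA)")
	fs.StringVar(&p.Forward, "F", "", "shorthand for --forward-reads")
	fs.StringVar(&p.Reverse, "reverse-reads", "", "reverse reads (FASTQ/FASTA)")
	fs.StringVar(&p.Reverse, "R", "", "shorthand for --reverse-reads")
	fs.IntVar(&p.Delta, "delta", 5, "margin around the detected overlap")
	fs.IntVar(&p.MinOverlap, "min-overlap", 20, "minimum overlap length")
	fs.Float64Var(&p.MinIdentity, "min-identity", 90, "minimum % identity")
	fs.Float64Var(&p.GapPenalty, "gap-penalty", 2, "gap cost vs mismatch cost")
	fs.Float64Var(&p.Scale, "scale", 1, "global score scaling factor")
	fs.BoolVar(&p.Exact, "exact-mode", false, "disable the fast heuristic")
	fs.BoolVar(&p.FastAbsolute, "fast-absolute", false, "absolute scoring in fast mode")
	fs.BoolVar(&p.WithoutStat, "without-stat", false, "omit alignment statistics")
	if err := fs.Parse(args); err != nil {
		return p, err
	}
	return p, nil
}

func main() {
	// File names are placeholders for this example.
	p, err := parsePairingFlags([]string{
		"-F", "r1.fastq", "-R", "r2.fastq", "--min-overlap", "30",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", p)
}
```

Options that are not passed keep the defaults listed above, so only the values that differ from them need to appear on the command line.
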
### Core Assembly Functions (`obipairing/assemble.go`)
- **`JoinPairedSequence(seqA, seqB *obiseq.BioSequence, inplace bool) (consensus *obiseq.BioSequence)`**
  Concatenates forward and reverse reads with a `..........` (10-dot) separator.
  - Quality scores for the dots are set to Phred `Q=0` if both inputs are quality-tracked.
  - Supports in-place recycling (`inplace=true`) to reduce allocations.
- **`AssemblePESequences(...)`**
  Performs high-fidelity paired-end assembly:
  - Uses `obialign.PEAlign` with a two-stage process:
    1. **Fast heuristic** (`FAST`) to locate the candidate overlap region.
    2. **Dynamic programming refinement**, extended by `_Delta`.
  - Validates the alignment against thresholds (`minOverlap`, `minIdentity`).
    Falls back to joining the reads when either criterion is unmet.
  - Optionally annotates the output with alignment metadata:
    ```text
    "mode"           → "alignment" or "join"
    "ali_length"     → overlap length
    "score_norm"     → normalized alignment score
    "identity"       → % identity over overlap
    "directionality" → orientation (e.g., FR)
    ```
  - Supports in-place reuse (`inplace`) and absolute/relative scoring via `fastModeRel`.
- **`IAssemblePESequencesBatch(...)`**
  Parallelizes assembly over batches of read pairs:
  - Consumes iterators from `PairWith` (e.g., via `obiiter`).
  - Launches a configurable number of workers (`nworkers`) with a configurable channel buffer size.
  - Internally reverse-complements the second read before alignment (`seqB.ReverseComplement()`).
  - Yields assembled consensus sequences via an iterator.
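
The join fallback and the batch workflow described above can be sketched on plain strings. This is a minimal illustration under stated assumptions, not the package's code: `read`, `reverseComplement`, `join`, and `assembleAll` are hypothetical stand-ins for `obiseq.BioSequence` and the functions listed above, the alignment step is skipped entirely so every pair takes the join path, and applying the join to the already reverse-complemented second read is an assumption about the order of operations.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// read is a hypothetical stand-in for obiseq.BioSequence.
type read struct {
	ID   string
	Seq  string
	Qual []byte // Phred scores; nil when quality is not tracked
}

var complement = map[byte]byte{'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

// reverseComplement returns a new read; unknown symbols become 'N'.
func reverseComplement(r read) read {
	n := len(r.Seq)
	seq := make([]byte, n)
	for i := 0; i < n; i++ {
		c, ok := complement[r.Seq[n-1-i]]
		if !ok {
			c = 'N'
		}
		seq[i] = c
	}
	var qual []byte
	if r.Qual != nil {
		qual = make([]byte, len(r.Qual))
		for i := range qual {
			qual[i] = r.Qual[len(r.Qual)-1-i]
		}
	}
	return read{ID: r.ID, Seq: string(seq), Qual: qual}
}

var separator = strings.Repeat(".", 10) // the 10-dot separator

// join concatenates a and b around the separator. The dots receive
// quality 0, and qualities are kept only if both inputs carry them.
func join(a, b read) read {
	out := read{ID: a.ID, Seq: a.Seq + separator + b.Seq}
	if a.Qual != nil && b.Qual != nil {
		out.Qual = append(out.Qual, a.Qual...)
		out.Qual = append(out.Qual, make([]byte, len(separator))...)
		out.Qual = append(out.Qual, b.Qual...)
	}
	return out
}

// assembleAll processes pairs with nworkers goroutines, keeping input order.
func assembleAll(pairs [][2]read, nworkers int) []read {
	if nworkers < 1 {
		nworkers = 1
	}
	out := make([]read, len(pairs))
	jobs := make(chan int, nworkers) // small buffer between producer and workers
	var wg sync.WaitGroup
	for w := 0; w < nworkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each worker writes to its own index, so no lock is needed.
				out[i] = join(pairs[i][0], reverseComplement(pairs[i][1]))
			}
		}()
	}
	for i := range pairs {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	pairs := [][2]read{
		{{ID: "p1", Seq: "ACGT"}, {ID: "p1", Seq: "AACG"}},
	}
	for _, r := range assembleAll(pairs, 2) {
		fmt.Println(r.ID, r.Seq)
	}
}
```

Writing results into a preallocated slice by index keeps the output in input order without extra synchronization; an iterator-based implementation such as the one described above would instead push batches through channels.
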
### Configuration & Parameter Access
- Getter functions (`CLI*`) expose parsed CLI parameters (e.g., `CLIMinOverlap()`, `CLIGapPenalty()`), enabling downstream alignment modules to reuse CLI-defined settings.
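
The getter pattern can be illustrated with a small sketch: package-level variables hold the parsed values, and exported `CLI*` functions give other modules read access. The variable and getter names come from this document, but their types and the `alignerConfig` consumer are assumptions made for the example.

```go
package main

import "fmt"

// Package-level storage for parsed options, initialised to the
// documented defaults. The int/float64 types are assumptions.
var (
	_MinOverlap = 20
	_GapPenalty = 2.0
)

// CLIMinOverlap returns the minimum overlap length set on the command line.
func CLIMinOverlap() int { return _MinOverlap }

// CLIGapPenalty returns the gap penalty set on the command line.
func CLIGapPenalty() float64 { return _GapPenalty }

// alignerConfig is a hypothetical downstream consumer of the getters.
type alignerConfig struct {
	MinOverlap int
	GapPenalty float64
}

// newAlignerConfig reads the current option values through the getters.
func newAlignerConfig() alignerConfig {
	return alignerConfig{MinOverlap: CLIMinOverlap(), GapPenalty: CLIGapPenalty()}
}

func main() {
	fmt.Printf("%+v\n", newAlignerConfig())
}
```

Keeping the variables unexported means downstream code can read the settings but cannot change them behind the CLI's back.
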
### Annotation Semantics
Each assembled sequence carries annotations describing the assembly mode and, when applicable:
- Alignment scores (`ali_score`, `score_norm`)
- Overlap metrics (`ali_length`, `identity`)
- Fast-mode metadata (e.g., `"pairing_fast_score"`) when heuristic alignment is used.
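
To make the relationship between the thresholds and these annotations concrete, here is a minimal sketch of the decision between `"alignment"` and `"join"` mode. The `overlapStats` type and the `annotate` helper are hypothetical, identity is computed as matches over overlap length in percent (an assumption consistent with the `--min-identity` default of `90`), and `score_norm` is left out because its normalization is not specified here.

```go
package main

import "fmt"

// overlapStats is a hypothetical summary of an aligned overlap.
type overlapStats struct {
	Length  int     // overlap length in aligned positions
	Matches int     // identical positions within the overlap
	Score   float64 // raw alignment score
}

// annotate returns the annotations for one assembled pair. Pairs whose
// overlap is too short or too dissimilar are reported in "join" mode.
func annotate(s overlapStats, minOverlap int, minIdentity float64) map[string]interface{} {
	identity := 0.0
	if s.Length > 0 {
		identity = 100 * float64(s.Matches) / float64(s.Length)
	}
	if s.Length < minOverlap || identity < minIdentity {
		return map[string]interface{}{"mode": "join"}
	}
	return map[string]interface{}{
		"mode":       "alignment",
		"ali_length": s.Length,
		"ali_score":  s.Score,
		"identity":   identity,
	}
}

func main() {
	good := overlapStats{Length: 40, Matches: 38, Score: 120}
	short := overlapStats{Length: 12, Matches: 12, Score: 40}
	fmt.Println(annotate(good, 20, 90)["mode"])
	fmt.Println(annotate(short, 20, 90)["mode"])
}
```

With the default thresholds, the 40-position overlap with 38 matches passes both checks, while the 12-position overlap is rejected on length alone even though it matches perfectly.
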

The package is designed for scalability and a low memory footprint, and integrates with `obiseq`, `obiiter`, and the alignment backends in OBITools4.