- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5" - Update version.txt from 4.29 → .30 (automated by Makefile)
3.8 KiB
obiconsensus Package: Semantic Overview
The obiconsensus package delivers scalable, graph-based consensus and denoising tools for high-throughput biological sequence data within the OBITools4 ecosystem. It enables error correction, variant clustering, and consensus reconstruction from related amplicon or metagenomic reads—supporting both single-sample and multi-sample workflows.
Public API Summary
Core Algorithms & Utilities
-
BuildConsensus():
Constructs a consensus sequence via de Bruijn graph assembly of input reads. Automatically selects optimalk-mer size (fallback: longest common suffix analysis). Detects graph cycles and incrementally increaseskuntil resolved. Optionally persists intermediate graphs (*.gml) and FASTA inputs. Output includes metadata: consensus flag, total read weight (summed abundances),k-mer size used, and graph statistics. -
SampleWeight():
Returns a closure that retrieves per-sequence sample abundances (e.g., read counts) from sequence annotations or statistics—enabling weighted graph operations. -
SeqBySamples():
Groups sequences by sample identifier, using a configurable annotation key (default:"sample"). Supports grouping based on either statistical attributes (StatsOn) or sequence metadata. -
BuildDiffSeqGraph():
Builds a difference graph where nodes represent unique sequences and edges encode single-nucleotide mutations (position + substitution). Usesobialign.D1Or0for exact alignment or approximate LCS-based distance scaling. Supports parallel edge computation and optional progress bar. -
MinionDenoise():
Denoises sequences by identifying high-degree nodes (potential consensus hubs), building local consensuses viaBuildConsensus(), and preserving low-degree nodes unchanged. Propagates sample annotations, weights, and metadata. -
MinionClusterDenoise():
Denoises via weight-based clustering: aggregates node weights (self + neighbors), selects local maxima as cluster heads, and builds consensus per neighborhood. -
CLIOBIMinion():
CLI orchestrator for end-to-end denoising: loads sequences, groups by sample (--sample), builds per-sample difference graphs (optional export via--save-graph), applies denoising (MinionDenoise()orMinionClusterDenoise()), optionally deduplicates output (--unique), and annotates sequence lengths.
Configuration & CLI Helpers
- Clustering Mode:
--cluster(-C) enables graph-based clustering. - Distance Threshold:
--distance(-d, default: 1) sets max Hamming distance for edge inclusion. - K-mer Control:
--kmer-size(SIZE, default: -1 = auto-selected). - Sample Key:
--sample(-s, default:"sample") defines the annotation field for sample grouping. - Filtering Options:
--no-singleton: excludes unique sequences.--low-coverage(default: 0) filters low-abundance sequences.
- Output Options:
--unique(-U) enables deduplication (viaobiuniq).--save-graph DIRexports graphs in GraphML.--save-ratio FILEwrites edge abundance ratios as CSV.
- Format Integration: Works with
obiconvertvia unified input/output option sets (InputOptionSet,OutputOptionSet) for FASTA/FASTQ handling. - Getter Functions: Typed accessors (e.g.,
CLIDistStepMax(),CLIKmerSize()) decouple argument parsing from core logic.
Design Principles
- Parallelism: Leverages goroutines and
sync.WaitGroupfor scalable graph construction. - Robustness: Handles edge cases (e.g., single-sequence inputs) gracefully with logging.
- Extensibility: Modular architecture allows swapping alignment engines or graph representations.
Purpose: Accurate, reproducible consensus and denoising for NGS amplicon/metagenomic data at scale.