# `obiconsensus` Package: Semantic Overview

The `obiconsensus` package delivers scalable, graph-based consensus and denoising tools for high-throughput biological sequence data within the OBITools4 ecosystem. It enables error correction, variant clustering, and consensus reconstruction from related amplicon or metagenomic reads—supporting both single-sample and multi-sample workflows.

## Public API Summary

### Core Algorithms & Utilities
- **`BuildConsensus()`**:  
  Constructs a consensus sequence via *de Bruijn graph* assembly of input reads. Automatically selects optimal `k`-mer size (fallback: longest common suffix analysis). Detects graph cycles and incrementally increases `k` until resolved. Optionally persists intermediate graphs (`*.gml`) and FASTA inputs. Output includes metadata: consensus flag, total read weight (summed abundances), `k`-mer size used, and graph statistics.

- **`SampleWeight()`**:  
  Returns a closure that retrieves per-sequence sample abundances (e.g., read counts) from sequence annotations or statistics—enabling weighted graph operations.

- **`SeqBySamples()`**:  
  Groups sequences by sample identifier, using a configurable annotation key (default: `"sample"`). Supports grouping based on either statistical attributes (`StatsOn`) or sequence metadata.

- **`BuildDiffSeqGraph()`**:  
  Builds a *difference graph* where nodes represent unique sequences and edges encode single-nucleotide mutations (position + substitution). Uses `obialign.D1Or0` for exact alignment or approximate LCS-based distance scaling. Supports parallel edge computation and optional progress bar.

- **`MinionDenoise()`**:  
  Denoises sequences by identifying high-degree nodes (potential consensus hubs), building local consensuses via `BuildConsensus()`, and preserving low-degree nodes unchanged. Propagates sample annotations, weights, and metadata.

- **`MinionClusterDenoise()`**:  
  Denoises via *weight-based clustering*: aggregates node weights (self + neighbors), selects local maxima as cluster heads, and builds consensus per neighborhood.

- **`CLIOBIMinion()`**:  
  CLI orchestrator for end-to-end denoising: loads sequences, groups by sample (`--sample`), builds per-sample difference graphs (optional export via `--save-graph`), applies denoising (`MinionDenoise()` or `MinionClusterDenoise()`), optionally deduplicates output (`--unique`), and annotates sequence lengths.

### Configuration & CLI Helpers
- **Clustering Mode**: `--cluster` (`-C`) enables graph-based clustering.
- **Distance Threshold**: `--distance` (`-d`, default: 1) sets max Hamming distance for edge inclusion.
- **K-mer Control**: `--kmer-size` (`SIZE`, default: -1 = auto-selected).
- **Sample Key**: `--sample` (`-s`, default: `"sample"`) defines the annotation field for sample grouping.
- **Filtering Options**:  
  - `--no-singleton`: excludes unique sequences.  
  - `--low-coverage` (default: 0) filters low-abundance sequences.
- **Output Options**:  
  - `--unique` (`-U`) enables deduplication (via `obiuniq`).  
  - `--save-graph DIR` exports graphs in GraphML.  
  - `--save-ratio FILE` writes edge abundance ratios as CSV.
- **Format Integration**: Works with `obiconvert` via unified input/output option sets (`InputOptionSet`, `OutputOptionSet`) for FASTA/FASTQ handling.
- **Getter Functions**: Typed accessors (e.g., `CLIDistStepMax()`, `CLIKmerSize()`) decouple argument parsing from core logic.

## Design Principles
- **Parallelism**: Leverages goroutines and `sync.WaitGroup` for scalable graph construction.
- **Robustness**: Handles edge cases (e.g., single-sequence inputs) gracefully with logging.
- **Extensibility**: Modular architecture allows swapping alignment engines or graph representations.

*Purpose: Accurate, reproducible consensus and denoising for NGS amplicon/metagenomic data at scale.*