mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
8c7017a99d
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5" - Update version.txt from 4.29 → .30 (automated by Makefile)
54 lines
2.9 KiB
Markdown
54 lines
2.9 KiB
Markdown
# `obidemerge` Package Documentation
|
|
|
|
The **`obidemerge`** package enables *demerging* of biological sequences—i.e., splitting aggregated or merged sequence records into discrete, count-annotated variants based on metadata statistics. It supports both programmatic and CLI workflows for downstream processing in metabarcoding or amplicon-based pipelines.
|
|
|
|
## Core Functionalities
|
|
|
|
### 1. `MakeDemergeWorker(key string) SeqWorker`
|
|
- **Purpose**: Constructs a sequence processor that splits sequences by statistical metadata.
|
|
- **Behavior**:
|
|
- Scans the input sequence for a statistics map under attribute `key`.
|
|
- *Example*: If `"sample"` → `{ "S1": 5, "S2": 3 }`, two new sequences are generated.
|
|
- For each `(stat_key, count)` pair:
|
|
- Copies the original sequence data,
|
|
- Adds a new attribute: `key = stat_key`,
|
|
- Sets `.Count` to the corresponding integer value.
|
|
- Removes original statistics from the input sequence after splitting.
|
|
- **Fallback**: If no stats are found for `key`, returns a single-element slice containing the unchanged sequence.
|
|
|
|
### 2. `CLIDemergeSequences(iterator, slot string) SeqIterator`
|
|
- **Purpose**: CLI wrapper for batch demerging.
|
|
- **Behavior**:
|
|
- Applies `MakeDemergeWorker(slot)` to each sequence in the input iterator.
|
|
- Supports parallel processing (implementation-dependent).
|
|
- **Integration**:
|
|
- Designed to be used with the `--demerge` CLI flag (see below).
|
|
|
|
### 3. CLI Integration via OptionSet
|
|
- **Flag**: `--demerge` (`-d`)
|
|
- Specifies the metadata slot to demerge (default: `"sample"`).
|
|
- **APIs**:
|
|
- `DemergeOptionSet(options *getoptions.Options)`: Registers the `-d/--demerge` flag.
|
|
- `CLIDemergeSlot() string`: Returns the selected slot name (e.g., `"sample"`), used by downstream workers.
|
|
- **Inheritance**:
|
|
- Extends `obiconvert.OptionSet`, inheriting standard conversion options (I/O formats, filters, etc.).
|
|
|
|
## Semantic Workflow
|
|
|
|
1. **Input**: Sequences with embedded statistical metadata (e.g., sample abundances, OTU counts).
|
|
2. **Demerge Operation**: Splits each sequence into multiple copies—each tagged with a unique metadata key and abundance.
|
|
3. **Output**: A new set of sequences where each variant is independently annotated, enabling:
|
|
- Accurate abundance-aware filtering,
|
|
- Per-variant downstream analysis (e.g., taxonomic assignment, diversity metrics).
|
|
|
|
## Key Concept: *Demerging*
|
|
- **Definition**: Reversal of prior merging steps (e.g., OTU clustering, read pairing).
|
|
- **Purpose**: Restores granularity for statistical or ecological interpretation while preserving original sequence data.
|
|
|
|
## Use Cases
|
|
- Post-clustering demerging of OTU/ASV tables.
|
|
- Splitting merged paired-end reads by sample or condition metadata.
|
|
- Preparing data for tools expecting discrete, count-labeled sequences.
|
|
|
|
> **Note**: Only *public* APIs are documented. Internal helpers (e.g., slot validation, worker state) remain unspecified.
|