mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 03:50:39 +00:00
⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5" - Update version.txt from 4.29 → .30 (automated by Makefile)
This commit is contained in:
@@ -0,0 +1,53 @@
|
||||
# `obidemerge` Package Documentation
|
||||
|
||||
The **`obidemerge`** package enables *demerging* of biological sequences—i.e., splitting aggregated or merged sequence records into discrete, count-annotated variants based on metadata statistics. It supports both programmatic and CLI workflows for downstream processing in metabarcoding or amplicon-based pipelines.
|
||||
|
||||
## Core Functionalities
|
||||
|
||||
### 1. `MakeDemergeWorker(key string) SeqWorker`
|
||||
- **Purpose**: Constructs a sequence processor that splits sequences by statistical metadata.
|
||||
- **Behavior**:
|
||||
- Scans the input sequence for a statistics map under attribute `key`.
|
||||
- *Example*: If `"sample"` → `{ "S1": 5, "S2": 3 }`, two new sequences are generated.
|
||||
- For each `(stat_key, count)` pair:
|
||||
- Copies the original sequence data,
|
||||
- Adds a new attribute: `key = stat_key`,
|
||||
- Sets `.Count` to the corresponding integer value.
|
||||
- Removes original statistics from the input sequence after splitting.
|
||||
- **Fallback**: If no stats are found for `key`, returns a single-element slice containing the unchanged sequence.
|
||||
|
||||
### 2. `CLIDemergeSequences(iterator, slot string) SeqIterator`
|
||||
- **Purpose**: CLI wrapper for batch demerging.
|
||||
- **Behavior**:
|
||||
- Applies `MakeDemergeWorker(slot)` to each sequence in the input iterator.
|
||||
- Supports parallel processing (implementation-dependent).
|
||||
- **Integration**:
|
||||
- Designed to be used with the `--demerge` CLI flag (see below).
|
||||
|
||||
### 3. CLI Integration via OptionSet
|
||||
- **Flag**: `--demerge` (`-d`)
|
||||
- Specifies the metadata slot to demerge (default: `"sample"`).
|
||||
- **APIs**:
|
||||
- `DemergeOptionSet(options *getoptions.Options)`: Registers the `-d/--demerge` flag.
|
||||
- `CLIDemergeSlot() string`: Returns the selected slot name (e.g., `"sample"`), used by downstream workers.
|
||||
- **Inheritance**:
|
||||
- Extends `obiconvert.OptionSet`, inheriting standard conversion options (I/O formats, filters, etc.).
|
||||
|
||||
## Semantic Workflow
|
||||
|
||||
1. **Input**: Sequences with embedded statistical metadata (e.g., sample abundances, OTU counts).
|
||||
2. **Demerge Operation**: Splits each sequence into multiple copies—each tagged with a unique metadata key and abundance.
|
||||
3. **Output**: A new set of sequences where each variant is independently annotated, enabling:
|
||||
- Accurate abundance-aware filtering,
|
||||
- Per-variant downstream analysis (e.g., taxonomic assignment, diversity metrics).
|
||||
|
||||
## Key Concept: *Demerging*
|
||||
- **Definition**: Reversal of prior merging steps (e.g., OTU clustering, read pairing).
|
||||
- **Purpose**: Restores granularity for statistical or ecological interpretation while preserving original sequence data.
|
||||
|
||||
## Use Cases
|
||||
- Post-clustering demerging of OTU/ASV tables.
|
||||
- Splitting merged paired-end reads by sample or condition metadata.
|
||||
- Preparing data for tools expecting discrete, count-labeled sequences.
|
||||
|
||||
> **Note**: Only *public* APIs are documented. Internal helpers (e.g., slot validation, worker state) remain unspecified.
|
||||
Reference in New Issue
Block a user