mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
obidemerge Package Documentation
The obidemerge package enables demerging of biological sequences—i.e., splitting aggregated or merged sequence records into discrete, count-annotated variants based on metadata statistics. It supports both programmatic and CLI workflows for downstream processing in metabarcoding or amplicon-based pipelines.
Core Functionalities
1. MakeDemergeWorker(key string) SeqWorker
- Purpose: Constructs a sequence processor that splits sequences by statistical metadata.
- Behavior:
  - Scans the input sequence for a statistics map under attribute key.
    - Example: If "sample" → { "S1": 5, "S2": 3 }, two new sequences are generated.
  - For each (stat_key, count) pair:
    - Copies the original sequence data,
    - Adds a new attribute: key = stat_key,
    - Sets .Count to the corresponding integer value.
  - Removes original statistics from the input sequence after splitting.
- Fallback: If no stats are found for key, returns a single-element slice containing the unchanged sequence.
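The splitting behavior described above can be sketched in a few lines of Go. This is a minimal illustration, not the package's code: the Seq type, its fields, and the demerge function are hypothetical stand-ins for the real obitools4 sequence type and for the worker returned by MakeDemergeWorker. Keys are sorted only to make the output order reproducible; the source does not say in which order the real worker emits variants.

```go
package main

import (
	"fmt"
	"sort"
)

// Seq is a hypothetical stand-in for a sequence record; it is NOT the
// obitools4 sequence type.
type Seq struct {
	ID    string
	Bases string
	Count int
	Attrs map[string]string         // plain attributes, e.g. "sample" -> "S1"
	Stats map[string]map[string]int // statistics maps, e.g. "sample" -> {"S1": 5}
}

// demerge splits s into one record per entry of the statistics map stored
// under key. If no such map exists, it returns s unchanged in a
// single-element slice.
func demerge(s *Seq, key string) []*Seq {
	stats, ok := s.Stats[key]
	if !ok || len(stats) == 0 {
		return []*Seq{s}
	}

	// Go map iteration order is random, so sort the keys for stable output.
	names := make([]string, 0, len(stats))
	for name := range stats {
		names = append(names, name)
	}
	sort.Strings(names)

	// Remove the statistics from the input so the copies do not carry them.
	delete(s.Stats, key)

	out := make([]*Seq, 0, len(names))
	for _, name := range names {
		c := &Seq{
			ID:    s.ID,
			Bases: s.Bases,
			Count: stats[name],
			Attrs: map[string]string{},
			Stats: map[string]map[string]int{},
		}
		for k, v := range s.Attrs {
			c.Attrs[k] = v
		}
		for k, v := range s.Stats {
			c.Stats[k] = v // shallow copy: remaining inner maps are shared
		}
		c.Attrs[key] = name
		out = append(out, c)
	}
	return out
}

func main() {
	s := &Seq{
		ID:    "seq1",
		Bases: "ACGT",
		Count: 8,
		Attrs: map[string]string{},
		Stats: map[string]map[string]int{"sample": {"S1": 5, "S2": 3}},
	}
	for _, v := range demerge(s, "sample") {
		fmt.Printf("%s sample=%s count=%d\n", v.ID, v.Attrs["sample"], v.Count)
	}
	// Prints:
	// seq1 sample=S1 count=5
	// seq1 sample=S2 count=3
}
```

Each output record carries the same bases as the input; only the attribute under key and the count differ.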
2. CLIDemergeSequences(iterator, slot string) SeqIterator
- Purpose: CLI wrapper for batch demerging.
- Behavior:
  - Applies MakeDemergeWorker(slot) to each sequence in the input iterator.
  - Supports parallel processing (implementation-dependent).
- Integration:
  - Designed to be used with the --demerge CLI flag (see below).
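One way to picture the batch step is a small worker pool that applies a worker function to every record and flattens the results. The sketch below is an assumption about the general pattern only: Seq, Worker, and applyWorker are hypothetical names, a slice stands in for the real iterator type, and the source does not specify how (or whether) the package preserves input order or sizes its pool.

```go
package main

import (
	"fmt"
	"sync"
)

// Seq is a hypothetical, pared-down record used only for this sketch.
type Seq struct {
	ID     string
	Sample string
	Count  int
}

// Worker maps one record to zero or more records, which is the shape a
// demerge worker needs.
type Worker func(Seq) []Seq

// applyWorker runs w over every input record on nworkers goroutines and
// returns the flattened results in input order.
func applyWorker(in []Seq, w Worker, nworkers int) []Seq {
	if nworkers < 1 {
		nworkers = 1
	}
	results := make([][]Seq, len(in)) // one slot per input; no shared writes
	jobs := make(chan int)

	var wg sync.WaitGroup
	for i := 0; i < nworkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for idx := range jobs {
				results[idx] = w(in[idx])
			}
		}()
	}
	for idx := range in {
		jobs <- idx
	}
	close(jobs)
	wg.Wait()

	var out []Seq
	for _, r := range results {
		out = append(out, r...)
	}
	return out
}

func main() {
	// Placeholder worker: emits one record per hard-coded sample name.
	split := func(s Seq) []Seq {
		return []Seq{
			{ID: s.ID, Sample: "S1", Count: 1},
			{ID: s.ID, Sample: "S2", Count: 1},
		}
	}
	in := []Seq{{ID: "a"}, {ID: "b"}, {ID: "c"}}
	for _, s := range applyWorker(in, split, 2) {
		fmt.Println(s.ID, s.Sample, s.Count)
	}
}
```

Writing each result into its own slice index avoids locking and keeps the output order independent of goroutine scheduling.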
3. CLI Integration via OptionSet
- Flag: --demerge (-d)
  - Specifies the metadata slot to demerge (default: "sample").
- APIs:
  - DemergeOptionSet(options *getoptions.Options): Registers the -d/--demerge flag.
  - CLIDemergeSlot() string: Returns the selected slot name (e.g., "sample"), used by downstream workers.
- Inheritance:
  - Extends obiconvert.OptionSet, inheriting standard conversion options (I/O formats, filters, etc.).
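The register-then-read pattern behind these two functions can be illustrated with Go's standard flag package. This is a sketch under stated assumptions: the package itself registers its option on a *getoptions.Options value, which is not reproduced here, and registerDemergeFlag, demergeSlot, parseArgs, and the slot name "location" are hypothetical names chosen for the example.

```go
package main

import (
	"flag"
	"fmt"
)

// slot holds the metadata slot selected on the command line.
var slot = "sample"

// registerDemergeFlag binds both the long and the short flag name to the
// same variable, with "sample" as the default.
func registerDemergeFlag(fs *flag.FlagSet) {
	fs.StringVar(&slot, "demerge", "sample", "metadata slot to demerge")
	fs.StringVar(&slot, "d", "sample", "shorthand for --demerge")
}

// demergeSlot plays the role of an accessor such as CLIDemergeSlot: code
// downstream of argument parsing reads the selected slot through it.
func demergeSlot() string { return slot }

// parseArgs registers the flag on a fresh flag set and parses args.
func parseArgs(args []string) error {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	registerDemergeFlag(fs)
	return fs.Parse(args)
}

func main() {
	if err := parseArgs([]string{"--demerge", "location"}); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(demergeSlot()) // prints "location"
}
```

Keeping the parsed value behind an accessor means worker construction never touches the flag set directly.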
Semantic Workflow
- Input: Sequences with embedded statistical metadata (e.g., sample abundances, OTU counts).
- Demerge Operation: Splits each sequence into multiple copies—each tagged with a unique metadata key and abundance.
- Output: A new set of sequences where each variant is independently annotated, enabling:
- Accurate abundance-aware filtering,
- Per-variant downstream analysis (e.g., taxonomic assignment, diversity metrics).
Key Concept: Demerging
- Definition: Reversal of prior merging steps (e.g., OTU clustering, read pairing).
- Purpose: Restores granularity for statistical or ecological interpretation while preserving original sequence data.
Use Cases
- Post-clustering demerging of OTU/ASV tables.
- Splitting merged paired-end reads by sample or condition metadata.
- Preparing data for tools expecting discrete, count-labeled sequences.
Note: Only public APIs are documented. Internal helpers (e.g., slot validation, worker state) remain unspecified.