Files
obitools4/autodoc/docmd/pkg/obiseq/merge.md
T
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

40 lines
2.1 KiB
Markdown

# Semantic Description of `obiseq` Statistics and Merging Features
This package provides infrastructure for **tracking, aggregating, and merging statistical occurrences** of sequence attributes across biological sequences (`BioSequence`). It supports both **count-based and weighted statistics**, with thread-safe operations.
## Core Components
- `StatsOnValues`: A concurrent map (`map[string]int`) with R/W locking to store occurrence counts per attribute value (e.g., taxon, primer, quality bin).
- `StatsOnDescription`: Defines *how* to extract and weight statistics from a sequence (e.g., count per read, or sum of quality scores).
- `StatsOnSlotName(key)`: Generates internal annotation keys (e.g., `"merged_taxon"`) to store precomputed statistics.
## Key Functionalities
1. **Per-Sequence Statistics Initialization & Update**
- `StatsOn(desc, na)`: Ensures a statistics slot exists for attribute `desc.Key`, initializes if needed.
- `StatsPlusOne(...)`: Adds contribution of a *single* sequence to the statistics (e.g., increment count for its taxon).
2. **Thread-Safe Aggregation**
- `Merge(*StatsOnValues)`: Safely merges counts from another `StatsOnValues`, used to combine per-sequence stats.
3. **Sequence Merging with Stat Propagation**
- `BioSequence.Merge(...)`:
- Combines two sequences (e.g., consensus/overlap).
- Updates statistics for specified attributes (`statsOn`), preserving or aggregating counts.
- Resolves conflicting annotations by deleting non-merged fields if mismatched.
4. **Bulk Merging**
- `BioSequenceSlice.Merge(...)`: Efficiently merges *N* sequences into one, recycling inputs and updating statistics incrementally.
## Use Cases
- Tracking taxonomic assignments across merged reads.
- Aggregating primer or barcode counts in amplicon merging.
- Summarizing quality scores, abundance weights, or custom metadata during consensus building.
## Design Notes
- Uses `sync.RWMutex` for safe concurrent access.
- Supports only JSON-marshalable, serializable statistics (via `MarshalJSON`).
- Enforces type safety: only strings/integers/booleans allowed for attribute values.