mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
Semantic Description of obiseq Statistics and Merging Features
This package provides infrastructure for tracking, aggregating, and merging statistical occurrences of sequence attributes across biological sequences (BioSequence). It supports both count-based and weighted statistics, with thread-safe operations.
Core Components
- StatsOnValues: A concurrent map (map[string]int) with R/W locking to store occurrence counts per attribute value (e.g., taxon, primer, quality bin).
- StatsOnDescription: Defines how to extract and weight statistics from a sequence (e.g., count per read, or sum of quality scores).
- StatsOnSlotName(key): Generates internal annotation keys (e.g., "merged_taxon") to store precomputed statistics.
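As a rough illustration of the components above, the following self-contained sketch shows a lock-guarded count map and a slot-name helper. The names countStats, newCountStats, Add, Get and slotName are hypothetical and are not the package's API; the "merged_" prefix is inferred only from the "merged_taxon" example.

```go
package main

import (
	"fmt"
	"sync"
)

// countStats is a hypothetical stand-in for StatsOnValues:
// occurrence counts per attribute value, guarded by a read/write lock.
type countStats struct {
	lock   sync.RWMutex
	counts map[string]int
}

func newCountStats() *countStats {
	return &countStats{counts: make(map[string]int)}
}

// Add increments the count of value by n under the write lock.
func (s *countStats) Add(value string, n int) {
	s.lock.Lock()
	defer s.lock.Unlock()
	s.counts[value] += n
}

// Get reads a count under the read lock; absent values count as 0.
func (s *countStats) Get(value string) int {
	s.lock.RLock()
	defer s.lock.RUnlock()
	return s.counts[value]
}

// slotName follows the naming pattern shown in the text
// ("taxon" -> "merged_taxon"). The prefix is an assumption.
func slotName(key string) string {
	return "merged_" + key
}

func main() {
	stats := newCountStats()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			stats.Add("Homo sapiens", 1) // concurrent writers are serialized by the lock
		}()
	}
	wg.Wait()
	fmt.Println(slotName("taxon"), stats.Get("Homo sapiens")) // merged_taxon 100
}
```

Readers take the shared lock and writers the exclusive one, so many goroutines can query counts at once while updates stay consistent.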
Key Functionalities
- Per-Sequence Statistics Initialization & Update
  - StatsOn(desc, na): Ensures a statistics slot exists for attribute desc.Key, initializes if needed.
  - StatsPlusOne(...): Adds contribution of a single sequence to the statistics (e.g., increment count for its taxon).
- Thread-Safe Aggregation
  - Merge(*StatsOnValues): Safely merges counts from another StatsOnValues, used to combine per-sequence stats.
- Sequence Merging with Stat Propagation
  - BioSequence.Merge(...):
    - Combines two sequences (e.g., consensus/overlap).
    - Updates statistics for specified attributes (statsOn), preserving or aggregating counts.
    - Resolves conflicting annotations by deleting non-merged fields if mismatched.
- Bulk Merging
  - BioSequenceSlice.Merge(...): Efficiently merges N sequences into one, recycling inputs and updating statistics incrementally.
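The flow described in this list (create the slot on first use, fold in another record's counts, drop annotations that disagree, then repeat over a slice) can be sketched as below. This is a simplified model under stated assumptions, not the package's implementation: seq, statsDesc, statsOn, merge and mergeAll are hypothetical names, locking is omitted for brevity, inputs are not recycled, and dropping every mismatched plain annotation (including the one tracked by the statistic) is an assumption.

```go
package main

import "fmt"

// seq is a hypothetical, simplified stand-in for BioSequence.
type seq struct {
	count       int                       // number of reads this record represents
	annotations map[string]string         // plain attributes, e.g. "taxon"
	stats       map[string]map[string]int // statistics, keyed by slot name
}

// statsDesc is a hypothetical stand-in for StatsOnDescription:
// which attribute to track and how much one record contributes.
type statsDesc struct {
	key    string
	weight func(s *seq) int
}

// statsOn returns the statistics slot for desc.key, creating it on first
// use and seeding it with the record's own contribution. Records lacking
// the attribute are counted under the na label.
func (s *seq) statsOn(desc statsDesc, na string) map[string]int {
	if s.stats == nil {
		s.stats = make(map[string]map[string]int)
	}
	slot := "merged_" + desc.key // prefix assumed from the "merged_taxon" example
	st, ok := s.stats[slot]
	if !ok {
		value, found := s.annotations[desc.key]
		if !found {
			value = na
		}
		st = map[string]int{value: desc.weight(s)}
		s.stats[slot] = st
	}
	return st
}

// merge folds other into s: statistics are summed first, then plain
// annotations on which the two records disagree are deleted.
func (s *seq) merge(other *seq, na string, descs ...statsDesc) {
	for _, d := range descs {
		dst := s.statsOn(d, na)
		for value, n := range other.statsOn(d, na) {
			dst[value] += n
		}
	}
	for key, value := range s.annotations {
		if ov, ok := other.annotations[key]; !ok || ov != value {
			delete(s.annotations, key)
		}
	}
	s.count += other.count
}

// mergeAll merges every record into the first one, updating statistics
// incrementally, and returns it (nil for an empty slice).
func mergeAll(seqs []*seq, na string, descs ...statsDesc) *seq {
	if len(seqs) == 0 {
		return nil
	}
	out := seqs[0]
	for _, other := range seqs[1:] {
		out.merge(other, na, descs...)
	}
	return out
}

func main() {
	byCount := statsDesc{key: "taxon", weight: func(s *seq) int { return s.count }}
	reads := []*seq{
		{count: 3, annotations: map[string]string{"taxon": "A", "primer": "p1"}},
		{count: 2, annotations: map[string]string{"taxon": "B", "primer": "p1"}},
		{count: 1, annotations: map[string]string{"primer": "p1"}},
	}
	merged := mergeAll(reads, "NA", byCount)
	fmt.Println(merged.count, merged.stats["merged_taxon"], merged.annotations)
	// 6 map[A:3 B:2 NA:1] map[primer:p1]
}
```

In this sketch the per-value breakdown survives in the statistics slot even though the conflicting "taxon" annotation itself is removed, which is the point of propagating statistics during a merge.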
Use Cases
- Tracking taxonomic assignments across merged reads.
- Aggregating primer or barcode counts in amplicon merging.
- Summarizing quality scores, abundance weights, or custom metadata during consensus building.
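To make the difference between count-based and weighted statistics concrete, here is a minimal sketch that tallies the same records twice: once per record and once weighted by abundance. The read type, the tally function and the "NA" label are hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// read is a hypothetical record: a few annotations plus an abundance.
type read struct {
	attrs     map[string]string
	abundance int
}

// tally sums weight(r) per value of the attribute key.
// Reads lacking the attribute are grouped under the na label.
func tally(reads []read, key, na string, weight func(r read) int) map[string]int {
	out := make(map[string]int)
	for _, r := range reads {
		v, ok := r.attrs[key]
		if !ok {
			v = na
		}
		out[v] += weight(r)
	}
	return out
}

func main() {
	reads := []read{
		{attrs: map[string]string{"primer": "16S"}, abundance: 10},
		{attrs: map[string]string{"primer": "16S"}, abundance: 4},
		{attrs: map[string]string{"primer": "COI"}, abundance: 1},
	}
	// Count-based: every record contributes 1.
	perRecord := tally(reads, "primer", "NA", func(_ read) int { return 1 })
	// Weighted: every record contributes its abundance.
	perAbundance := tally(reads, "primer", "NA", func(r read) int { return r.abundance })
	fmt.Println(perRecord)    // map[16S:2 COI:1]
	fmt.Println(perAbundance) // map[16S:14 COI:1]
}
```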
Design Notes
- Uses sync.RWMutex for safe concurrent access.
- Supports only JSON-marshalable, serializable statistics (via MarshalJSON).
- Enforces type safety: only strings/integers/booleans allowed for attribute values.
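One way these design notes could fit together is sketched below: a MarshalJSON method that serializes the counts while holding the read lock, and a type switch that accepts only string, integer and boolean attribute values. The names lockedStats and valueKey are hypothetical, and converting values to string keys is an assumption about how they might be indexed, not a description of the package's code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"sync"
)

// lockedStats is a hypothetical counterpart of StatsOnValues.
type lockedStats struct {
	lock   sync.RWMutex
	counts map[string]int
}

// MarshalJSON implements json.Marshaler. The read lock keeps the map
// stable while it is being encoded.
func (s *lockedStats) MarshalJSON() ([]byte, error) {
	s.lock.RLock()
	defer s.lock.RUnlock()
	return json.Marshal(s.counts)
}

// valueKey turns an attribute value into a map key and rejects
// anything that is not a string, an int or a bool.
func valueKey(v interface{}) (string, error) {
	switch x := v.(type) {
	case string:
		return x, nil
	case int:
		return strconv.Itoa(x), nil
	case bool:
		return strconv.FormatBool(x), nil
	default:
		return "", fmt.Errorf("unsupported attribute value type %T", v)
	}
}

func main() {
	stats := &lockedStats{counts: make(map[string]int)}
	for _, v := range []interface{}{"A", 42, true, 3.14} {
		key, err := valueKey(v)
		if err != nil {
			fmt.Println("skipped:", err) // the float64 value is rejected
			continue
		}
		stats.lock.Lock()
		stats.counts[key]++
		stats.lock.Unlock()
	}
	out, err := json.Marshal(stats)
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(string(out)) // {"42":1,"A":1,"true":1}
}
```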