mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
obisummary Package: Semantic Description
The `obisummary` package delivers lightweight, high-performance statistical summarization of biological sequence data processed by OBITools4. It enables rapid profiling of metadata and content-level features across large sequence sets, which is especially useful after processing steps such as `obiclean` or merging, and it supports parallel execution for scalability.
Core Data Model
- `DataSummary` struct: Central container tracking:
  - Global metrics: number of reads, unique variants (distinct sequences), and total symbols.
  - Presence flags for special annotations: `merged_sample`, `obiclean_status`/`weight`.
  - Categorized annotation metadata:
    - Scalar attributes (single-value per sequence).
    - Map-like tags (`map_tags`), where each key maps to counts.
    - Vector or vector-like attributes (multi-value per sequence).
  - Per-sample statistics: variant count, singleton detection, and `obiclean`-related flags (e.g., bad reads).
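
To make the shape of this container concrete, here is a minimal sketch of a comparable struct. It is not the package's actual definition: the type names `dataSummary` and `sampleStats`, every field name, and the constructor `newDataSummary` are assumptions chosen only to mirror the categories listed above.

```go
package main

import "fmt"

// sampleStats holds hypothetical per-sample counters.
type sampleStats struct {
	Reads       int
	Variants    int
	Singletons  int
	ObicleanBad int
}

// dataSummary is an illustrative stand-in for the container described above.
// All field names are assumptions, not the package's real fields.
type dataSummary struct {
	Reads             int                 // total number of reads
	Variants          int                 // number of distinct sequences
	TotalLength       int                 // total number of symbols
	HasMergedSample   bool                // presence flag for merged_sample
	HasObicleanStatus bool                // presence flag for obiclean_status
	ScalarAttributes  map[string]int      // key -> sequences carrying a scalar value
	MapAttributes     map[string]int      // key -> sequences carrying a map value
	VectorAttributes  map[string]int      // key -> sequences carrying a vector value
	Samples           map[string]*sampleStats
}

// newDataSummary allocates every map so callers can increment counters directly.
func newDataSummary() *dataSummary {
	return &dataSummary{
		ScalarAttributes: make(map[string]int),
		MapAttributes:    make(map[string]int),
		VectorAttributes: make(map[string]int),
		Samples:          make(map[string]*sampleStats),
	}
}

func main() {
	s := newDataSummary()
	s.Reads += 12
	s.Variants++
	s.TotalLength += 150
	s.ScalarAttributes["taxid"]++
	s.Samples["sampleA"] = &sampleStats{Reads: 12, Variants: 1}
	fmt.Printf("reads=%d variants=%d length=%d samples=%d\n",
		s.Reads, s.Variants, s.TotalLength, len(s.Samples))
}
```

Allocating the maps in a constructor keeps later update code free of nil-map checks.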
Low-Level Helpers
- Map aggregation utilities:
  - `sumUpdateIntMap`: Accumulates integer values across maps.
  - `countUpdateIntMap`, `plusOne/PlusUpdateIntMap`: Increment counters for keys (e.g., attribute or sample names).
- `Add()` method: Thread-safe merge of two `DataSummary`s, enabling parallel accumulation.
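
The helpers above can be sketched as follows. The names `sumInto`, `addCount`, `partial`, and `merge` are hypothetical and do not reproduce the package's signatures. The sketch assumes that a merge is safe when it builds a new value and leaves both inputs untouched; the package may achieve thread safety differently.

```go
package main

import "fmt"

// sumInto adds every count from src into dst, key by key.
func sumInto(dst, src map[string]int) {
	for k, v := range src {
		dst[k] += v
	}
}

// addCount increments the counter stored under key by n.
func addCount(m map[string]int, key string, n int) {
	m[key] += n
}

// partial is a hypothetical, reduced summary used only for this example.
type partial struct {
	Reads    int
	Variants int
	Keys     map[string]int
}

// merge returns a new partial combining a and b without mutating either input.
func merge(a, b *partial) *partial {
	out := &partial{
		Reads:    a.Reads + b.Reads,
		Variants: a.Variants + b.Variants,
		Keys:     make(map[string]int, len(a.Keys)),
	}
	sumInto(out.Keys, a.Keys)
	sumInto(out.Keys, b.Keys)
	return out
}

func main() {
	a := &partial{Reads: 10, Variants: 2, Keys: map[string]int{"sample": 2}}
	b := &partial{Reads: 5, Variants: 1, Keys: map[string]int{"sample": 1}}
	addCount(b.Keys, "taxid", 1)
	m := merge(a, b)
	fmt.Println(m.Reads, m.Variants, m.Keys["sample"], m.Keys["taxid"]) // prints: 15 3 3 1
}
```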
Main Processing Logic
- `Update()` method: Processes one `BioSequence`, updating:
  - Read count (via `.Count()`) and sequence-level metrics.
  - Variant detection via unique sequences; symbol count (total length).
  - Sample-aware logic: detects `merged_sample` or per-sample annotations to populate sample-level stats (e.g., singleton identification).
  - Annotation classification: routes keys into scalar, map, or vector buckets.
- `ISummary()` function: Parallel summarization engine:
  - Distributes work across `nproc` goroutines.
  - Aggregates partial summaries via atomic operations (`Add()`).
  - Returns a structured map with:

        {
          "count": { "variants", "reads", "total_length" },
          "annotations": {
            "scalar_attributes", "map_attributes", "vector_attributes",
            "keys": { scalar: {...}, map: {...}, vector: {...} }
          },
          "samples": {
            "sample_count",
            "sample_stats": { sample_name: { reads, variants, singletons [, obiclean_bad] } }
          }
        }
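
The update-then-merge pattern described in this section can be illustrated with a small self-contained sketch. The types `record` and `summary` and the functions `update`, `add`, and `summarize` are hypothetical stand-ins, not the package's API. The sketch assumes that each input record is already a distinct sequence, that the symbol count is the plain sequence length, and that partial summaries are merged sequentially once all workers have finished; the real implementation may differ on each point.

```go
package main

import (
	"fmt"
	"sync"
)

// record is a hypothetical stand-in for one dereplicated sequence.
type record struct {
	Sequence    string
	Count       int
	Annotations map[string]interface{}
}

// summary is a reduced, illustrative version of the container described above.
type summary struct {
	Reads, Variants, TotalLength int
	Scalar, Map, Vector          map[string]int
}

func newSummary() *summary {
	return &summary{
		Scalar: map[string]int{},
		Map:    map[string]int{},
		Vector: map[string]int{},
	}
}

// update folds one record into s and classifies each annotation key.
func (s *summary) update(r record) {
	s.Reads += r.Count
	s.Variants++ // assumption: each record is a distinct sequence
	s.TotalLength += len(r.Sequence)
	for key, value := range r.Annotations {
		switch value.(type) {
		case map[string]int:
			s.Map[key]++
		case []int, []string:
			s.Vector[key]++
		default:
			s.Scalar[key]++
		}
	}
}

// add merges the counters of o into s.
func (s *summary) add(o *summary) {
	s.Reads += o.Reads
	s.Variants += o.Variants
	s.TotalLength += o.TotalLength
	for k, v := range o.Scalar {
		s.Scalar[k] += v
	}
	for k, v := range o.Map {
		s.Map[k] += v
	}
	for k, v := range o.Vector {
		s.Vector[k] += v
	}
}

// summarize gives each of nproc goroutines its own partial summary, so no
// locking is needed while updating, then merges the partials sequentially.
func summarize(records []record, nproc int) *summary {
	if nproc < 1 {
		nproc = 1
	}
	parts := make([]*summary, nproc)
	var wg sync.WaitGroup
	for w := 0; w < nproc; w++ {
		parts[w] = newSummary()
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			// Worker w handles records w, w+nproc, w+2*nproc, ...
			for i := w; i < len(records); i += nproc {
				parts[w].update(records[i])
			}
		}(w)
	}
	wg.Wait()
	total := newSummary()
	for _, p := range parts {
		total.add(p)
	}
	return total
}

func main() {
	records := []record{
		{"ACGT", 3, map[string]interface{}{"taxid": 9606, "merged_sample": map[string]int{"A": 2, "B": 1}}},
		{"ACGTAC", 1, map[string]interface{}{"taxid": 9606}},
		{"AC", 2, map[string]interface{}{"path": []string{"x", "y"}}},
	}
	s := summarize(records, 2)
	fmt.Println(s.Reads, s.Variants, s.TotalLength, s.Scalar["taxid"], s.Map["merged_sample"], s.Vector["path"])
	// prints: 6 3 12 2 1 1
}
```

Because every goroutine writes only to its own partial summary, the result does not depend on scheduling order, and the final merge is the only step that touches more than one partial.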
CLI Integration (obisummary subpackage)
- Option registration:
  - `SummaryOptionSet()`: Registers flags for output format (`--json-output`, `--yaml-output`) and map attributes to summarize (`-map <attr>`).
  - `OptionSet()`: Extends the above with input-handling options (e.g., file/iterator sources) from `obiconvert`.
- Runtime introspection:
  - `CLIOutFormat()`: Returns `"yaml"` (default) or `"json"`, with YAML only active if JSON is not requested.
  - `CLIHasMapSummary()` / `CLIMapSummary()`: Check and retrieve requested map attributes.
- Design notes:
  - Uses global state (e.g., `__json_output__`, `__map_summary__`) for compatibility with `go-getoptions`.
  - Scope strictly limited to CLI configuration; no data processing logic resides here.
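
The format-selection rule (JSON wins when requested, YAML otherwise) and the repeatable map option can be sketched as below. This example deliberately uses the standard library `flag` package instead of `go-getoptions`, and keeps the settings in a struct rather than in package-level variables. The names `cliConfig`, `stringList`, `parseArgs`, `outFormat`, and `hasMapSummary` are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
)

// stringList implements flag.Value so that a flag can be given several times.
type stringList []string

func (l *stringList) String() string { return fmt.Sprint(*l) }

func (l *stringList) Set(v string) error {
	*l = append(*l, v)
	return nil
}

// cliConfig gathers the settings that the text describes as global state.
type cliConfig struct {
	jsonOutput bool
	yamlOutput bool
	mapSummary stringList
}

// parseArgs registers the three options and parses args (without the program name).
func parseArgs(args []string) (*cliConfig, error) {
	cfg := &cliConfig{}
	fs := flag.NewFlagSet("summary-sketch", flag.ContinueOnError)
	fs.BoolVar(&cfg.jsonOutput, "json-output", false, "print the summary as JSON")
	fs.BoolVar(&cfg.yamlOutput, "yaml-output", false, "print the summary as YAML")
	fs.Var(&cfg.mapSummary, "map", "map attribute to summarize (repeatable)")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return cfg, nil
}

// outFormat returns "json" only when JSON was requested; YAML is the default.
func (c *cliConfig) outFormat() string {
	if c.jsonOutput {
		return "json"
	}
	return "yaml"
}

// hasMapSummary reports whether at least one map attribute was requested.
func (c *cliConfig) hasMapSummary() bool {
	return len(c.mapSummary) > 0
}

func main() {
	cfg, err := parseArgs([]string{"--json-output", "--map", "merged_sample"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.outFormat(), cfg.hasMapSummary(), cfg.mapSummary)
	// prints: json true [merged_sample]
}
```

Returning a configuration value from the parser makes the selection logic easy to test in isolation; the global-variable approach noted above trades that for simpler integration with the option library.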