- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5" - Update version.txt from 4.29 → .30 (automated by Makefile)
4.7 KiB
ObiTax: Semantic Overview of Public Functionalities
obitax is a Go package for managing hierarchical taxonomic data in biodiversity pipelines. It provides thread-safe, iterator-based APIs to query, filter, and traverse taxonomies—while supporting robust defaults, string interning, type-safe identifiers (Taxid), and phylogenetic interoperability.
✅ Default Taxonomy Management
.SetAsDefault(): Registers aTaxonomyinstance as the global default..OrDefault(panicOnNil bool): Substitutesnilreceivers with the default taxonomy (panics if none exists andpanicOnNil=true)..HasDefaultTaxonomyDefined()/.OrDefault(): Enables safe fallback without boilerplate.
🔍 Core Filtering Operations (Iterator-Centric)
All filters return *ITaxon, enabling lazy, composable pipelines.
-
.IFilterOnName(name string, strict bool, ignoreCase bool)
Filters taxa by name: exact match (strict=true) or regex (default). Case-insensitive ifignoreCase. Deduplicates via internal node ID. -
.IFilterOnTaxRank(rank string)
Filters taxa whose rank matches (normalized via taxonomy’s internalized ranks map). Supports chaining and concurrent iteration. -
.IFilterOnSubcladeOf(parent *Taxon)
Yields descendants ofparent(via.IsSubCladeOf()). Works on iterators, sets, slices, and taxonomies. -
.IFilterBelongingSubclades(clades *TaxonSet)
Filters taxa belonging to any clade inclades. Optimized for single-clade case (reuses.IFilterOnSubcladeOf).
🌳 Hierarchical Navigation & Relationship Queries
.IsSubCladeOf(parent *Taxon) bool: Checks if current taxon descends fromparent..IsBelongingSubclades(clades *TaxonSet) bool: Checks if current taxon—or any ancestor—is inclades..IPath() *ITaxon: Iterates upward from taxon to root (breadth-first via.IPath())..TaxonAtRank(rank string)/ shortcuts (e.g.,.Species(),.Genus()): Traverse ancestors to find first match at given rank.
🧠 String Interning & Deduplication
InnerString.Innerize(value string) *string: Thread-safe deduplication of strings (e.g., names, ranks). Returns shared pointer for equality checks..Slice() []string: Snapshot of all interned strings (read-only).
🔢 Taxonomic Identifiers (Taxid)
FromInt(int)/FromString(string) *string: Validates and normalizes IDs (e.g.,"tx:12345"→ interned"12345"). Enforces code prefix, filters to ASCII digits/letters.
📜 Taxon String Parsing
ParseTaxonString(taxonStr string): Parses"code:taxid [name]@rank"into structured components. Validates brackets, colons, and field presence.
🧬 Taxonomy & Node Model
-
Taxon: Encapsulates node ID, parent/children links, scientific name (and alternatives), rank, and metadata..Name(class),.ScientificName(): Flexible name access (case-insensitive matching viaIsNameEqual/regex)..SetMetadata(key, value),.GetMetadata(key)/ iteration: Extensible annotations..String(): Human-readable"code:id [name]@rank"format.
-
Taxonomy: Manages full hierarchy:.AddTaxon()/.InsertPathString(): Build trees incrementally..Root()/.SetRoot()/.HasRoot(): Root node control (required for LCA)..AsPhyloTree()→obiphylo.PhyloNode: Export to phylogenetic format.
-
TaxonSet: Efficient set of*TaxNodes with alias support:.Alias(id, taxon): Non-canonical ID mapping..Sort()→ topologically sorted slice (parents before children)..AsPhyloTree(root).
-
TaxonSlice: Ordered, type-safe path representation:.String()→"id@name@rank|..."(leaf-to-root).- Enforces taxonomy coherence; panics on mismatch.
🧮 Lowest Common Ancestor (LCA)
.LCA(t2 *Taxon) (*Taxon, error): Computes most specific shared ancestor of two taxa in same rooted taxonomy. Uses path-based backward traversal.
🔄 Iterator Composition & Utilities (ITaxon)
.Next(),.Get()/.Finished(): Standard iteration control..Push(taxon),.Close(), andSplit() / Concat(...): Goroutine-driven streaming, parallel consumption..ISubTaxonomy()/.ITaxon(taxid): Breadth-first subtree traversal from root or given ID..AddMetadata(name, value): Wraps iterator to inject metadata into each taxon..Consume(): Exhausts an iterator (e.g., for side-effect-only pipelines).
🛡️ Safety & Robustness
- Nil-safe accessors (no panics unless explicitly configured).
- Explicit error messages for invalid inputs, cross-taxonomy queries, or unrooted hierarchies.
- Interning reduces memory footprint and accelerates equality checks.
Designed for scalability in large-scale metabarcoding, biodiversity informatics, and phylogenetic pipelines.