Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

# ObiTax: Semantic Overview of Public Functionalities
`obitax` is a Go package for managing hierarchical taxonomic data in biodiversity pipelines. It provides thread-safe, iterator-based APIs to query, filter, and traverse taxonomies—while supporting robust defaults, string interning, type-safe identifiers (`Taxid`), and phylogenetic interoperability.
## ✅ Default Taxonomy Management
- **`.SetAsDefault()`**: Registers a `Taxonomy` instance as the global default.
- **`.OrDefault(panicOnNil bool)`**: Substitutes `nil` receivers with the default taxonomy (panics if none exists and `panicOnNil=true`).
- **`.HasDefaultTaxonomyDefined()` / `.OrDefault()`**: Enables safe fallback without boilerplate.
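
The fallback behaviour described above relies on Go allowing method calls on `nil` pointer receivers. The following is a minimal sketch of that pattern, not the obitax source: the `Taxonomy` struct, the package-level variables, and the `HasDefault` helper are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"sync"
)

// Taxonomy is a hypothetical stand-in for the real obitax type.
type Taxonomy struct {
	name string
}

var (
	defaultMu       sync.RWMutex
	defaultTaxonomy *Taxonomy
)

// SetAsDefault registers t as the process-wide default and returns it.
func (t *Taxonomy) SetAsDefault() *Taxonomy {
	defaultMu.Lock()
	defer defaultMu.Unlock()
	defaultTaxonomy = t
	return t
}

// HasDefault (hypothetical name) reports whether a default is registered.
func HasDefault() bool {
	defaultMu.RLock()
	defer defaultMu.RUnlock()
	return defaultTaxonomy != nil
}

// OrDefault returns t when it is non-nil, otherwise the registered default.
// When neither exists it panics if panicOnNil is true and returns nil otherwise.
func (t *Taxonomy) OrDefault(panicOnNil bool) *Taxonomy {
	if t != nil {
		return t
	}
	defaultMu.RLock()
	d := defaultTaxonomy
	defaultMu.RUnlock()
	if d == nil && panicOnNil {
		panic("no default taxonomy defined")
	}
	return d
}

func main() {
	var missing *Taxonomy // nil receiver: the method call is still legal
	(&Taxonomy{name: "demo"}).SetAsDefault()
	fmt.Println(HasDefault(), missing.OrDefault(true).name)
}
```

With this shape, every public method can start with `t = t.OrDefault(true)` instead of repeating nil checks at each call site.
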
## 🔍 Core Filtering Operations (Iterator-Centric)
All filters return `*ITaxon`, enabling lazy, composable pipelines.
- **`.IFilterOnName(name string, strict bool, ignoreCase bool)`**
  Filters taxa by name: exact match (`strict=true`) or regular expression (`strict=false`). Case-insensitive if `ignoreCase` is set. Deduplicates via internal node ID.
- **`.IFilterOnTaxRank(rank string)`**
  Filters taxa whose rank matches (normalized via the taxonomy's internalized ranks map). Supports chaining and concurrent iteration.
- **`.IFilterOnSubcladeOf(parent *Taxon)`**
  Yields descendants of `parent` (via `.IsSubCladeOf()`). Works on iterators, sets, slices, and taxonomies.
- **`.IFilterBelongingSubclades(clades *TaxonSet)`**
  Filters taxa belonging to any clade in `clades`. Optimized for the single-clade case (reuses `.IFilterOnSubcladeOf`).
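
To illustrate what "lazy, composable" means here, the sketch below chains two filters over a channel-backed stream. It is an illustration under stated assumptions, not the obitax API: `Stream`, `Filter`, `OnName`, `OnRank`, and `Collect` are hypothetical names, and a plain channel stands in for `*ITaxon`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Taxon is a simplified, hypothetical record.
type Taxon struct {
	Name string
	Rank string
}

// Stream yields taxa one at a time over an unbuffered channel.
func Stream(taxa []Taxon) <-chan Taxon {
	out := make(chan Taxon)
	go func() {
		defer close(out)
		for _, t := range taxa {
			out <- t
		}
	}()
	return out
}

// Filter forwards only the taxa for which keep returns true.
func Filter(in <-chan Taxon, keep func(Taxon) bool) <-chan Taxon {
	out := make(chan Taxon)
	go func() {
		defer close(out)
		for t := range in {
			if keep(t) {
				out <- t
			}
		}
	}()
	return out
}

// OnRank builds a predicate that matches a rank exactly.
func OnRank(rank string) func(Taxon) bool {
	return func(t Taxon) bool { return t.Rank == rank }
}

// OnName builds a name predicate: exact match when strict, regex otherwise.
func OnName(pattern string, strict, ignoreCase bool) func(Taxon) bool {
	if strict {
		if ignoreCase {
			return func(t Taxon) bool { return strings.EqualFold(t.Name, pattern) }
		}
		return func(t Taxon) bool { return t.Name == pattern }
	}
	if ignoreCase {
		pattern = "(?i)" + pattern
	}
	re := regexp.MustCompile(pattern)
	return func(t Taxon) bool { return re.MatchString(t.Name) }
}

// Collect drains a stream and returns the names it produced.
func Collect(in <-chan Taxon) []string {
	var names []string
	for t := range in {
		names = append(names, t.Name)
	}
	return names
}

func main() {
	taxa := []Taxon{
		{"Homo sapiens", "species"}, {"Homo", "genus"},
		{"Pan troglodytes", "species"}, {"Hominidae", "family"},
	}
	byName := Filter(Stream(taxa), OnName("^homo", false, true))
	fmt.Println(Collect(Filter(byName, OnRank("species"))))
}
```

Each stage runs in its own goroutine and only does work as the consumer pulls values. In this sketch the stream must be drained to completion, otherwise the producer goroutines stay blocked.
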
## 🌳 Hierarchical Navigation & Relationship Queries
- **`.IsSubCladeOf(parent *Taxon) bool`**: Checks if current taxon descends from `parent`.
- **`.IsBelongingSubclades(clades *TaxonSet) bool`**: Checks if current taxon—or any ancestor—is in `clades`.
- **`.IPath() *ITaxon`**: Iterates upward from the taxon to the root, yielding each ancestor in turn.
- **`.TaxonAtRank(rank string)` / shortcuts (e.g., `.Species()`, `.Genus()`)**: Traverse ancestors to find first match at given rank.
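
These relationship queries all reduce to walking parent links toward the root. The sketch below shows that idea on a hypothetical `Node` type. It assumes that the root has a `nil` parent and that a taxon counts as a member of its own clade; neither assumption is confirmed for obitax.

```go
package main

import "fmt"

// Node is a hypothetical, simplified taxon; the real obitax types differ.
type Node struct {
	Name   string
	Rank   string
	Parent *Node // nil for the root in this sketch
}

// Path returns the nodes from n up to the root, leaf first.
func (n *Node) Path() []*Node {
	var path []*Node
	for cur := n; cur != nil; cur = cur.Parent {
		path = append(path, cur)
	}
	return path
}

// IsSubCladeOf reports whether parent lies on the path from n to the root.
// Assumption: a taxon belongs to its own clade.
func (n *Node) IsSubCladeOf(parent *Node) bool {
	for _, cur := range n.Path() {
		if cur == parent {
			return true
		}
	}
	return false
}

// TaxonAtRank returns the first node with the given rank on the way to the
// root, or nil when no node on the path carries that rank.
func (n *Node) TaxonAtRank(rank string) *Node {
	for _, cur := range n.Path() {
		if cur.Rank == rank {
			return cur
		}
	}
	return nil
}

// demoLineage builds root > Hominidae > {Homo > Homo sapiens, Pan}.
func demoLineage() (sapiens, homo, pan *Node) {
	root := &Node{Name: "root", Rank: "no rank"}
	family := &Node{Name: "Hominidae", Rank: "family", Parent: root}
	homo = &Node{Name: "Homo", Rank: "genus", Parent: family}
	pan = &Node{Name: "Pan", Rank: "genus", Parent: family}
	sapiens = &Node{Name: "Homo sapiens", Rank: "species", Parent: homo}
	return sapiens, homo, pan
}

func main() {
	sapiens, homo, pan := demoLineage()
	fmt.Println(sapiens.IsSubCladeOf(homo), sapiens.IsSubCladeOf(pan))
	fmt.Println(sapiens.TaxonAtRank("family").Name)
}
```
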
## 🧠 String Interning & Deduplication
- **`InnerString.Innerize(value string) *string`**: Thread-safe deduplication of strings (e.g., names, ranks). Returns shared pointer for equality checks.
- **`.Slice() []string`**: Snapshot of all interned strings (read-only).
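
A common way to get thread-safe interning is a map guarded by a read-write mutex with a double-checked insert. The sketch below shows that technique; the `Interner` type and its method names are hypothetical, and the actual `InnerString` implementation may differ.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Interner is a hypothetical thread-safe string pool.
type Interner struct {
	mu    sync.RWMutex
	index map[string]*string
}

// NewInterner returns an empty pool.
func NewInterner() *Interner {
	return &Interner{index: make(map[string]*string)}
}

// Intern returns the shared pointer for value, storing it on first use.
func (in *Interner) Intern(value string) *string {
	// Fast path: most lookups hit an existing entry under the read lock.
	in.mu.RLock()
	p, ok := in.index[value]
	in.mu.RUnlock()
	if ok {
		return p
	}

	in.mu.Lock()
	defer in.mu.Unlock()
	// Re-check: another goroutine may have inserted between the two locks.
	if p, ok := in.index[value]; ok {
		return p
	}
	s := value
	in.index[value] = &s
	return &s
}

// Slice returns a sorted snapshot of every interned string.
func (in *Interner) Slice() []string {
	in.mu.RLock()
	defer in.mu.RUnlock()
	out := make([]string, 0, len(in.index))
	for s := range in.index {
		out = append(out, s)
	}
	sort.Strings(out)
	return out
}

func main() {
	pool := NewInterner()
	a := pool.Intern("species")
	b := pool.Intern("species")
	fmt.Println(a == b, pool.Slice())
}
```

Because equal strings map to the same pointer, callers can compare ranks or names with a pointer comparison instead of a byte-by-byte string comparison.
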
## 🔢 Taxonomic Identifiers (`Taxid`)
- **`FromInt(int)` / `FromString(string) *string`**: Validates and normalizes IDs (e.g., `"tx:12345"` → interned `"12345"`). Enforces code prefix, filters to ASCII digits/letters.
## 📜 Taxon String Parsing
- **`ParseTaxonString(taxonStr string)`**: Parses `"code:taxid [name]@rank"` into structured components. Validates brackets, colons, and field presence.
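
As an illustration of the `"code:taxid [name]@rank"` layout, here is a minimal parsing sketch. The function `parseTaxon`, the `Parsed` struct, and the exact error messages are hypothetical; the real `ParseTaxonString` may validate more, or differently.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Parsed holds the four fields of a "code:taxid [name]@rank" string.
type Parsed struct {
	Code, Taxid, Name, Rank string
}

// parseTaxon is a hypothetical parser for the layout described above.
func parseTaxon(s string) (Parsed, error) {
	s = strings.TrimSpace(s)
	open := strings.Index(s, "[")
	closeIdx := strings.LastIndex(s, "]")
	if open < 0 || closeIdx < open {
		return Parsed{}, errors.New("missing or unbalanced brackets")
	}

	head := strings.TrimSpace(s[:open])
	name := strings.TrimSpace(s[open+1 : closeIdx])
	tail := strings.TrimSpace(s[closeIdx+1:])

	code, taxid, ok := strings.Cut(head, ":")
	if !ok {
		return Parsed{}, errors.New("missing ':' between code and taxid")
	}
	if !strings.HasPrefix(tail, "@") {
		return Parsed{}, errors.New("missing '@' before rank")
	}
	rank := strings.TrimSpace(tail[1:])

	if code == "" || taxid == "" || name == "" || rank == "" {
		return Parsed{}, errors.New("empty field")
	}
	return Parsed{Code: code, Taxid: taxid, Name: name, Rank: rank}, nil
}

func main() {
	p, err := parseTaxon("taxon:9606 [Homo sapiens]@species")
	fmt.Println(p.Code, p.Taxid, p.Name, p.Rank, err)
}
```
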
## 🧬 Taxonomy & Node Model
- **`Taxon`**: Encapsulates node ID, parent/children links, scientific name (and alternatives), rank, and metadata.
  - `.Name(class)`, `.ScientificName()`: Flexible name access (case-insensitive matching via `IsNameEqual`/regex).
  - `.SetMetadata(key, value)`, `.GetMetadata(key)` / iteration: Extensible annotations.
  - `.String()`: Human-readable `"code:id [name]@rank"` format.
- **`Taxonomy`**: Manages full hierarchy:
  - `.AddTaxon()` / `.InsertPathString()`: Build trees incrementally.
  - `.Root()` / `.SetRoot()` / `.HasRoot()`: Root node control (required for LCA).
  - `.AsPhyloTree()` → `obiphylo.PhyloNode`: Export to phylogenetic format.
- **`TaxonSet`**: Efficient set of `*TaxNode`s with alias support:
  - `.Alias(id, taxon)`: Non-canonical ID mapping.
  - `.Sort()` → topologically sorted slice (parents before children).
  - `.AsPhyloTree(root)`.
- **`TaxonSlice`**: Ordered, type-safe path representation:
  - `.String()` → `"id@name@rank|..."` (leaf-to-root).
  - Enforces taxonomy coherence; panics on mismatch.
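
One simple way to obtain a "parents before children" ordering is to sort nodes by their depth, since every parent is strictly shallower than its children. The sketch below uses that approach with a hypothetical `Node` type and `SortParentsFirst` function; it is not a description of how `TaxonSet.Sort()` is actually implemented.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical taxon with a parent link (nil for the root).
type Node struct {
	ID     string
	Parent *Node
}

// depth counts the number of edges between n and the root.
func depth(n *Node) int {
	d := 0
	for p := n.Parent; p != nil; p = p.Parent {
		d++
	}
	return d
}

// SortParentsFirst returns a copy of nodes ordered by increasing depth,
// with ties broken by ID so the result is deterministic.
func SortParentsFirst(nodes []*Node) []*Node {
	out := append([]*Node(nil), nodes...)
	sort.SliceStable(out, func(i, j int) bool {
		di, dj := depth(out[i]), depth(out[j])
		if di != dj {
			return di < dj
		}
		return out[i].ID < out[j].ID
	})
	return out
}

// ids extracts the identifiers of nodes in order.
func ids(nodes []*Node) []string {
	out := make([]string, len(nodes))
	for i, n := range nodes {
		out[i] = n.ID
	}
	return out
}

func main() {
	root := &Node{ID: "1"}
	family := &Node{ID: "9604", Parent: root}
	genus := &Node{ID: "9605", Parent: family}
	species := &Node{ID: "9606", Parent: genus}
	fmt.Println(ids(SortParentsFirst([]*Node{species, genus, root, family})))
}
```
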
## 🧮 Lowest Common Ancestor (LCA)
- **`.LCA(t2 *Taxon) (*Taxon, error)`**: Computes most specific shared ancestor of two taxa in same rooted taxonomy. Uses path-based backward traversal.
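
A path-based LCA can be sketched as follows: build both leaf-to-root paths, check that they end at the same root, then walk backward from the root while the two paths still agree. The `Node` type, the `pathToRoot` helper, and the error messages are hypothetical and only illustrate the technique.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical taxon with a parent link (nil for the root).
type Node struct {
	Name   string
	Parent *Node
}

// pathToRoot returns the nodes from n up to the root, leaf first.
func pathToRoot(n *Node) []*Node {
	var path []*Node
	for cur := n; cur != nil; cur = cur.Parent {
		path = append(path, cur)
	}
	return path
}

// LCA returns the most specific node shared by the lineages of a and b.
func LCA(a, b *Node) (*Node, error) {
	if a == nil || b == nil {
		return nil, errors.New("nil taxon")
	}
	pa, pb := pathToRoot(a), pathToRoot(b)
	i, j := len(pa)-1, len(pb)-1
	if pa[i] != pb[j] {
		return nil, errors.New("taxa do not share a root")
	}
	// Step from the root toward the leaves while both paths still agree.
	for i > 0 && j > 0 && pa[i-1] == pb[j-1] {
		i--
		j--
	}
	return pa[i], nil
}

func main() {
	root := &Node{Name: "root"}
	family := &Node{Name: "Hominidae", Parent: root}
	homo := &Node{Name: "Homo", Parent: family}
	pan := &Node{Name: "Pan", Parent: family}
	sapiens := &Node{Name: "Homo sapiens", Parent: homo}

	lca, err := LCA(sapiens, pan)
	fmt.Println(lca.Name, err)
}
```

The root check mirrors the requirement that both taxa live in the same rooted taxonomy: two nodes from unrelated trees produce an error instead of a meaningless result.
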
## 🔄 Iterator Composition & Utilities (`ITaxon`)
- **`.Next()`, `.Get()` / `.Finished()`**: Standard iteration control.
- **`.Push(taxon)`, `.Close()`**, and **`.Split()` / `.Concat(...)`**: Goroutine-driven streaming and parallel consumption.
- **`.ISubTaxonomy()` / `.ITaxon(taxid)`**: Breadth-first subtree traversal from root or given ID.
- **`.AddMetadata(name, value)`**: Wraps iterator to inject metadata into each taxon.
- **`.Consume()`**: Exhausts an iterator (e.g., for side-effect-only pipelines).
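
The iteration protocol above can be pictured as a channel wrapped in `Next`/`Get` methods, with a producer goroutine pushing values and closing the stream when done. The sketch below pairs such an iterator with a breadth-first subtree walk. All names (`Iter`, `NewIter`, `SubTree`) and the use of plain string IDs are hypothetical simplifications, not the `ITaxon` implementation.

```go
package main

import "fmt"

// Iter is a hypothetical channel-backed iterator over taxon IDs.
type Iter struct {
	source   chan string
	current  string
	finished bool
}

// NewIter returns an iterator with a small buffer between producer and consumer.
func NewIter() *Iter {
	return &Iter{source: make(chan string, 8)}
}

// Push sends one value to the consumer; Close signals the end of the stream.
func (it *Iter) Push(id string) { it.source <- id }
func (it *Iter) Close()         { close(it.source) }

// Next advances to the next value and reports whether one was available.
func (it *Iter) Next() bool {
	id, ok := <-it.source
	if !ok {
		it.finished = true
		return false
	}
	it.current = id
	return true
}

// Get returns the value selected by the last successful call to Next.
func (it *Iter) Get() string { return it.current }

// Finished reports whether the stream has been exhausted.
func (it *Iter) Finished() bool { return it.finished }

// SubTree streams root and its descendants in breadth-first order.
func SubTree(children map[string][]string, root string) *Iter {
	it := NewIter()
	go func() {
		defer it.Close()
		queue := []string{root}
		for len(queue) > 0 {
			id := queue[0]
			queue = queue[1:]
			it.Push(id)
			queue = append(queue, children[id]...)
		}
	}()
	return it
}

func main() {
	children := map[string][]string{
		"root": {"a", "b"},
		"a":    {"a1"},
		"b":    {"b1"},
	}
	for it := SubTree(children, "root"); it.Next(); {
		fmt.Println(it.Get())
	}
}
```
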
## 🛡️ Safety & Robustness
- Nil-safe accessors (no panics unless explicitly configured).
- Explicit error messages for invalid inputs, cross-taxonomy queries, or unrooted hierarchies.
- Interning reduces memory footprint and accelerates equality checks.
> Designed for scalability in large-scale metabarcoding, biodiversity informatics, and phylogenetic pipelines.