mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 03:50:39 +00:00
⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30 (automated by Makefile)
@@ -0,0 +1,30 @@
# ObiTax: Default Taxonomy Management

This Go package (`obitax`) provides utilities for managing a **default taxonomy instance**, enabling centralized configuration and safe fallback behavior.

## Core Features

- ✅ **Singleton-style default taxonomy**: A single global `Taxonomy` instance can be designated as *the* default via `.SetAsDefault()`.

- ✅ **Thread-safe access**: A package-level `sync.Mutex` guards the global default so that concurrent writes when setting it are safe.

- ✅ **Graceful fallback with `.OrDefault()`**:
  - If a `Taxonomy` receiver is `nil`, the method automatically substitutes it with the default taxonomy.
  - Supports optional panic on failure (`panicOnNil`) if no default is defined.

- ✅ **Utility checks**:
  - `HasDefaultTaxonomyDefined()` → returns whether a default is currently set.
  - `DefaultTaxonomy()` → retrieves the current global instance (if any).

## Design Intent

- Promotes **configuration reuse** and reduces boilerplate in client code.
- Supports robustness: avoids nil dereferences by allowing fallback to a globally configured taxonomy.

## Usage Pattern

```go
tax := NewTaxonomy("my-tax")
tax.SetAsDefault() // From now on, OrDefault resolves nil receivers to this instance

var someNilTax *Taxonomy             // a nil receiver
result := someNilTax.OrDefault(true) // Uses the default; panics only if none exists
```
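The nil-receiver fallback described above can be sketched in isolation. This is a minimal illustration, not the package's code: the `Taxonomy` stand-in, the `defaultMu` and `defaultTax` variables, and the panic message are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// Taxonomy is a hypothetical stand-in for the real obitax type.
type Taxonomy struct{ name string }

var (
	defaultMu  sync.Mutex // guards defaultTax (assumed layout)
	defaultTax *Taxonomy
)

// SetAsDefault registers t as the package-wide default.
func (t *Taxonomy) SetAsDefault() {
	defaultMu.Lock()
	defer defaultMu.Unlock()
	defaultTax = t
}

// OrDefault returns t, or the registered default when t is nil.
// Calling a method on a nil pointer receiver is legal in Go.
func (t *Taxonomy) OrDefault(panicOnNil bool) *Taxonomy {
	if t == nil {
		defaultMu.Lock()
		t = defaultTax
		defaultMu.Unlock()
	}
	if t == nil && panicOnNil {
		panic("no taxonomy given and no default defined")
	}
	return t
}

func main() {
	var missing *Taxonomy
	fmt.Println(missing.OrDefault(false) == nil) // no default registered yet

	(&Taxonomy{name: "my-tax"}).SetAsDefault()
	fmt.Println(missing.OrDefault(true).name)
}
```

The design choice worth noting is that the fallback lives in a method with a pointer receiver, so callers never need an explicit `nil` check before using a taxonomy.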
@@ -0,0 +1,28 @@
# Semantic Description of `IFilterOnName` Functionality in the `obitax` Package

The `IFilterOnName` method filters `Taxon` instances by name, supporting both **exact** and **pattern-based matching**, with optional case-insensitive comparison.

- Two methods with this name exist, distinguished by receiver type:
  - On `*Taxonomy`: delegates to its iterator.
  - On `*ITaxon`: performs the actual filtering logic.

- **Parameters**:
  - `name` (`string`): search term or regex pattern.
  - `strict` (`bool`): if true, performs exact name equality; otherwise treats `name` as a regex.
  - `ignoreCase` (`bool`): when true, performs case-insensitive matching (applies to both modes).

- **Core behavior**:
  - Uses a `map` (`sentTaxa`) to avoid emitting duplicate taxa (keyed on the internal node ID).
  - For `strict = true`: compares names using a dedicated equality method (`IsNameEqual`).
  - For `strict = false`: compiles the pattern with `regexp.MustCompile` and applies it, prepending `(?i)` for case-insensitive matching. Because `MustCompile` panics on an invalid pattern, `name` must be a valid regular expression in this mode.
  - Filtering runs in a **goroutine**, streaming results into a new `ITaxon` iterator.
  - Source channel is properly closed after iteration.

- **Return value**: a new `*ITaxon` iterator containing only matching taxa. The source is left unmodified, which enables chaining.

- **Use cases**:
  - Find exact species names (e.g., *Homo sapiens*).
  - Search using partial or regex patterns (e.g., `^Pan.*` matches every name beginning with "Pan", such as *Pan* and *Panthera*).
  - Case-insensitive lookups (e.g., "homo sapiens", "HOMO SAPIENS").

The design emphasizes **efficiency**, **correctness** (deduplication), and **flexibility** in taxonomic querying.
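The strict/regex split, the `(?i)` prefix, and the deduplication map can be sketched with plain channels. Everything here is illustrative: the `taxon` struct, `filterOnName`, `feed`, and the ids are hypothetical stand-ins, not the `obitax` API.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// taxon is a hypothetical stand-in for obitax's Taxon.
type taxon struct {
	id   string
	name string
}

// filterOnName streams the taxa from in whose name matches, each id at most once.
func filterOnName(in <-chan taxon, name string, strict, ignoreCase bool) <-chan taxon {
	var re *regexp.Regexp
	if !strict {
		pattern := name
		if ignoreCase {
			pattern = "(?i)" + pattern
		}
		re = regexp.MustCompile(pattern) // panics on an invalid pattern
	}
	out := make(chan taxon)
	go func() {
		defer close(out)
		sent := make(map[string]bool) // ids already emitted
		for t := range in {
			var ok bool
			switch {
			case !strict:
				ok = re.MatchString(t.name)
			case ignoreCase:
				ok = strings.EqualFold(t.name, name)
			default:
				ok = t.name == name
			}
			if ok && !sent[t.id] {
				sent[t.id] = true
				out <- t
			}
		}
	}()
	return out
}

// feed turns a fixed list of taxa into a closed, buffered channel.
func feed(taxa ...taxon) <-chan taxon {
	ch := make(chan taxon, len(taxa))
	for _, t := range taxa {
		ch <- t
	}
	close(ch)
	return ch
}

func main() {
	src := feed(taxon{"a", "Homo sapiens"}, taxon{"b", "Pan"}, taxon{"c", "Panthera"}, taxon{"b", "Pan"})
	for t := range filterOnName(src, "^pan", false, true) {
		fmt.Println(t.id, t.name)
	}
}
```

The pattern is compiled once, before the goroutine starts, so an invalid expression fails immediately at the call site rather than inside the worker.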
@@ -0,0 +1,12 @@
# Semantic Description of `IFilterOnTaxRank` Functionality in the *obitax* Package

The `IFilterOnTaxRank` method filters taxonomic data by rank (e.g., `"species"`, `"genus"`). It is implemented across multiple core types (`ITaxon`, `TaxonSet`, `TaxonSlice`, and `Taxonomy`), providing a unified interface for rank-based selection.

- **Core behavior**: Returns an `*ITaxon` iterator containing only taxa whose node’s rank matches the input string.
- **Rank normalization**: Internally, it resolves the requested `rank` against a taxonomy’s internal rank map via `ptax.ranks.Innerize(rank)`, which yields the canonical interned pointer for that rank string.
- **Efficiency**: Reuses the resolved rank pointer (`prank`) across consecutive taxa from the same `Taxonomy`, avoiding redundant lookups.
- **Concurrency-safe iteration**: Uses a goroutine to stream filtered results into the new iterator’s channel (`newIter.source`), enabling lazy evaluation and memory-efficient processing of large datasets.
- **Polymorphic dispatch**: Same-named methods on `TaxonSet`, `TaxonSlice`, and `Taxonomy` delegate to the base iterator implementation, preserving consistency across input types.
- **Non-destructive**: Does not mutate source collections; it produces a new iterator, supporting functional-style chaining.

This design supports scalable taxonomic querying in phylogenetic or biodiversity analysis pipelines, where filtering by hierarchical rank is essential.
@@ -0,0 +1,31 @@
# Semantic Overview of `obitax` Filtering Functionalities

The `obitax` package provides composable, iterator-based filtering methods for taxonomic data structures. All filters return lazy or buffered iterators (`*ITaxon`), enabling efficient, streaming-style traversal without materializing full collections.

## Core Filtering Operation: `IFilterOnSubcladeOf`

- **Purpose**: Filters elements belonging to a specific taxonomic subtree.
- **Behavior**:
  - Accepts a `*Taxon` as reference root.
  - Yields only taxa for which `IsSubCladeOf(taxon)` returns true (i.e., descendants of the given taxon).
- **Variants**:
  - Defined on `*ITaxon`, `TaxonSet`, `TaxonSlice`, and `Taxonomy`; the container variants all delegate to the iterator variant.
  - Ensures consistent interface across container types.

## Composite Filtering: `IFilterBelongingSubclades`

- **Purpose**: Filters taxa belonging to *any* of a set of specified subclade roots.
- **Behavior**:
  - Accepts `*TaxonSet` of clades (roots).
  - Uses optimized path for single-clade case: reuses `IFilterOnSubcladeOf`.
  - For multiple clades, checks via `IsBelongingSubclades(clades)` in a goroutine.
  - Returns original iterator unchanged if input set is empty.

## Design Highlights

- **Iterator-Centric**: All operations are defined on `ITaxon`, promoting chaining and lazy evaluation.
- **Concurrency Support**: Filtering uses goroutines with buffered channels (`source`), enabling asynchronous stream processing.
- **Type Abstraction**: Unified API across `TaxonSet`, `TaxonSlice`, and full `Taxonomy` via delegation.
- **Performance Consideration**: Special handling for single-clade case avoids unnecessary iteration overhead.

These methods enable expressive, scalable taxonomic queries, well suited to phylogenetic analysis or biodiversity data pipelines.
@@ -0,0 +1,40 @@
# `obitax` Package: String Interning with Thread-Safe Storage

This Go package (`obitax`) provides a **thread-safe string interner**: a data structure that deduplicates identical strings by storing only one copy per unique value and returning shared references.

## Core Components

- **`InnerString` struct**
  Holds:
  - `index`: A map from string values to pointers (ensuring identity via pointer equality).
  - `lock`: A `sync.RWMutex` field that guards concurrent access.

- **Constructor: `NewInnerString()`**
  Initializes an empty interner with a preallocated map.

- **Method: `Innerize(value string) *string`**
  - Stores a new unique value (after cloning via `strings.Clone`) if absent.
  - Returns the pointer to either:
    - The newly interned string, or
    - An existing one (if already present).
  - Ensures **no duplicate string data** is stored for equal values.
  - Fully thread-safe via write lock.

- **Method: `Slice() []string`**
  Returns a snapshot of all interned strings as a slice (copying values, not pointers).
  - Not safe for concurrent writes during iteration.
  - Suitable for inspection or debugging.

## Semantic Use Cases

- **Memory optimization**: Avoid repeated allocation of identical strings (e.g., in parsing, serialization).
- **Pointer-based identity checks**: Use `==` on returned pointers to test string equality efficiently.
- **Concurrent safety**: Designed for use in multi-goroutine environments (e.g., HTTP servers, pipelines).

## Design Notes

- Uses `strings.Clone()` to decouple interned strings from original input lifetimes.
- Interning is **append-only**: no removal mechanism is provided (implied by semantics of a simple interner).
- Returns `*string` to enable fast equality comparisons and reduce memory footprint.

> **Note**: This is a minimal, efficient interner, well suited to read-heavy or batched deduplication scenarios.
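A minimal interner along the lines described above might look like the following. The `interner` type and `newInterner` constructor are hypothetical names chosen for this sketch; only the overall shape (a map from string to `*string` behind a lock, with `strings.Clone` on insertion) comes from the description.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// interner is a hypothetical, simplified counterpart of InnerString.
type interner struct {
	lock  sync.RWMutex
	index map[string]*string
}

func newInterner() *interner {
	return &interner{index: make(map[string]*string)}
}

// Innerize returns the canonical pointer for value, storing a private
// copy the first time the value is seen.
func (in *interner) Innerize(value string) *string {
	in.lock.Lock()
	defer in.lock.Unlock()
	if p, ok := in.index[value]; ok {
		return p
	}
	v := strings.Clone(value) // detach from the caller's backing memory
	in.index[v] = &v
	return &v
}

func main() {
	in := newInterner()
	a := in.Innerize("species")
	b := in.Innerize("spec" + "ies")
	c := in.Innerize("genus")
	fmt.Println(a == b, a == c, *a)
}
```

Once every rank or name has gone through the interner, comparing two of them is a single pointer comparison instead of a byte-by-byte string comparison.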
@@ -0,0 +1,19 @@
# Semantic Description of `obitax` Taxonomic Functions

The `obitax` package provides two core methods for hierarchical taxon relationship analysis:

- **`IsSubCladeOf(parent *Taxon) bool`**
  Determines whether the current taxon is a **descendant** (i.e., subclade) of a given parent taxon.
  - Ensures both taxa belong to the *same taxonomy*; fails with a fatal log if not.
  - Traverses upward via `taxon.IPath()` (iterative ancestor path) to check if any node matches the parent’s ID.
  - Returns `true` if and only if a match is found, indicating lineage descent.

- **`IsBelongingSubclades(clades *TaxonSet) bool`**
  Checks whether the current taxon, or any of its **ancestors**, belongs to a specified set of clades (`TaxonSet`).
  - Starts by testing direct membership via `clades.Contains(taxon.Node.id)`.
  - Walks upward through the hierarchy (`taxon = taxon.Parent()`) until either:
    - A match is found, or
    - The root is reached.
  - Final check at the root ensures completeness (e.g., if only root belongs).

Both functions support **robust phylogenetic queries**, enabling classification validation, filtering by clade membership, and hierarchical consistency checks in taxonomic trees.
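Both checks reduce to walking parent links toward the root. The sketch below uses a hypothetical `node` type with a parent pointer and a plain `map[string]bool` in place of `TaxonSet`; it also assumes, since the ancestor path starts at the taxon itself, that a taxon counts as a subclade of itself.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a taxon with a parent link.
type node struct {
	id     string
	parent *node // nil for the root
}

// isSubCladeOf reports whether n is ancestor itself or one of its descendants.
func isSubCladeOf(n, ancestor *node) bool {
	for cur := n; cur != nil; cur = cur.parent {
		if cur.id == ancestor.id {
			return true
		}
	}
	return false
}

// belongsToAny reports whether n or any of its ancestors is in clades.
func belongsToAny(n *node, clades map[string]bool) bool {
	for cur := n; cur != nil; cur = cur.parent {
		if clades[cur.id] {
			return true
		}
	}
	return false
}

func main() {
	root := &node{id: "root"}
	mammals := &node{id: "mammals", parent: root}
	primates := &node{id: "primates", parent: mammals}
	birds := &node{id: "birds", parent: root}

	fmt.Println(isSubCladeOf(primates, mammals)) // primates sit below mammals
	fmt.Println(isSubCladeOf(birds, mammals))    // birds do not
	fmt.Println(belongsToAny(primates, map[string]bool{"birds": true, "mammals": true}))
}
```

Each query costs time proportional to the depth of the taxon in the tree, which is why the multi-clade variant tests set membership at every step instead of running one full walk per clade.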
@@ -0,0 +1,31 @@
# Semantic Description of `obitax` Package Functionalities

The `obitax` package provides a robust iterator-based API for traversing taxonomic data structures in Go. Its core component is the `ITaxon` type, which implements a lazy, concurrency-safe iterator over taxon instances (`*Taxon`). Key features include:

- **Iterator Creation**: `ITaxon` can be instantiated via `NewITaxon()` or derived from collections:
  - `TaxonSet.Iterator()`, `TaxonSlice.Iterator()` (sorted), and `Taxonomy.nodes.Iterator()`
  - Goroutines feed taxa into a channel, enabling non-blocking iteration.

- **Control Methods**:
  - `Next()` advances to the next taxon, returning success/failure.
  - `Get()` retrieves the current taxon (must follow a successful `Next`).
  - `Finished()` checks if iteration is complete.

- **Channel Management**:
  - `Push(taxon)` sends a taxon into the iterator’s channel.
  - `Close()` terminates iteration by closing the source channel.

- **Iterator Composition**:
  - `Split()`: creates a new iterator sharing the same source and termination status (useful for parallel consumption).
  - `Concat(...)`: merges multiple iterators sequentially into one.

- **Metadata Enrichment**:
  - `AddMetadata(name, value)` wraps the iterator to inject metadata into each taxon via `SetMetadata`.

- **Subtree Traversal**:
  - `ISubTaxonomy()` (on `*Taxon` or via `Taxonomy.ITaxon(taxid)`) performs a breadth-first traversal of descendant taxa, starting from the current taxon or given ID. It uses parent-child adjacency logic to expand the subtree incrementally.

- **Consumption Utility**:
  - `Consume()` exhausts an iterator without processing (e.g., for side-effect-only pipelines).

All iterators are designed to be composable, memory-efficient (via channels), and safe for concurrent use. The package integrates with `obiutils` to manage pipeline registration/unregistration during subtree expansion.
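The `Next`/`Get`/`Push`/`Close` protocol can be illustrated with a small channel-backed iterator. The `iter` type below carries strings instead of `*Taxon` and is a hypothetical simplification; the field names and buffer size are assumptions of the sketch.

```go
package main

import "fmt"

// iter is a hypothetical channel-backed iterator over taxon names.
type iter struct {
	source   chan string
	current  string
	finished bool
}

func newIter(buffer int) *iter {
	return &iter{source: make(chan string, buffer)}
}

// Push sends a value to the consumer side; it blocks when the buffer is full.
func (it *iter) Push(v string) { it.source <- v }

// Close signals that no more values will be pushed.
func (it *iter) Close() { close(it.source) }

// Next advances to the next value and reports whether one was available.
func (it *iter) Next() bool {
	if it.finished {
		return false
	}
	v, ok := <-it.source
	if !ok {
		it.finished = true
		return false
	}
	it.current = v
	return true
}

// Get returns the value fetched by the last successful Next.
func (it *iter) Get() string { return it.current }

// Finished reports whether the source has been drained and closed.
func (it *iter) Finished() bool { return it.finished }

func main() {
	it := newIter(2)
	go func() { // producer
		defer it.Close()
		for _, name := range []string{"Homo", "Pan", "Gorilla"} {
			it.Push(name)
		}
	}()
	for it.Next() { // consumer
		fmt.Println(it.Get())
	}
	fmt.Println("finished:", it.Finished())
}
```

Because the producer runs in its own goroutine, the consumer pulls values as they become available and never needs the whole collection in memory.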
@@ -0,0 +1,31 @@
# Semantic Description of `obitax.LCA()` Functionality

The `LCA` method computes the **Lowest Common Ancestor (LCA)** of two taxonomic entities (`Taxon` instances) within a shared hierarchical taxonomy.

- **Input**: The receiver taxon (`t1`) and a pointer to another `Taxon` (`t2`).
- **Output**: A `*Taxon` representing their LCA, together with an error that explains why the computation failed when it does.

### Core Logic

- **Nil Safety**: Handles cases where one or both taxa are `nil`, returning the non-nil taxon (or an error if *both* are nil or lack internal `Node` references).
- **Validation Checks**:
  - Ensures both taxa belong to the *same* `Taxonomy`.
  - Verifies that the taxonomy is **rooted** (i.e., has a defined root node).
- **Path-Based Traversal**:
  - Retrieves the full path from each taxon to the root via `Path()` (assumed to return an ordered list of nodes).
  - Traverses both paths *backwards* (from root toward leaves) until divergence is detected.
  - The first divergent node marks the boundary; the LCA is the last *common* ancestor (i.e., `slice[i+1]` after loop exit).

### Semantic Meaning

- The LCA represents the most specific taxonomic node that *contains both taxa* in its subtree.
- This operation is foundational for tasks like:
  - Taxonomic classification consistency checks,
  - Phylogenetic inference (e.g., computing taxon distances),
  - Hierarchical aggregation in biodiversity analyses.

### Error Handling

Explicit errors cover:

- Invalid inputs (`nil` taxa, missing nodes),
- Cross-taxonomy queries,
- Unrooted taxonomy (undefined root → no unique LCA possible).

This implementation assumes that the taxonomy hierarchy is a rooted tree (a special case of a **directed acyclic graph**).
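The backward path comparison can be worked through on a toy tree. The `node` type, `path` helper, and error messages below are hypothetical; the sketch only mirrors the described steps: build both taxon-to-root paths, compare them from the root end, and take index `i+1` once they diverge.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical taxon with a parent link (nil for the root).
type node struct {
	id     string
	parent *node
}

// path returns the nodes from n up to the root, n first.
func path(n *node) []*node {
	var p []*node
	for cur := n; cur != nil; cur = cur.parent {
		p = append(p, cur)
	}
	return p
}

// lca returns the lowest common ancestor of a and b.
func lca(a, b *node) (*node, error) {
	switch {
	case a == nil && b == nil:
		return nil, errors.New("both taxa are nil")
	case a == nil:
		return b, nil
	case b == nil:
		return a, nil
	}
	p1, p2 := path(a), path(b)
	i, j := len(p1)-1, len(p2)-1
	if p1[i] != p2[j] {
		return nil, errors.New("taxa do not share a root")
	}
	// Walk from the root toward the leaves while the paths agree.
	for i >= 0 && j >= 0 && p1[i] == p2[j] {
		i--
		j--
	}
	return p1[i+1], nil // last node on which both paths agreed
}

func main() {
	root := &node{id: "root"}
	mammals := &node{id: "mammals", parent: root}
	primates := &node{id: "primates", parent: mammals}
	homo := &node{id: "homo", parent: primates}
	pan := &node{id: "pan", parent: primates}
	felidae := &node{id: "felidae", parent: mammals}

	for _, pair := range [][2]*node{{homo, pan}, {homo, felidae}, {homo, primates}} {
		anc, err := lca(pair[0], pair[1])
		fmt.Println(pair[0].id, pair[1].id, "->", anc.id, err)
	}
}
```

When one taxon is an ancestor of the other, the loop runs off the end of the shorter path and `i+1` lands on that ancestor, so no special case is needed.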
@@ -0,0 +1,41 @@
# `obitax` Package: Taxon String Parser

The `obitax` package provides a robust parser for structured taxonomic strings used in biodiversity data processing.

## Core Functionality

- **`ParseTaxonString(taxonStr string)`**
  Parses strings in the format: `code:taxid [scientific name]@rank`.

- **Input Format Requirements**
  - `code`: Taxonomy identifier (e.g., "GBIF", "NCBI")
  - `taxid`: Numeric or alphanumeric taxonomic ID (e.g., "123456")
  - `scientific name`: Enclosed in square brackets (e.g., "[Homo sapiens]")
  - `rank`: Optional taxonomic rank after `@` (e.g., "species"; defaults to `"no rank"` if missing)

- **Robustness Features**
  - Trims whitespace around all components.
  - Rejects input containing more than one `@` symbol (returns an error).
  - Validates bracket pairing and ordering.
  - Ensures `code:taxid` contains exactly one colon separator.

- **Error Handling**
  Returns descriptive errors for:
  - Missing or malformed brackets
  - Invalid number of `@` separators
  - Absent colon in code:taxid segment
  - Empty fields (code, taxid, or scientific name)

- **Use Cases**
  Ideal for parsing legacy biodiversity records (e.g., from OBIS, GBIF), where taxon strings are semi-structured and need reliable extraction before indexing or matching against reference databases.

## Example

Input: `"GBIF:248093 [Homo sapiens]@species"`

Output components:

- `code = "GBIF"`
- `taxid = "248093"`
- `scientificName = "Homo sapiens"`
- `rank = "species"`

Returns empty strings and an error for invalid inputs.
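The parsing rules listed above can be sketched with the standard `strings` package. The function below is a hypothetical reimplementation written for illustration: its name, error messages, and treatment of an empty rank after `@` (falling back to `"no rank"`) are assumptions, not the package's actual behavior.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseTaxonString splits "code:taxid [scientific name]@rank" into its parts.
// On error, all string results are empty.
func parseTaxonString(s string) (code, taxid, name, rank string, err error) {
	fail := func(msg string) (string, string, string, string, error) {
		return "", "", "", "", errors.New(msg)
	}
	parts := strings.Split(strings.TrimSpace(s), "@")
	if len(parts) > 2 {
		return fail("more than one '@' separator")
	}
	rank = "no rank" // default when the rank is omitted
	if len(parts) == 2 && strings.TrimSpace(parts[1]) != "" {
		rank = strings.TrimSpace(parts[1])
	}
	body := parts[0]
	open, end := strings.Index(body, "["), strings.LastIndex(body, "]")
	if open < 0 || end < open {
		return fail("missing or malformed brackets")
	}
	name = strings.TrimSpace(body[open+1 : end])
	ids := strings.Split(body[:open], ":")
	if len(ids) != 2 {
		return fail("expected exactly one ':' in code:taxid")
	}
	code, taxid = strings.TrimSpace(ids[0]), strings.TrimSpace(ids[1])
	if code == "" || taxid == "" || name == "" {
		return fail("empty code, taxid, or scientific name")
	}
	return code, taxid, name, rank, nil
}

func main() {
	for _, in := range []string{
		"GBIF:248093 [Homo sapiens]@species",
		"GBIF:248093 [Homo sapiens]",
		"GBIF 248093 [Homo sapiens]@species",
	} {
		code, taxid, name, rank, err := parseTaxonString(in)
		fmt.Printf("%q %q %q %q %v\n", code, taxid, name, rank, err)
	}
}
```

Splitting on `@` first keeps the rank handling separate from the bracket and colon checks, so each validation rule maps onto one short block.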
@@ -0,0 +1,19 @@
# `obitax` Package: Taxonomic Identifier Handling

The `obitax` package provides a lightweight, type-safe abstraction for handling taxonomic identifiers (`Taxid`) in the OBITools4 ecosystem.

- **`Taxid` type**: A pointer to a string, representing an opaque taxonomic ID (e.g., NCBI TaxID).
- **`TaxidFactory`**: A factory for constructing `Taxid`s from strings or integers, enforcing validation and normalization.

Key features:

- **Code prefix enforcement**: `FromString` validates that the input string starts with a required taxonomy code (e.g., `"tx"`), returning an error otherwise.
- **String parsing**: Automatically strips leading whitespace and extracts the suffix after `':'`.
- **Alphabet filtering**: Uses an ASCII set to extract only valid characters (e.g., digits), ensuring clean, standardized IDs.
- **String interning**: Internally uses `Innerize` (via `InnerString`) to deduplicate strings, improving memory efficiency and comparison speed.
- **Type safety**: `Taxid` is a distinct type (not raw string), reducing misuse and enabling future extension.

Supported conversions:

- `FromString(string)`: Parses `"tx:12345"` → internalized `"12345"`.
- `FromInt(int)`: Converts e.g., `12345` → internalized `"12345"`.

Designed for high-performance pipelines where many taxonomic IDs are processed and reused.
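The prefix check and alphabet filtering might be sketched as follows. This is an assumption-laden illustration: `taxidFromString` is a hypothetical free function, the valid alphabet is taken to be ASCII digits, characters outside it are assumed to be dropped rather than rejected, and interning of the result is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// taxidFromString validates the "code:" prefix and keeps only digits
// from the remainder (assumed alphabet for this sketch).
func taxidFromString(code, s string) (string, error) {
	s = strings.TrimSpace(s)
	prefix, suffix, found := strings.Cut(s, ":")
	if !found || prefix != code {
		return "", fmt.Errorf("taxid %q does not start with code %q", s, code)
	}
	var b strings.Builder
	for _, r := range suffix {
		if r >= '0' && r <= '9' {
			b.WriteRune(r)
		}
	}
	if b.Len() == 0 {
		return "", fmt.Errorf("taxid %q contains no valid characters", s)
	}
	return b.String(), nil
}

func main() {
	for _, in := range []string{"tx:12345", "  tx: 12345 ", "ncbi:12345", "tx:"} {
		id, err := taxidFromString("tx", in)
		fmt.Printf("%q -> %q %v\n", in, id, err)
	}
}
```

In a fuller version, the returned string would be passed through the interner so that equal identifiers share one pointer.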
@@ -0,0 +1,29 @@
# `obitax` Package: Taxonomic Data Model and Navigation

The `obitax` package provides a semantic model for representing, querying, and manipulating taxonomic hierarchies in biodiversity data processing. Its core abstraction is the `Taxon` type, which encapsulates both structural (node ID, parent/child relationships) and semantic (scientific name, rank, metadata) information.

### Core Features

- **Taxon Representation**: Each `Taxon` links to a taxonomy and its underlying node, supporting multiple name classes (e.g., "scientific name", "common name"), customizable ranks, and extensible metadata via key-value pairs.
- **String Interoperability**: Implements `String()` for human-readable output (`taxonomy:taxid [name]`) and provides typed accessors like `ScientificName()`, `Rank()`, or `IsRoot()`.

### Name Handling & Matching

- Flexible name retrieval via `Name(class)`, case-insensitive equality (`IsNameEqual`), and regex-based matching (`IsNameMatching`). Names are interned for memory efficiency.

### Hierarchical Navigation

- **Path Traversal**: `IPath()` yields an iterator from current taxon up to root; `Path()` materializes this as a slice. Enables efficient lineage queries.
- **Rank-Based Lookup**: Methods like `TaxonAtRank(rank)`, or convenience wrappers (`Species()`, `Genus()`, `Family()`), allow targeted retrieval of higher-level ancestors.
- **Child Management**: Supports dynamic tree extension via `AddChild()`, parsing taxon strings and enforcing taxonomy consistency.

### Metadata Support

- Rich metadata operations: `SetMetadata`, `GetMetadata`, key/value iteration, and typed conversion (`MetadataAsString`). Enables attaching arbitrary annotations (e.g., confidence scores, source references).

### Robustness & Safety

- Nil-safe accessors prevent panics; logging and error handling ensure correctness (e.g., fatal on missing root in `IPath()`).
- Interning of names/ranks/classes (`Innerize`) reduces duplication and speeds comparisons.

Designed for scalability in large-scale metabarcoding pipelines, `obitax` bridges raw taxonomic data with high-level analytical operations.
@@ -0,0 +1,36 @@
# `obitax` Package: Taxonomic Node Representation and Management

The `obitax` package provides a lightweight, pointer-based Go implementation for representing taxonomic nodes in biological classification systems.

## Core Data Structure

- **`TaxNode`**: Represents a single taxon (e.g., species, genus) with the following fields:
  - `id`: Unique taxon identifier (pointer to string).
  - `parent`: Identifier of the parent node in the taxonomy hierarchy.
  - `rank`: Taxonomic rank (e.g., `"species"`, `"family"`).
  - `scientificname`: Canonical scientific name (e.g., *Homo sapiens*).
  - `alternatenames`: Map of alternative names keyed by name class (e.g., `"common_name"`, `"synonym"`).

## Key Functionalities

- **String Representation**
  `String(taxonomyCode)` returns a formatted label like `"NCBI:12345 [Homo sapiens]@species"` (or raw ID if enabled via `obidefault.UseRawTaxids()`).

- **Accessors**
  - `Id()`, `ParentId()`: Retrieve identifiers.
  - `ScientificName()` / `Rank()`: Return name or rank (defaulting to `"NA"` if missing).
  - `Name(class)`: Fetch name by class (`"scientific name"` or alternate).

- **Mutators**
  - `SetName(name, class)`: Assign scientific name or add/update alternate names.

- **Name Matching & Validation**
  - `IsNameEqual(name, ignoreCase)`: Exact or case-insensitive match against scientific/alternate names.
  - `IsNameMatching(pattern)`: Regex-based pattern matching over all available names.

## Design Notes

- Uses pointers for optional fields (enables `nil` semantics).
- Graceful handling of missing data (`NA`, empty strings, safe dereferencing with `nil` checks).
- Integrates logging via Logrus (`log.Panic` on misuse, e.g., setting name of `nil` node).
- Designed for use in larger OBITools pipelines (e.g., with `obidefault` configuration).
@@ -0,0 +1,18 @@
# `obitax` Package: Taxonomic Data Management

The `obitax` package provides a robust framework for managing hierarchical taxonomic classifications. Its core component is the `Taxonomy` struct, which encapsulates metadata (name, code), taxon identifiers (`ids`, `ranks`), names and name classes (`names`, `nameclasses`), node hierarchy (`nodes`, `root`), indexing for fast lookup, and validation logic.

## Key Functionalities

- **Initialization**: `NewTaxonomy()` creates a new taxonomy with configurable identifier alphabet and initializes internal data structures.
- **Identifier Handling**: `Id()` validates and converts string-based taxon IDs to internal representations; `TaxidString()` retrieves formatted identifiers (e.g., `"code:id [name]"`).
- **Taxon Access**: `Taxon()` fetches a taxon by ID and also reports whether that ID is an alias; `AsTaxonSet()` exposes the full taxonomic node collection.
- **Structure Management**:
  - `AddTaxon()` inserts a new taxon with parent, rank, and root flags.
  - `AddAlias()` maps alternative IDs to existing taxa (supporting replacement).
- **Metadata Queries**: Methods like `RankList()`, `Name()`, and `Code()` expose taxonomy metadata.
- **Root Control**: `SetRoot()`/`Root()` manage the root node; `HasRoot()` checks its presence.
- **Path Insertion**: `InsertPathString()` builds or extends a taxonomy from an ordered list of taxon strings, enforcing parent-child consistency.
- **Phylogenetic Export**: `AsPhyloTree()` converts the taxonomy into a phylogeny-compatible tree (`obiphylo.PhyloNode`), enabling downstream evolutionary analysis.

All operations gracefully handle `nil` receivers via an internal `.OrDefault()` helper, ensuring safe usage in pipelines. Error reporting is explicit and contextualized (e.g., duplicate taxon, missing parent).
@@ -0,0 +1,24 @@
# TaxonSet: Semantic Description of Functionality

The `TaxonSet` type manages a collection of taxonomic entities within a hierarchical taxonomy system. It stores mappings from unique identifiers (pointers to strings) to `TaxNode` instances, supporting both canonical taxa and aliases.

- **Construction**: Created via `(Taxonomy).NewTaxonSet()`, initializing an empty set and linking it to a specific taxonomy.

- **Basic Queries**:
  - `Get(id)`: Retrieves the corresponding taxon (or nil).
  - `Len()`: Returns count of *unique* taxa, excluding aliases.
  - `Contains(id)`, `IsATaxon(id)`, and `IsAlias(id)` enable precise taxon/alias distinction.

- **Insertion & Management**:
  - `Insert(node)`: Adds or updates a taxon node.
  - `InsertTaxon(taxon)`: Safe insertion with taxonomy validation; auto-creates set if nil.
  - `Alias(id, taxon)`: Registers an alias (non-canonical ID pointing to a real node), incrementing internal `nalias` counter.

- **Hierarchy & Iteration**:
  - `Sort()`: Returns a topologically sorted slice of taxa (parents before children), respecting tree structure.
  - `Taxonomy()`: Provides access to the parent taxonomy.

- **Phylogenetic Export**:
  - `AsPhyloTree(root)`: Converts the set into a rooted phylogenetic tree (`obiphylo.PhyloNode`), embedding taxon names, ranks, and parent relationships as node attributes.

In essence, `TaxonSet` enables efficient storage, lookup, validation, and structural manipulation of taxonomic data, supporting both biological classification logic (e.g., alias resolution, hierarchy traversal) and downstream interoperability with phylogenetic tools.
@@ -0,0 +1,25 @@
# `obitax` Package: Taxonomic Data Handling

The `obitax` package provides structured support for managing collections of taxon nodes in a biological taxonomy.

- **Core Type**: `TaxonSlice` encapsulates an ordered list of `*TaxNode`s and a reference to their parent `Taxonomy`.
- **Construction**: Created via `(taxonomy *Taxonomy).NewTaxonSlice(size, capacity)`, initializing a typed slice with optional pre-allocation.
- **Accessors**:
  - `Get(i int) *TaxNode`: retrieves the raw node at index.
  - `Taxon(i int) *Taxon`: wraps a node with its taxonomy context, enabling richer operations.
  - `Len() int`: returns the current number of nodes.

- **Mutation Methods**:
  - `Set(index, taxon)`: replaces a node at given index (taxonomy-mismatch panics).
  - `Push(taxon)`: appends a taxon to the end (also enforces taxonomy consistency).
  - `ReduceToSize(n)`: truncates slice to first *n* elements.

- **Utility Features**:
  - `Reverse(inplace)`: reverses node order, either in-place or as a new slice.
  - `String() string`: formats the entire path as `"id@sci_name@rank"` entries, separated by `|`, in *reverse* (leaf-to-root) order, which suits lineage strings.

- **Safety & Semantics**:
  - Nil-safety in all methods (returns `nil` or zero).
  - Enforces taxonomy coherence: mixing taxa from different taxonomies triggers a panic.

This package enables efficient, type-safe manipulation of hierarchical biological classification paths (e.g., for sequence annotation or metabarcoding output).
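The lineage format produced by `String()` can be illustrated with a short helper. This sketch makes two assumptions that the description does not settle: the slice is stored root-first, so walking it backwards yields leaf-to-root order, and the `node` struct, `lineage` function, and ids are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical, flattened view of a TaxNode.
type node struct{ id, name, rank string }

// lineage renders each node as "id@name@rank" and joins the entries
// with "|", walking the slice from its last element to its first.
func lineage(path []node) string {
	parts := make([]string, 0, len(path))
	for i := len(path) - 1; i >= 0; i-- {
		n := path[i]
		parts = append(parts, n.id+"@"+n.name+"@"+n.rank)
	}
	return strings.Join(parts, "|")
}

func main() {
	path := []node{ // stored root-first in this sketch; ids are made up
		{"1", "root", "no rank"},
		{"2", "Homo", "genus"},
		{"3", "Homo sapiens", "species"},
	}
	fmt.Println(lineage(path))
}
```

With the sample data above, the species entry comes first and the root entry last.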