Files
obitools4/autodoc/docmd/pkg/obiseq/taxonomy_methods.md
T

42 lines
2.3 KiB
Markdown
Raw Normal View History

2026-04-07 08:36:50 +02:00
# Taxonomic Annotation Features in `obiseq` Package
This package provides semantic taxonomic annotation capabilities for biological sequences (`BioSequence`). It integrates with a taxonomy database to assign, retrieve, and manage taxonomic identifiers (taxids) and related metadata.
## Core Functions
- **`Taxid()`**: Retrieves the taxonomic ID as a string (e.g., `"12345"` or `"NA"`), supporting multiple internal representations (`string`, `int`, `float64`). Returns `"NA"` if no taxid is set.
- **`Taxon(taxonomy)`**: Returns the corresponding `*obitax.Taxon` object, or `nil` if taxid is `"NA"`.
- **`SetTaxid(taxid, rank...)`**: Assigns a taxonomic ID to the sequence. Validates against default taxonomy; handles aliases and errors based on configuration flags (`FailOnTaxonomy`, `UpdateTaxid`). Optionally stores taxid under a custom rank (e.g., `"genus_taxid"`).
- **`SetTaxon(taxon, rank...)`**: Assigns a `*obitax.Taxon` object directly; stores its string representation as taxid.
## Rank-Specific Annotation
- **`SetTaxonAtRank(taxonomy, rank)`**: Annotates the sequence with taxid and scientific name at a specified Linnaean rank (e.g., `"species"`, `"genus"`). Sets two attributes: `rank_taxid` and `rank_name`. Returns the taxon at that rank (or `nil`).
- **Convenience wrappers**:
- `SetSpecies(...)`
- `SetGenus(...)`
- `SetFamily(...)`
All delegate to `SetTaxonAtRank`.
## Taxonomic Path & Metadata
- **`SetPath(taxonomy)`**: Computes and stores the full taxonomic lineage (from root to species) as a string slice under attribute `"taxonomic_path"`.
- **`Path()`**: Retrieves the stored taxonomic path; recomputes it if missing and a default taxonomy exists.
- **`SetScientificName(taxonomy)`**: Stores the sequences species-level scientific name under `"scientific_name"`.
- **`SetTaxonomicRank(taxonomy)`**: Stores the taxons rank (e.g., `"species"`, `"genus"`) under `"taxonomic_rank"`.
## Error Handling & Configuration
- Uses `logrus` and custom logging (`obilog`) for warnings/errors.
- Behavior on taxonomy mismatches (e.g., unknown taxid, alias) is configurable via `obidefault` settings.
- Ensures type consistency: taxid must be string, int, or float; invalid types trigger fatal errors.
All methods are designed for seamless integration into bioinformatics pipelines, enabling robust taxonomic profiling of sequencing data.