Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00
# Taxonomic Annotation Features in `obiseq` Package
This package provides methods for annotating biological sequences (`BioSequence`) with taxonomic information. It relies on a taxonomy (`obitax`) to assign, retrieve, and manage taxonomic identifiers (taxids) and related metadata such as scientific names, ranks, and lineages.
## Core Functions
- **`Taxid()`**: Retrieves the taxonomic ID as a string (e.g., `"12345"`), whether it is stored internally as a `string`, `int`, or `float64`. Returns `"NA"` if no taxid is set.
- **`Taxon(taxonomy)`**: Returns the corresponding `*obitax.Taxon` object, or `nil` if the taxid is `"NA"`.
- **`SetTaxid(taxid, rank...)`**: Assigns a taxonomic ID to the sequence. Validates it against the default taxonomy, and resolves aliases and handles errors according to the configuration flags (`FailOnTaxonomy`, `UpdateTaxid`). Optionally stores the taxid under a rank-specific attribute (e.g., `"genus_taxid"`).
- **`SetTaxon(taxon, rank...)`**: Assigns a `*obitax.Taxon` object directly and stores its string representation as the taxid.
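The type normalisation described for `Taxid()` can be sketched as a small type switch. This is a minimal, self-contained sketch, not the obiseq source: the attribute map, the `"taxid"` key, and the helper name `taxidString` are assumptions, and where the package is described as failing fatally on an unsupported type, this sketch returns an error instead.

```go
package main

import (
	"fmt"
	"strconv"
)

// taxidString is a hypothetical helper mirroring the documented behaviour of
// Taxid(): a taxid stored as string, int or float64 comes back as a string,
// and a missing taxid yields "NA". The "taxid" key is an assumption.
func taxidString(attrs map[string]any) (string, error) {
	v, ok := attrs["taxid"]
	if !ok || v == nil {
		return "NA", nil
	}
	switch t := v.(type) {
	case string:
		return t, nil
	case int:
		return strconv.Itoa(t), nil
	case float64:
		// Taxids are integers; a float64 typically comes from JSON decoding.
		return strconv.FormatInt(int64(t), 10), nil
	default:
		// The real package is described as treating this case as fatal.
		return "", fmt.Errorf("taxid has unsupported type %T", v)
	}
}

func main() {
	for _, attrs := range []map[string]any{
		{"taxid": "9606"},
		{"taxid": 9606},
		{"taxid": 9606.0},
		{},
	} {
		id, err := taxidString(attrs)
		fmt.Println(id, err)
	}
}
```

All three stored forms print `9606`, and the empty map prints `NA`.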
## Rank-Specific Annotation
- **`SetTaxonAtRank(taxonomy, rank)`**: Annotates the sequence with the taxid and scientific name of the taxon found at the given Linnaean rank (e.g., `"species"`, `"genus"`) in the sequence's lineage. Sets two attributes, `<rank>_taxid` and `<rank>_name` (e.g., `genus_taxid` and `genus_name`). Returns the taxon at that rank, or `nil`.
- **Convenience wrappers**, all delegating to `SetTaxonAtRank`:
  - `SetSpecies(...)`
  - `SetGenus(...)`
  - `SetFamily(...)`
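The rank lookup behind `SetTaxonAtRank` and its wrappers amounts to walking up the lineage until a node of the requested rank is found. The sketch below uses hypothetical `Taxon` and `Sequence` types (not `obitax.Taxon` or `BioSequence`) and passes the taxon in directly instead of resolving it from a taxonomy. What the real method stores when no node matches is not documented here, so this sketch simply sets nothing and returns `nil`.

```go
package main

import "fmt"

// Taxon is a hypothetical, minimal stand-in for *obitax.Taxon: each node knows
// its taxid, scientific name, rank and parent (nil at the root).
type Taxon struct {
	Taxid  string
	Name   string
	Rank   string
	Parent *Taxon
}

// Sequence is a hypothetical stand-in for BioSequence that only keeps attributes.
type Sequence struct {
	Attrs map[string]any
}

// setTaxonAtRank walks from taxon towards the root looking for the first node
// (possibly taxon itself) at the requested rank, then stores "<rank>_taxid"
// and "<rank>_name". If no node matches, nothing is stored and nil is returned.
func (s *Sequence) setTaxonAtRank(taxon *Taxon, rank string) *Taxon {
	for t := taxon; t != nil; t = t.Parent {
		if t.Rank == rank {
			s.Attrs[rank+"_taxid"] = t.Taxid
			s.Attrs[rank+"_name"] = t.Name
			return t
		}
	}
	return nil
}

// Wrappers that only fix the rank, in the spirit of SetSpecies/SetGenus/SetFamily.
func (s *Sequence) setSpecies(t *Taxon) *Taxon { return s.setTaxonAtRank(t, "species") }
func (s *Sequence) setGenus(t *Taxon) *Taxon   { return s.setTaxonAtRank(t, "genus") }
func (s *Sequence) setFamily(t *Taxon) *Taxon  { return s.setTaxonAtRank(t, "family") }

func main() {
	hominidae := &Taxon{"9604", "Hominidae", "family", nil}
	homo := &Taxon{"9605", "Homo", "genus", hominidae}
	sapiens := &Taxon{"9606", "Homo sapiens", "species", homo}

	seq := &Sequence{Attrs: map[string]any{}}
	seq.setSpecies(sapiens)
	seq.setGenus(sapiens)
	seq.setFamily(sapiens)
	fmt.Println(seq.Attrs["genus_taxid"], seq.Attrs["family_name"]) // 9605 Hominidae
}
```

Keeping the wrappers as one-liners means the attribute-naming rule lives in a single place.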
## Taxonomic Path & Metadata
- **`SetPath(taxonomy)`**: Computes and stores the full taxonomic lineage (from root to species) as a string slice under attribute `"taxonomic_path"`.
- **`Path()`**: Retrieves the stored taxonomic path; recomputes it if missing and a default taxonomy exists.
- **`SetScientificName(taxonomy)`**: Stores the sequence's species-level scientific name under `"scientific_name"`.
- **`SetTaxonomicRank(taxonomy)`**: Stores the taxon's rank (e.g., `"species"`, `"genus"`) under `"taxonomic_rank"`.
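Computing a root-to-taxon lineage is a parent-pointer walk followed by a reversal; a `Path()`-style accessor then only needs to recompute when the attribute is missing. The types are again hypothetical stand-ins, and storing scientific names (rather than taxids or formatted strings) in `"taxonomic_path"` is an assumption of this sketch, as is using a non-nil `Taxon` field in place of "a default taxonomy exists".

```go
package main

import "fmt"

// Taxon is a hypothetical lineage node: a name and a parent (nil at the root).
type Taxon struct {
	Name   string
	Parent *Taxon
}

// Sequence is a hypothetical stand-in; Taxon plays the role of the taxon
// resolved from the sequence's taxid through the default taxonomy.
type Sequence struct {
	Taxon *Taxon
	Attrs map[string]any
}

// lineage collects names from the taxon up to the root, then reverses the
// slice in place so that it reads root first.
func lineage(t *Taxon) []string {
	var path []string
	for ; t != nil; t = t.Parent {
		path = append(path, t.Name)
	}
	for i, j := 0, len(path)-1; i < j; i, j = i+1, j-1 {
		path[i], path[j] = path[j], path[i]
	}
	return path
}

// setPath computes the lineage and stores it under "taxonomic_path".
func (s *Sequence) setPath() []string {
	path := lineage(s.Taxon)
	s.Attrs["taxonomic_path"] = path
	return path
}

// path returns the stored lineage, recomputing it lazily when the attribute
// is absent and a taxon is available.
func (s *Sequence) path() []string {
	if p, ok := s.Attrs["taxonomic_path"].([]string); ok {
		return p
	}
	if s.Taxon == nil {
		return nil
	}
	return s.setPath()
}

func main() {
	root := &Taxon{"root", nil}
	homo := &Taxon{"Homo", root}
	sapiens := &Taxon{"Homo sapiens", homo}

	seq := &Sequence{Taxon: sapiens, Attrs: map[string]any{}}
	fmt.Println(seq.path()) // [root Homo Homo sapiens]
}
```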
## Error Handling & Configuration
- Uses `logrus` and the project's own logging package (`obilog`) to report warnings and errors.
- Behavior on taxonomy mismatches (e.g., an unknown taxid or an alias) is configurable via `obidefault` settings.
- Enforces type consistency: a stored taxid must be a `string`, `int`, or `float64`; any other type triggers a fatal error.
Together, these methods let bioinformatics pipelines attach, query, and propagate taxonomic annotations on sequencing data.