Taxonomic Annotation Features in obiseq Package
This package provides semantic taxonomic annotation capabilities for biological sequences (BioSequence). It integrates with a taxonomy database to assign, retrieve, and manage taxonomic identifiers (taxids) and related metadata.
Core Functions
- Taxid(): Retrieves the taxonomic ID as a string (e.g., "12345" or "NA"), supporting multiple internal representations (string, int, float64). Returns "NA" if no taxid is set.
- Taxon(taxonomy): Returns the corresponding *obitax.Taxon object, or nil if the taxid is "NA".
- SetTaxid(taxid, rank...): Assigns a taxonomic ID to the sequence. Validates it against the default taxonomy; handles aliases and errors based on configuration flags (FailOnTaxonomy, UpdateTaxid). Optionally stores the taxid under a custom rank (e.g., "genus_taxid").
- SetTaxon(taxon, rank...): Assigns a *obitax.Taxon object directly; stores its string representation as the taxid.
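The type handling described for Taxid() can be illustrated with a small, self-contained sketch. It does not use the obiseq API: the annotated type and the taxidString method are hypothetical stand-ins, and the sketch returns an error for unsupported types instead of aborting the program.

```go
package main

import (
	"fmt"
	"strconv"
)

// annotated is a hypothetical stand-in for a sequence: just an annotation map.
type annotated struct {
	annotations map[string]interface{}
}

// taxidString returns the "taxid" annotation as a string, or "NA" when absent.
// Unsupported types yield an error here; this sketch does not abort the program.
func (a *annotated) taxidString() (string, error) {
	v, ok := a.annotations["taxid"]
	if !ok {
		return "NA", nil
	}
	switch t := v.(type) {
	case string:
		return t, nil
	case int:
		return strconv.Itoa(t), nil
	case float64:
		// Generic decoders often produce float64 for numbers; truncate to an integer.
		return strconv.Itoa(int(t)), nil
	default:
		return "", fmt.Errorf("taxid has unsupported type %T", v)
	}
}

func main() {
	for _, v := range []interface{}{"12345", 9606, 562.0} {
		a := &annotated{annotations: map[string]interface{}{"taxid": v}}
		s, err := a.taxidString()
		fmt.Println(s, err)
	}
	empty := &annotated{}
	s, _ := empty.taxidString()
	fmt.Println(s)
}
```

Normalizing at read time means callers always see a string, whichever numeric type an upstream parser happened to store.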
Rank-Specific Annotation
- SetTaxonAtRank(taxonomy, rank): Annotates the sequence with taxid and scientific name at a specified Linnaean rank (e.g., "species", "genus"). Sets two attributes: rank_taxid and rank_name. Returns the taxon at that rank (or nil).
- Convenience wrappers:
  - SetSpecies(...)
  - SetGenus(...)
  - SetFamily(...)

  All delegate to SetTaxonAtRank.
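A minimal sketch of rank-specific annotation, assuming a toy taxonomy in which each taxon points to its parent. The taxon type, the helper names, the attribute keys built from the rank name (such as "genus_taxid"), and the "NA" placeholder are all assumptions of this example and are not taken from the obiseq source.

```go
package main

import "fmt"

// taxon is a hypothetical node in a taxonomy tree; parent is nil at the root.
type taxon struct {
	taxid, name, rank string
	parent            *taxon
}

// atRank walks from t toward the root and returns the first taxon with the
// requested rank, or nil when the lineage has none.
func (t *taxon) atRank(rank string) *taxon {
	for cur := t; cur != nil; cur = cur.parent {
		if cur.rank == rank {
			return cur
		}
	}
	return nil
}

// setTaxonAtRank stores <rank>_taxid and <rank>_name in attrs. The key naming
// and the "NA" placeholder are assumptions of this sketch.
func setTaxonAtRank(attrs map[string]string, t *taxon, rank string) *taxon {
	found := t.atRank(rank)
	taxid, name := "NA", "NA"
	if found != nil {
		taxid, name = found.taxid, found.name
	}
	attrs[rank+"_taxid"] = taxid
	attrs[rank+"_name"] = name
	return found
}

// setGenus shows how a convenience wrapper can delegate to the generic helper.
func setGenus(attrs map[string]string, t *taxon) *taxon {
	return setTaxonAtRank(attrs, t, "genus")
}

func main() {
	family := &taxon{taxid: "9604", name: "Hominidae", rank: "family"}
	genus := &taxon{taxid: "9605", name: "Homo", rank: "genus", parent: family}
	species := &taxon{taxid: "9606", name: "Homo sapiens", rank: "species", parent: genus}

	attrs := map[string]string{}
	setGenus(attrs, species)
	setTaxonAtRank(attrs, species, "order") // no order in this toy lineage
	fmt.Println(attrs)
}
```

Keeping one generic helper and thin per-rank wrappers means a new rank needs only a one-line function.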
Taxonomic Path & Metadata
- SetPath(taxonomy): Computes and stores the full taxonomic lineage (from root to species) as a string slice under attribute "taxonomic_path".
- Path(): Retrieves the stored taxonomic path; recomputes it if missing and a default taxonomy exists.
- SetScientificName(taxonomy): Stores the sequence’s species-level scientific name under "scientific_name".
- SetTaxonomicRank(taxonomy): Stores the taxon’s rank (e.g., "species", "genus") under "taxonomic_rank".
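The compute-then-cache behavior described for SetPath and Path() might look like the following sketch. The node and record types are hypothetical, the path elements are plain names (the real element format is not specified here), and a nil taxon stands in for the "no default taxonomy" case.

```go
package main

import "fmt"

// node is a hypothetical taxonomy entry; parent is nil at the root.
type node struct {
	name   string
	parent *node
}

// lineage returns the names from the root down to n.
func lineage(n *node) []string {
	var path []string
	for cur := n; cur != nil; cur = cur.parent {
		path = append(path, cur.name)
	}
	// The walk collected leaf-to-root; reverse in place to get root-to-leaf.
	for i, j := 0, len(path)-1; i < j; i, j = i+1, j-1 {
		path[i], path[j] = path[j], path[i]
	}
	return path
}

// record is a hypothetical sequence holding its taxon and an attribute map.
type record struct {
	taxon *node
	attrs map[string][]string
}

// path returns the cached "taxonomic_path", computing and caching it when missing.
func (r *record) path() []string {
	if p, ok := r.attrs["taxonomic_path"]; ok {
		return p
	}
	if r.taxon == nil {
		return nil // nothing to compute from
	}
	if r.attrs == nil {
		r.attrs = map[string][]string{}
	}
	p := lineage(r.taxon)
	r.attrs["taxonomic_path"] = p
	return p
}

func main() {
	root := &node{name: "root"}
	genus := &node{name: "Homo", parent: root}
	species := &node{name: "Homo sapiens", parent: genus}
	r := &record{taxon: species}
	fmt.Println(r.path())
}
```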
Error Handling & Configuration
- Uses logrus and custom logging (obilog) for warnings/errors.
- Behavior on taxonomy mismatches (e.g., unknown taxid, alias) is configurable via obidefault settings.
- Ensures type consistency: taxid must be string, int, or float; invalid types trigger fatal errors.
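One way to picture configurable mismatch handling is the sketch below. The flag names FailOnTaxonomy and UpdateTaxid come from the text above, but their exact semantics, the taxonomy type, and the resolve function are assumptions made for illustration. The sketch returns errors where a real pipeline might log a warning or abort.

```go
package main

import (
	"errors"
	"fmt"
)

// config mirrors the two flags named above; the behavior attached to them
// in this sketch is an assumption, not the documented obiseq behavior.
type config struct {
	FailOnTaxonomy bool
	UpdateTaxid    bool
}

// taxonomy is a hypothetical lookup: valid taxids plus alias -> current taxid.
type taxonomy struct {
	valid   map[string]bool
	aliases map[string]string
}

var (
	errUnknownTaxid = errors.New("taxid not found in taxonomy")
	errAliasTaxid   = errors.New("taxid is an alias")
)

// resolve validates taxid and returns the value that should be stored.
func (tx taxonomy) resolve(taxid string, cfg config) (string, error) {
	if tx.valid[taxid] {
		return taxid, nil
	}
	if current, ok := tx.aliases[taxid]; ok {
		if cfg.UpdateTaxid {
			return current, nil // replace the alias by its current taxid
		}
		if cfg.FailOnTaxonomy {
			return "", errAliasTaxid
		}
		return taxid, nil // kept as given; a pipeline would likely warn here
	}
	if cfg.FailOnTaxonomy {
		return "", errUnknownTaxid
	}
	return taxid, nil // lenient mode keeps the unknown taxid
}

func main() {
	tx := taxonomy{
		valid:   map[string]bool{"9606": true},
		aliases: map[string]string{"old-9606": "9606"},
	}
	fmt.Println(tx.resolve("old-9606", config{UpdateTaxid: true}))
	fmt.Println(tx.resolve("old-9606", config{FailOnTaxonomy: true}))
	fmt.Println(tx.resolve("424242", config{}))
}
```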
All methods are designed for seamless integration into bioinformatics pipelines, enabling robust taxonomic profiling of sequencing data.