obitools4/autodoc/docmd/pkg/obiseq/taxonomy_classifier.md


Taxonomic Classification via TaxonomyClassifier

The obiseq package provides a taxonomic classification mechanism through the TaxonomyClassifier function.

  • Purpose: Constructs a reusable classifier that groups biological sequences by their taxon at a chosen rank of the taxonomic hierarchy.

  • Inputs:

    • taxonomicRank: Target rank (e.g., "species", "genus").
    • taxonomy: Reference taxonomy (*obitax.Taxonomy); the function calls .OrDefault(true) on it to fall back to the default taxonomy.
    • abortOnMissing: Boolean flag to enforce strict taxon resolution.
  • Core Logic:

    • For each sequence, retrieves its Taxon, then walks up the lineage to the requested rank using .TaxonAtRank().
    • If abortOnMissing is true, the program exits when either the taxon or the requested rank cannot be resolved.
    • Internally maps *TaxNode pointers to integer codes for efficient storage and comparison.
  • Returned Object (BioSequenceClassifier):

    • Code(sequence) int: Returns the integer code of the sequence's taxon at the target rank; sequences that resolve to the same taxon receive the same code.
    • Value(code) string: Returns the scientific name corresponding to a code.
    • Reset(): Reinitializes internal mappings (useful for batch processing).
    • Clone() *BioSequenceClassifier: Creates a new classifier instance configured identically to the original.
  • Design Rationale:

    • Uses integer codes to avoid repeated string operations and enable fast indexing (e.g., for counting).
    • Supports both strict (abortOnMissing=true) and lenient classification modes.
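
The interning scheme described above can be sketched in a few lines of Go. This is a minimal illustration, not the obiseq implementation: the taxon and rankClassifier types, the atRank helper and the "NA" label for unresolved taxa are all hypothetical names introduced here. The sketch also returns an error in strict mode instead of exiting, so that the behavior is easy to observe.

```go
package main

import (
	"errors"
	"fmt"
)

// taxon is a hypothetical stand-in for a taxonomy node (not the obitax type).
type taxon struct {
	name   string
	rank   string
	parent *taxon
}

// atRank walks up the lineage, starting at t itself, and returns the first
// node whose rank matches. It returns nil when no such node exists.
func (t *taxon) atRank(rank string) *taxon {
	for n := t; n != nil; n = n.parent {
		if n.rank == rank {
			return n
		}
	}
	return nil
}

// rankClassifier interns taxa at one rank as small integer codes.
type rankClassifier struct {
	rank   string
	strict bool           // plays the role of abortOnMissing
	keys   map[*taxon]int // node -> code
	codes  []*taxon       // code -> node
}

func newRankClassifier(rank string, strict bool) *rankClassifier {
	return &rankClassifier{rank: rank, strict: strict, keys: map[*taxon]int{}}
}

var errUnresolved = errors.New("taxon or rank could not be resolved")

// Code returns the code of t's ancestor at the classifier's rank and
// allocates the next free code the first time that ancestor is seen.
// Assumption: in lenient mode, unresolved taxa share the code of the nil key.
func (c *rankClassifier) Code(t *taxon) (int, error) {
	target := t.atRank(c.rank)
	if target == nil && c.strict {
		return -1, errUnresolved
	}
	if code, ok := c.keys[target]; ok {
		return code, nil
	}
	code := len(c.codes)
	c.keys[target] = code
	c.codes = append(c.codes, target)
	return code, nil
}

// Value maps a code back to a name; "NA" is an assumed label for unresolved taxa.
func (c *rankClassifier) Value(code int) string {
	if n := c.codes[code]; n != nil {
		return n.name
	}
	return "NA"
}

// Reset drops every mapping so that codes start again from zero.
func (c *rankClassifier) Reset() {
	c.keys = map[*taxon]int{}
	c.codes = nil
}

func main() {
	genus := &taxon{name: "Homo", rank: "genus"}
	sapiens := &taxon{name: "Homo sapiens", rank: "species", parent: genus}
	neander := &taxon{name: "Homo neanderthalensis", rank: "species", parent: genus}

	c := newRankClassifier("genus", false)
	a, _ := c.Code(sapiens)
	b, _ := c.Code(neander)
	fmt.Println(a == b, c.Value(a)) // prints: true Homo
}
```

Keying the map on the node pointer means that two sequences are compared by a single pointer lookup, and the name is only materialized when Value is called.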

This design enables scalable, efficient taxonomic profiling of sequencing datasets.
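
As an illustration of why integer codes help with counting, the following sketch builds a per-taxon profile by using each code directly as a slice index. It is self-contained and does not call obiseq: the coder type and the profile function are hypothetical, and plain string labels stand in for sequences that have already been resolved to a taxon.

```go
package main

import "fmt"

// coder is a hypothetical stand-in exposing only Code and Value.
type coder struct {
	keys  map[string]int // label -> code
	names []string       // code -> label
}

// Code returns the code of a label, allocating a new one on first sight.
func (c *coder) Code(label string) int {
	if code, ok := c.keys[label]; ok {
		return code
	}
	code := len(c.names)
	c.keys[label] = code
	c.names = append(c.names, label)
	return code
}

// Value maps a code back to its label.
func (c *coder) Value(code int) string { return c.names[code] }

// profile counts how many labels fall under each code.
// The code is used as an index into counts, so no string is hashed
// or compared once the code has been obtained.
func profile(c *coder, labels []string) []int {
	var counts []int
	for _, label := range labels {
		code := c.Code(label)
		for code >= len(counts) {
			counts = append(counts, 0) // grow until the code is a valid index
		}
		counts[code]++
	}
	return counts
}

func main() {
	c := &coder{keys: map[string]int{}}
	counts := profile(c, []string{"Homo", "Pan", "Homo", "Homo"})
	for code, n := range counts {
		fmt.Printf("%s\t%d\n", c.Value(code), n)
	}
}
```

Names are looked up through Value only when the profile is printed, which keeps the counting loop free of string handling.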