obitools4/autodoc/docmd/pkg/obiformats/csvtaxdump_read.md
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

CSV Taxonomy Loader for OBITools4

The obiformats package provides the function LoadCSVTaxonomy, which parses taxonomic data from a CSV file and loads it into an internal taxonomy structure.

Key Features:

  • Robust CSV Parsing: Uses Go's encoding/csv with configurable options (comment lines, lazy quotes, whitespace trimming).
  • Column Mapping: Dynamically identifies required columns: taxid, parent, scientific_name, and taxonomic_rank.
  • Error Handling: Validates presence of all required columns; fails early with descriptive errors.
  • Taxonomy Construction:
    • Builds a hierarchical taxonomy using obitax.Taxon objects.
    • Ensures existence of a root node; returns error otherwise.
  • Metadata Extraction:
    • Derives taxonomy name and short code (e.g., prefix before : in first taxid).
    • Logs key metadata for traceability.
  • Scalable Design:
    • Processes records line-by-line (memory-efficient).
    • Supports large datasets via streaming CSV reading.
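
The parsing options, column mapping, and short-code derivation listed above can be sketched with the standard library alone. This is a minimal illustration, not the actual body of LoadCSVTaxonomy: the helper names (newTaxonomyReader, columnIndex, headerColumns, shortCode), the '#' comment character, and the error wording are all assumptions made for the example.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// requiredColumns lists the header names this document says must be present.
var requiredColumns = []string{"taxid", "parent", "scientific_name", "taxonomic_rank"}

// newTaxonomyReader returns a csv.Reader with the three options named above.
// The '#' comment rune is an assumption, not taken from the OBITools4 source.
func newTaxonomyReader(r io.Reader) *csv.Reader {
	reader := csv.NewReader(r)
	reader.Comment = '#'           // skip lines that start with '#'
	reader.LazyQuotes = true       // tolerate stray quotes inside fields
	reader.TrimLeadingSpace = true // drop leading whitespace in each field
	return reader
}

// columnIndex maps every header name to its position and fails early,
// naming the first required column that is missing.
func columnIndex(header []string) (map[string]int, error) {
	index := make(map[string]int, len(header))
	for i, name := range header {
		index[name] = i
	}
	for _, name := range requiredColumns {
		if _, ok := index[name]; !ok {
			return nil, fmt.Errorf("taxonomy CSV: missing required column %q", name)
		}
	}
	return index, nil
}

// headerColumns reads only the header record of csvText and maps its columns.
func headerColumns(csvText string) (map[string]int, error) {
	header, err := newTaxonomyReader(strings.NewReader(csvText)).Read()
	if err != nil {
		return nil, fmt.Errorf("taxonomy CSV: reading header: %w", err)
	}
	return columnIndex(header)
}

// shortCode returns the prefix before ':' in a taxid, or "" if there is no ':'.
func shortCode(taxid string) string {
	code, _, found := strings.Cut(taxid, ":")
	if !found {
		return ""
	}
	return code
}

func main() {
	cols, err := headerColumns("# demo\ntaxid,parent,scientific_name,taxonomic_rank\n")
	fmt.Println(cols["scientific_name"], err)

	_, err = headerColumns("taxid,parent\n")
	fmt.Println(err)

	fmt.Println(shortCode("demo:1"))
}
```

Looking columns up by name rather than by fixed position is what lets a loader report a precise "missing column" error before any record is processed.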

Input Format:

The CSV header must include the following four columns (header names are case-sensitive):

  • taxid: Unique taxon identifier.
  • parent: Parent taxonomic node ID (empty for root).
  • scientific_name: Binomial or descriptive name.
  • taxonomic_rank: e.g., species, genus.
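
To make the format concrete, the sketch below reads a small input record by record and checks that a root row (empty parent) exists. It is an illustration under stated assumptions: taxonRecord is a hypothetical stand-in for obitax.Taxon, readTaxa is not an OBITools4 function, and the "demo:" taxids are invented sample data.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"io"
	"strings"
)

// taxonRecord is a hypothetical stand-in for obitax.Taxon.
type taxonRecord struct {
	Parent, Name, Rank string
}

// sampleCSV is invented example data in the four-column format.
const sampleCSV = `taxid,parent,scientific_name,taxonomic_rank
demo:1,,root,no rank
demo:2,demo:1,Homo,genus
demo:3,demo:2,Homo sapiens,species
`

// readTaxa streams csvText one record at a time and returns the taxa keyed
// by taxid together with the taxid of the root (the row with an empty parent).
func readTaxa(csvText string) (map[string]taxonRecord, string, error) {
	reader := csv.NewReader(strings.NewReader(csvText))
	reader.TrimLeadingSpace = true

	header, err := reader.Read()
	if err != nil {
		return nil, "", fmt.Errorf("reading header: %w", err)
	}
	col := make(map[string]int, len(header))
	for i, name := range header {
		col[name] = i
	}
	for _, name := range []string{"taxid", "parent", "scientific_name", "taxonomic_rank"} {
		if _, ok := col[name]; !ok {
			return nil, "", fmt.Errorf("missing required column %q", name)
		}
	}

	taxa := make(map[string]taxonRecord)
	root := ""
	for {
		rec, err := reader.Read() // one record per call, never the whole file
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return nil, "", err
		}
		taxid, parent := rec[col["taxid"]], rec[col["parent"]]
		taxa[taxid] = taxonRecord{
			Parent: parent,
			Name:   rec[col["scientific_name"]],
			Rank:   rec[col["taxonomic_rank"]],
		}
		if parent == "" {
			root = taxid
		}
	}
	if root == "" {
		return nil, "", errors.New("no root node: no row has an empty parent")
	}
	return taxa, root, nil
}

func main() {
	taxa, root, err := readTaxa(sampleCSV)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("root:", root, "taxa:", len(taxa))
	fmt.Println(taxa["demo:3"].Name, "has parent", taxa["demo:3"].Parent)
}
```

The file is read through repeated Read calls instead of ReadAll, so only the resulting map grows with the input; the raw CSV text is never held as a slice of records.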

Output:

Returns a fully populated obitax.Taxonomy object ready for downstream phylogenetic or sequence classification tasks.