mirror of
https://github.com/metabarcoding/obitools4.git
CSV Taxonomy Loader for OBITools4
This Go module provides a function LoadCSVTaxonomy to parse and load taxonomic data from CSV files into an internal taxonomy structure.
Key Features:
- Robust CSV Parsing: Uses Go’s encoding/csv with configurable options (comment lines, lazy quotes, whitespace trimming).
- Column Mapping: Dynamically identifies required columns: taxid, parent, scientific_name, and taxonomic_rank.
- Error Handling: Validates presence of all required columns; fails early with descriptive errors.
- Taxonomy Construction:
  - Builds a hierarchical taxonomy using obitax.Taxon objects.
  - Ensures existence of a root node; returns an error otherwise.
- Metadata Extraction:
  - Derives taxonomy name and short code (e.g., prefix before ":" in the first taxid).
  - Logs key metadata for traceability.
- Scalable Design:
  - Processes records line-by-line (memory-efficient).
  - Supports large datasets via streaming CSV reading.
Input Format:
CSV must contain exactly four columns (case-sensitive headers):
- taxid: Unique taxon identifier.
- parent: Parent taxonomic node ID (empty for root).
- scientific_name: Binomial or descriptive name.
- taxonomic_rank: e.g., species, genus.
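As an illustration of this format, the sketch below parses a small made-up file, locates the root (the row with an empty parent), and derives a short code from the prefix before ":" in the first taxid. The sample data and the helpers `findRoot` and `codeFromTaxid` are assumptions for this example, not functions of the module, and the fixed column order is a simplification.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"strings"
)

// sampleCSV is illustrative input; the taxids and names are made up.
const sampleCSV = `taxid,parent,scientific_name,taxonomic_rank
demo:1,,root,no rank
demo:2,demo:1,Homo,genus
demo:3,demo:2,Homo sapiens,species
`

// codeFromTaxid returns the prefix before the first ':' in a taxid.
// The boolean is false when the taxid contains no ':' (assumed behaviour).
func codeFromTaxid(taxid string) (string, bool) {
	code, _, found := strings.Cut(taxid, ":")
	return code, found
}

// findRoot returns the taxid of the first record whose parent field is empty.
// Records are assumed to be ordered as taxid, parent, name, rank.
func findRoot(records [][]string) (string, error) {
	for _, rec := range records {
		if rec[1] == "" {
			return rec[0], nil
		}
	}
	return "", errors.New("taxonomy has no root node")
}

func main() {
	records, err := csv.NewReader(strings.NewReader(sampleCSV)).ReadAll()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	data := records[1:] // skip the header row

	root, err := findRoot(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	code, _ := codeFromTaxid(data[0][0])
	fmt.Printf("root=%s code=%s\n", root, code) // prints "root=demo:1 code=demo"
}
```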
Output:
Returns a fully populated obitax.Taxonomy object ready for downstream phylogenetic or sequence classification tasks.