Files
obitools4/autodoc/docmd/pkg/obiformats/csvtaxdump_read.md
T
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

28 lines
1.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
## CSV Taxonomy Loader for OBITools4
This Go module provides a function `LoadCSVTaxonomy` to parse and load taxonomic data from CSV files into an internal taxonomy structure.
### Key Features:
- **Robust CSV Parsing**: Uses Gos `encoding/csv` with configurable options (comment lines, lazy quotes, whitespace trimming).
- **Column Mapping**: Dynamically identifies required columns: `taxid`, `parent`, `scientific_name`, and `taxonomic_rank`.
- **Error Handling**: Validates presence of all required columns; fails early with descriptive errors.
- **Taxonomy Construction**:
- Builds a hierarchical taxonomy using `obitax.Taxon` objects.
- Ensures existence of a root node; returns error otherwise.
- **Metadata Extraction**:
- Derives taxonomy name and short code (e.g., prefix before `:` in first taxid).
- Logs key metadata for traceability.
- **Scalable Design**:
- Processes records line-by-line (memory-efficient).
- Supports large datasets via streaming CSV reading.
### Input Format:
CSV must contain exactly four columns (case-sensitive headers):
- `taxid`: Unique taxon identifier.
- `parent`: Parent taxonomic node ID (empty for root).
- `scientific_name`: Binomial or descriptive name.
- `taxonomic_rank`: e.g., *species*, *genus*.
### Output:
Returns a fully populated `obitax.Taxonomy` object ready for downstream phylogenetic or sequence classification tasks.