mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
28 lines
1.4 KiB
Markdown
## CSV Taxonomy Loader for OBITools4

This Go module provides a function `LoadCSVTaxonomy` to parse and load taxonomic data from CSV files into an internal taxonomy structure.

### Key Features:
- **Robust CSV Parsing**: Uses Go’s `encoding/csv` with configurable options (comment lines, lazy quotes, whitespace trimming).
- **Column Mapping**: Locates the required columns by header name: `taxid`, `parent`, `scientific_name`, and `taxonomic_rank`.
- **Error Handling**: Checks that all required columns are present and fails early with a descriptive error otherwise.
- **Taxonomy Construction**:
  - Builds a hierarchical taxonomy using `obitax.Taxon` objects.
  - Checks that a root node exists and returns an error otherwise.
- **Metadata Extraction**:
  - Derives the taxonomy name and short code (e.g., the prefix before `:` in the first taxid).
  - Logs key metadata for traceability.
- **Scalable Design**:
  - Processes records line by line, which keeps parsing memory-efficient.
  - Supports large datasets via streaming CSV reading.

### Input Format:
The CSV file must contain exactly four columns, with case-sensitive header names:
- `taxid`: Unique taxon identifier.
- `parent`: Identifier of the parent taxon (empty for the root).
- `scientific_name`: Binomial or descriptive name.
- `taxonomic_rank`: Taxonomic rank, e.g., *species* or *genus*.

### Output:
Returns a fully populated `obitax.Taxonomy` object ready for downstream phylogenetic or sequence classification tasks.