mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
8c7017a99d
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5" - Update version.txt from 4.29 → .30 (automated by Makefile)
1.3 KiB
1.3 KiB
NCBI Taxonomy Loader Module (obiformats)
This Go package provides functionality to parse and load NCBI taxonomy dump files into a structured Taxonomy object. It supports three core file types:
- nodes.dmp: Defines the taxonomic hierarchy via
taxid|parent_taxid|rankrecords. - names.dmp: Maps taxonomic IDs to names and name classes (e.g., "scientific name", "common name").
- merged.dmp: Tracks deprecated taxonomic IDs and their replacements.
Key features:
- Custom CSV parsing with
|delimiter, comment support (#), and whitespace trimming. - Support for loading only scientific names via the
onlysnflag inLoadNCBITaxDump. - Efficient buffered reading (
bufio.Reader) for large files. - Automatic root taxon (taxid
"1", i.e., root) assignment after loading. - Alias resolution: deprecated taxids are mapped to current ones via
AddAlias. - Robust error handling with fatal logging on critical failures (e.g., missing root taxon, invalid parent references).
The main entry point is LoadNCBITaxDump(directory string, onlysn bool), which constructs a fully initialized taxonomy from NCBI dump files. Designed for integration with obitax and obiutils, it enables downstream applications (e.g., metabarcoding pipelines) to perform taxonomic queries and filtering.