Mirror of https://github.com/metabarcoding/obitools4.git (synced 2026-04-30 12:00:39 +00:00)
NCBI Taxonomy Archive Support in obiformats
This Go package provides utilities for handling NCBI Taxonomy dumps archived as .tar files.
Core Functionalities
- Archive Validation (IsNCBITarTaxDump)
  - Checks whether a given .tar file contains all required NCBI Taxonomy dump files: citations.dmp, division.dmp, gencode.dmp, names.dmp, delnodes.dmp, gc.prt, merged.dmp, and nodes.dmp.
  - Returns a boolean indicating if the archive is a complete NCBI tax dump.
- Taxonomy Loading (LoadNCBITarTaxDump)
  - Parses the .tar archive and extracts key files to build a Taxonomy object.
  - Steps include:
    - Nodes: Loads taxonomic hierarchy (nodes.dmp) via loadNodeTable.
    - Names: Parses scientific and common names (names.dmp) via loadNameTable, with an option to load only scientific names (onlysn).
    - Merged Taxa: Integrates taxonomic aliases from merged.dmp, using loadMergedTable.
  - Sets the root taxon to NCBI’s default (taxid = 1, i.e., root).
- Integration with Other Modules
  - Uses obiutils.Ropen and TarFileReader for robust file handling.
  - Leverages obitax.Taxonomy, a structured representation of taxonomic data.
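The archive validation described above can be illustrated with the standard library alone. The following is a minimal sketch, not the package's actual implementation: hasAllNCBIFiles and buildTar are hypothetical names, and matching entries by base name (so that members stored under a directory prefix still count) is an assumption.

```go
package main

import (
	"archive/tar"
	"bytes"
	"errors"
	"fmt"
	"io"
	"path"
)

// requiredNCBIFiles lists the dump files named in this document.
var requiredNCBIFiles = []string{
	"citations.dmp", "division.dmp", "gencode.dmp", "names.dmp",
	"delnodes.dmp", "gc.prt", "merged.dmp", "nodes.dmp",
}

// hasAllNCBIFiles (hypothetical helper) scans the tar headers and reports
// whether every required file name appears at least once.
// Assumption: entries are matched by base name, ignoring any directory prefix.
func hasAllNCBIFiles(r io.Reader) (bool, error) {
	missing := make(map[string]bool, len(requiredNCBIFiles))
	for _, name := range requiredNCBIFiles {
		missing[name] = true
	}
	tr := tar.NewReader(r)
	for len(missing) > 0 {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			break // end of archive
		}
		if err != nil {
			return false, err
		}
		delete(missing, path.Base(hdr.Name))
	}
	return len(missing) == 0, nil
}

// buildTar (demo helper) writes empty files with the given names
// into an in-memory tar archive.
func buildTar(names []string) io.Reader {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, name := range names {
		hdr := &tar.Header{Name: name, Typeflag: tar.TypeReg, Mode: 0o644, Size: 0}
		if err := tw.WriteHeader(hdr); err != nil {
			panic(err)
		}
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	return &buf
}

func main() {
	complete, err := hasAllNCBIFiles(buildTar(requiredNCBIFiles))
	fmt.Println("complete archive:", complete, err)

	partial, err := hasAllNCBIFiles(buildTar(requiredNCBIFiles[:3]))
	fmt.Println("partial archive:", partial, err)
}
```

Scanning headers only, without reading file bodies, keeps the check cheap even for large dumps; the loop also stops as soon as every required name has been seen.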
Key Parameters
- onlysn: If true, only scientific names are loaded (reduces memory usage).
- seqAsTaxa: Reserved for future use; currently unused.
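To make the effect of onlysn concrete, here is a small sketch of filtering names.dmp rows by name class. It is an illustration under stated assumptions, not the code of loadNameTable: parseNames, the name type, and sampleNames are hypothetical, and the row layout (tax_id, name_txt, unique name, name class, separated by tab-pipe-tab) follows the public NCBI taxdump format.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// name is one names.dmp row reduced to the fields used in this sketch.
type name struct {
	Text  string
	Class string
}

// parseNames (hypothetical helper) reads rows shaped like
// "tax_id\t|\tname_txt\t|\tunique name\t|\tname class\t|".
// When onlysn is true, rows whose class is not "scientific name" are skipped.
func parseNames(r io.Reader, onlysn bool) (map[int][]name, error) {
	names := make(map[int][]name)
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		line := strings.TrimSuffix(scanner.Text(), "\t|")
		if line == "" {
			continue
		}
		fields := strings.Split(line, "\t|\t")
		if len(fields) < 4 {
			return nil, fmt.Errorf("names.dmp: malformed row %q", scanner.Text())
		}
		taxid, err := strconv.Atoi(strings.TrimSpace(fields[0]))
		if err != nil {
			return nil, fmt.Errorf("names.dmp: bad taxid in row %q: %w", scanner.Text(), err)
		}
		class := strings.TrimSpace(fields[3])
		if onlysn && class != "scientific name" {
			continue
		}
		names[taxid] = append(names[taxid], name{Text: fields[1], Class: class})
	}
	return names, scanner.Err()
}

// sampleNames (demo helper) returns four rows in names.dmp layout.
func sampleNames() io.Reader {
	return strings.NewReader(
		"1\t|\tall\t|\t\t|\tsynonym\t|\n" +
			"1\t|\troot\t|\t\t|\tscientific name\t|\n" +
			"9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n" +
			"9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n")
}

func main() {
	full, _ := parseNames(sampleNames(), false)
	sn, _ := parseNames(sampleNames(), true)
	fmt.Println("names kept for taxid 9606 (all classes):", len(full[9606]))
	fmt.Println("names kept for taxid 9606 (onlysn):", len(sn[9606]))
}
```

With onlysn set, each taxon in the sample keeps a single entry instead of two, which is where the memory saving mentioned above comes from.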
Logging & Error Handling
- Uses logrus to log loading progress and counts.
- Returns descriptive errors if required files or the root taxon are missing.
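The root-taxon check and the handling of merged taxids can be sketched as follows. This is a toy model under stated assumptions, not obitax.Taxonomy or the package's loaders: the taxonomy type, readPairs, buildTaxonomy, resolve, and reader are hypothetical, the nodes.dmp rows are truncated to three columns, and the retired taxid 999999 is invented sample data.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

const rootTaxid = 1 // NCBI's root taxon, as stated in this document

// taxonomy is a toy stand-in for illustration only.
type taxonomy struct {
	parent map[int]int // taxid -> parent taxid (from nodes.dmp)
	alias  map[int]int // retired taxid -> current taxid (from merged.dmp)
}

// readPairs parses the first two integer columns of a "\t|\t"-separated dump.
func readPairs(r io.Reader, file string) ([][2]int, error) {
	var pairs [][2]int
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		fields := strings.Split(strings.TrimSuffix(scanner.Text(), "\t|"), "\t|\t")
		if len(fields) < 2 {
			return nil, fmt.Errorf("%s: malformed row %q", file, scanner.Text())
		}
		a, errA := strconv.Atoi(strings.TrimSpace(fields[0]))
		b, errB := strconv.Atoi(strings.TrimSpace(fields[1]))
		if errA != nil || errB != nil {
			return nil, fmt.Errorf("%s: non-numeric taxid in row %q", file, scanner.Text())
		}
		pairs = append(pairs, [2]int{a, b})
	}
	return pairs, scanner.Err()
}

// buildTaxonomy loads nodes, verifies that the root exists, then records aliases.
func buildTaxonomy(nodes, merged io.Reader) (*taxonomy, error) {
	tax := &taxonomy{parent: map[int]int{}, alias: map[int]int{}}
	nodePairs, err := readPairs(nodes, "nodes.dmp")
	if err != nil {
		return nil, err
	}
	for _, p := range nodePairs {
		tax.parent[p[0]] = p[1]
	}
	if _, ok := tax.parent[rootTaxid]; !ok {
		return nil, fmt.Errorf("nodes.dmp: root taxon (taxid %d) is missing", rootTaxid)
	}
	mergedPairs, err := readPairs(merged, "merged.dmp")
	if err != nil {
		return nil, err
	}
	for _, p := range mergedPairs {
		tax.alias[p[0]] = p[1]
	}
	return tax, nil
}

// resolve maps a possibly retired taxid to a current one and reports
// whether the result is a known node.
func (t *taxonomy) resolve(taxid int) (int, bool) {
	if current, ok := t.alias[taxid]; ok {
		taxid = current
	}
	_, ok := t.parent[taxid]
	return taxid, ok
}

// reader (demo helper) wraps a string as an io.Reader.
func reader(s string) io.Reader { return strings.NewReader(s) }

func main() {
	nodes := "1\t|\t1\t|\tno rank\t|\n9606\t|\t1\t|\tspecies\t|\n"
	merged := "999999\t|\t9606\t|\n" // invented retired taxid

	tax, err := buildTaxonomy(reader(nodes), reader(merged))
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	id, ok := tax.resolve(999999)
	fmt.Println("resolved:", id, ok)

	_, err = buildTaxonomy(reader("9606\t|\t1\t|\tspecies\t|\n"), reader(""))
	fmt.Println("without root:", err)
}
```

Prefixing each error with the file it came from is one way to keep messages descriptive when several dump files are read in sequence.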
Note: Designed for efficient, standards-compliant ingestion of NCBI Taxonomy data in bioinformatics pipelines.