NCBI Taxonomy Archive Support in obiformats

This Go package provides utilities for handling NCBI Taxonomy dumps archived as .tar files.

Core Functionalities

  1. Archive Validation (IsNCBITarTaxDump)

    • Checks whether a given .tar file contains all required NCBI Taxonomy dump files: citations.dmp, division.dmp, gencode.dmp, names.dmp, delnodes.dmp, gc.prt, merged.dmp, and nodes.dmp.
    • Returns a boolean indicating if the archive is a complete NCBI tax dump.
  2. Taxonomy Loading (LoadNCBITarTaxDump)

    • Parses the .tar archive and extracts key files to build a Taxonomy object.
    • Steps include:
      • Nodes: Loads taxonomic hierarchy (nodes.dmp) via loadNodeTable.
      • Names: Parses scientific and common names (names.dmp) via loadNameTable, with an option to load only scientific names (onlysn).
      • Merged Taxa: Integrates taxonomic aliases from merged.dmp, using loadMergedTable.
    • Sets the root taxon to NCBI's default (taxid = 1, i.e., root).
  3. Integration with Other Modules

    • Uses obiutils.Ropen and TarFileReader for robust file handling.
    • Leverages obitax.Taxonomy, a structured representation of taxonomic data.
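
The validation step described above can be illustrated with the standard library alone. The following is a minimal sketch, not the package's actual code: `missingDumpFiles`, `buildTar`, and `requiredDumpFiles` are hypothetical names, and matching entries by base name (so that `taxdump/nodes.dmp` counts as `nodes.dmp`) is an assumption about how entries might be compared.

```go
package main

import (
	"archive/tar"
	"bytes"
	"errors"
	"fmt"
	"io"
	"path"
)

// requiredDumpFiles mirrors the file list given in this document.
var requiredDumpFiles = []string{
	"citations.dmp", "division.dmp", "gencode.dmp", "names.dmp",
	"delnodes.dmp", "gc.prt", "merged.dmp", "nodes.dmp",
}

// missingDumpFiles (hypothetical helper) scans a tar stream and returns
// the required files that were not found. An empty result means the
// archive looks like a complete NCBI taxonomy dump.
func missingDumpFiles(r io.Reader) ([]string, error) {
	seen := make(map[string]bool)
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return nil, err
		}
		// Assumption: compare by base name, ignoring any directory prefix.
		seen[path.Base(hdr.Name)] = true
	}
	var missing []string
	for _, name := range requiredDumpFiles {
		if !seen[name] {
			missing = append(missing, name)
		}
	}
	return missing, nil
}

// buildTar creates an in-memory archive of empty files, for demonstration only.
func buildTar(names ...string) *bytes.Buffer {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, n := range names {
		_ = tw.WriteHeader(&tar.Header{Name: n, Typeflag: tar.TypeReg, Mode: 0o644})
	}
	_ = tw.Close()
	return &buf
}

func main() {
	missing, err := missingDumpFiles(buildTar("nodes.dmp", "names.dmp"))
	if err != nil {
		panic(err)
	}
	fmt.Println("complete:", len(missing) == 0, "missing:", missing)
}
```

Returning the list of missing files rather than a bare boolean is a choice made for this sketch; it makes a failed check easier to report, and a boolean like the one IsNCBITarTaxDump returns can be derived from it with `len(missing) == 0`.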

Key Parameters

  • onlysn: If true, only scientific names are loaded (reduces memory usage).
  • seqAsTaxa: Reserved for future use; currently unused.

Logging & Error Handling

  • Uses logrus to log loading progress and counts.
  • Returns descriptive errors if required files or the root taxon are missing.

Note: Designed for efficient, standards-compliant ingestion of NCBI Taxonomy data in bioinformatics pipelines.