mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
Taxonomy Loading Module (obiformats)
This Go package automatically detects the format of a taxonomic data source and loads it. Every supported format is ingested through a single, format-agnostic interface.
Core Features
- Format Detection
  - DetectTaxonomyFormat(path) identifies the taxonomy source format by inspecting the file type (directory, MIME type), filename patterns, or structure.
  - Supports:
    • NCBI Taxdump (both directory and .tar archive)
    • CSV files (text/csv)
    • FASTA/FASTQ sequences (via mimetype detection)
- Modular Loaders
  - Detection returns a typed TaxonomyLoader function, enabling deferred loading with configurable options (onlysn, seqAsTaxa).
  - Each loader abstracts format-specific parsing (e.g., NCBI nodes.dmp, FASTA header taxonomy extraction).
- Sequence-Based Taxonomy Extraction
  - For sequence files (FASTA/FASTQ), taxonomy is inferred from headers or associated metadata using ExtractTaxonomy().
- Integration with OBITools Ecosystem
  - Leverages obitax.Taxonomy as the canonical output structure.
  - Uses custom MIME-type registration (obiutils.RegisterOBIMimeType()) for robust detection of bioinformatics formats.
- Error Handling & Logging
  - Graceful failure with descriptive errors; informative logging via logrus.
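The detection order described under Format Detection (file type first, then filename pattern, then content) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package's implementation: the Format type, classifyFile, and detectFormat are hypothetical names, the package itself returns a loader function rather than a format tag, and first-byte sniffing stands in here for the MIME-type detection the module actually uses.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// Format is a hypothetical tag used only in this sketch.
type Format string

const (
	FormatUnknown     Format = "unknown"
	FormatNCBITaxdump Format = "ncbi-taxdump-directory"
	FormatNCBITar     Format = "ncbi-taxdump-tar"
	FormatCSV         Format = "csv"
	FormatFasta       Format = "fasta"
	FormatFastq       Format = "fastq"
)

// classifyFile checks the filename pattern first, then falls back to
// the first non-blank byte of the content (a crude stand-in for
// MIME-type sniffing; assumption of this sketch).
func classifyFile(name string, head []byte) Format {
	lower := strings.ToLower(name)
	switch {
	case strings.HasSuffix(lower, ".tar"):
		return FormatNCBITar
	case strings.HasSuffix(lower, ".csv"):
		return FormatCSV
	}
	head = bytes.TrimLeft(head, " \t\r\n")
	switch {
	case bytes.HasPrefix(head, []byte(">")):
		return FormatFasta
	case bytes.HasPrefix(head, []byte("@")):
		return FormatFastq
	}
	return FormatUnknown
}

// detectFormat handles the directory case, then reads a short prefix
// of a regular file and delegates to classifyFile.
func detectFormat(path string) (Format, error) {
	info, err := os.Stat(path)
	if err != nil {
		return FormatUnknown, err
	}
	if info.IsDir() {
		// Assumption: a taxdump directory is recognised by nodes.dmp.
		if _, err := os.Stat(filepath.Join(path, "nodes.dmp")); err == nil {
			return FormatNCBITaxdump, nil
		}
		return FormatUnknown, fmt.Errorf("directory %s has no nodes.dmp", path)
	}
	f, err := os.Open(path)
	if err != nil {
		return FormatUnknown, err
	}
	defer f.Close()
	head := make([]byte, 512)
	n, err := f.Read(head)
	if err != nil && err != io.EOF {
		return FormatUnknown, err
	}
	return classifyFile(filepath.Base(path), head[:n]), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: detect <path>")
		return
	}
	format, err := detectFormat(os.Args[1])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(format)
}
```

Keeping the classification step free of I/O makes it easy to test with in-memory byte slices; only detectFormat touches the filesystem.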
Usage Flow
tax, err := LoadTaxonomy("path/to/data", true, false) // onlysn = true, seqAsTaxa = false
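The call above can be read as two steps: detection picks a typed loader, and the loader is then invoked with the caller's options. The sketch below illustrates that deferred-loading pattern with stand-ins. The Taxonomy struct replaces obitax.Taxonomy, the TaxonomyLoader signature is an assumption, and stubLoader, detectLoader, and loadTaxonomy are hypothetical helpers that do no real parsing and choose a loader by file extension only.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// Taxonomy is a stand-in for obitax.Taxonomy (assumption of this sketch).
type Taxonomy struct {
	Source    string
	OnlySN    bool
	SeqAsTaxa bool
}

// TaxonomyLoader is a typed loader function. The parameter list is
// assumed from the option names given in the text.
type TaxonomyLoader func(path string, onlysn, seqAsTaxa bool) (*Taxonomy, error)

// ErrUnknownFormat is returned when no loader matches the path.
var ErrUnknownFormat = errors.New("unknown taxonomy format")

// stubLoader builds a loader that records its options instead of
// parsing anything; real loaders would read nodes.dmp, CSV rows, etc.
func stubLoader(source string) TaxonomyLoader {
	return func(path string, onlysn, seqAsTaxa bool) (*Taxonomy, error) {
		return &Taxonomy{Source: source, OnlySN: onlysn, SeqAsTaxa: seqAsTaxa}, nil
	}
}

// detectLoader selects a loader without running it, so the caller
// decides when and with which options loading happens.
func detectLoader(path string) (TaxonomyLoader, error) {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".tar":
		return stubLoader("ncbi-taxdump-tar"), nil
	case ".csv":
		return stubLoader("csv"), nil
	case ".fasta", ".fastq":
		return stubLoader("sequences"), nil
	}
	return nil, fmt.Errorf("%s: %w", path, ErrUnknownFormat)
}

// loadTaxonomy chains detection and loading, returning a descriptive
// error when detection fails.
func loadTaxonomy(path string, onlysn, seqAsTaxa bool) (*Taxonomy, error) {
	loader, err := detectLoader(path)
	if err != nil {
		return nil, err
	}
	return loader(path, onlysn, seqAsTaxa)
}

func main() {
	tax, err := loadTaxonomy("path/to/data.csv", true, false)
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	fmt.Printf("loaded from %s (onlysn=%v, seqAsTaxa=%v)\n",
		tax.Source, tax.OnlySN, tax.SeqAsTaxa)
}
```

Returning a function from detection keeps format knowledge in one place while leaving the options, and the moment of loading, to the caller.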
The module enables interoperability across taxonomic data sources in metabarcoding workflows.