2026-04-07 08:36:50 +02:00
# Taxonomy Loading Module (`obiformats`)
This Go package automatically detects the format of taxonomic data sources and loads them through a single, format-agnostic interface.
## Core Features
1. **Format Detection**
- `DetectTaxonomyFormat(path)` identifies the format of a taxonomy source by checking whether the path is a directory, its MIME type, its filename pattern, or its internal structure.
- Supports:
  - NCBI Taxdump (either a directory or a `.tar` archive)
  - CSV files (`text/csv`)
  - FASTA/FASTQ sequence files (detected via `mimetype`)
2. **Modular Loaders**
- Detection returns a `TaxonomyLoader` function rather than loaded data, so loading is deferred until the loader is called with its options (`onlysn`, `seqAsTaxa`).
- Each loader abstracts format-specific parsing (e.g., NCBI `nodes.dmp`, FASTA header taxonomy extraction).
3. **Sequence-Based Taxonomy Extraction**
- For sequence files (FASTA/FASTQ), taxonomy is inferred from headers or associated metadata, using `ExtractTaxonomy()`.
4. **Integration with OBITools Ecosystem**
- Leverages `obitax.Taxonomy` as the canonical output structure.
- Uses custom MIME-type registration (`obiutils.RegisterOBIMimeType()`) for robust detection of bioinformatics formats.
5. **Error Handling & Logging**
- Returns descriptive errors on failure and emits informative log messages via `logrus`.
## Usage Flow
```go
tax, err := obiformats.LoadTaxonomy("path/to/data", true, false) // onlysn=true, seqAsTaxa=false
if err != nil {
	log.Fatal(err)
}
```
The module enables interoperability across taxonomic data sources in metabarcoding workflows.