obitools4/autodoc/docmd/pkg/obiformats/taxonomy_read.md
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00


# Taxonomy Loading Module (`obiformats`)
This part of the `obiformats` Go package detects the format of a taxonomy source and loads it through a single, format-agnostic interface.
## Core Features
1. **Format Detection**
- `DetectTaxonomyFormat(path)` identifies the taxonomy source format by inspecting file type (directory, MIME-type), filename patterns, or structure.
- Supported sources:
  - NCBI Taxdump (directory or `.tar` archive)
  - CSV files (`text/csv`)
  - FASTA/FASTQ sequence files (via `mimetype` detection)
2. **Modular Loaders**
- Detection returns a typed `TaxonomyLoader` function, so loading is deferred until that function is called with its options (`onlysn`, `seqAsTaxa`).
- Each loader abstracts format-specific parsing (e.g., NCBI `nodes.dmp`, FASTA header taxonomy extraction).
3. **Sequence-Based Taxonomy Extraction**
- For sequence files (FASTA/FASTQ), taxonomy is inferred from headers or associated metadata, using `ExtractTaxonomy()`.
4. **Integration with OBITools Ecosystem**
- Leverages `obitax.Taxonomy` as the canonical output structure.
- Uses custom MIME-type registration (`obiutils.RegisterOBIMimeType()`) for robust detection of bioinformatics formats.
5. **Error Handling & Logging**
- Failures are reported as descriptive errors rather than panics; informational messages are logged via `logrus`.
## Usage Flow
```go
tax, err := LoadTaxonomy("path/to/data", true, false) // onlysn=true, seqAsTaxa=false
```
The module enables interoperability across taxonomic data sources in metabarcoding workflows.