obitaxonomy: CLI-Oriented Taxonomic Data Utilities for OBItools4
The obitaxonomy Go package delivers modular, command-line-friendly tools for loading, filtering, navigating, and exporting taxonomic data within the OBItools4 ecosystem. It focuses on enabling reproducible, scriptable workflows for metagenomics and biodiversity informatics by abstracting complex taxonomy operations behind intuitive CLI flags.
Public Functionalities
Taxonomy Restriction & Filtering
- CLITaxonRestrictions(): Wraps a taxonomy iterator to apply user-defined clade restrictions via --restrict-to-taxon (-r). Supports taxon IDs or names (with optional regex) and returns a filtered iterator over the matching subtrees.
- CLIFilterRankRestriction(): Restricts the taxonomy iterator to taxa of a specific rank (e.g., "species", "family"), controlled by --rank (-R). Returns a constrained iterator for downstream processing.
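The two restrictions above can be sketched as a plain filter over taxa. This is a minimal illustration, not the package's implementation: the Taxon struct, inClade, restrict, and sampleTaxa are hypothetical names, the lineage is a deliberately shortened toy, and a slice stands in for the real iterator type.

```go
package main

import "fmt"

// Taxon is a hypothetical, simplified node; the real obitax types differ.
type Taxon struct {
	ID     int
	Name   string
	Rank   string
	Parent *Taxon // nil for the root
}

// inClade reports whether t is the clade root itself or one of its descendants,
// by walking the parent pointers up to the root.
func inClade(t *Taxon, cladeID int) bool {
	for cur := t; cur != nil; cur = cur.Parent {
		if cur.ID == cladeID {
			return true
		}
	}
	return false
}

// restrict keeps the taxa that belong to the clade and, when rank is
// non-empty, that also carry exactly that rank.
func restrict(taxa []*Taxon, cladeID int, rank string) []*Taxon {
	var kept []*Taxon
	for _, t := range taxa {
		if !inClade(t, cladeID) {
			continue
		}
		if rank != "" && t.Rank != rank {
			continue
		}
		kept = append(kept, t)
	}
	return kept
}

// sampleTaxa builds a toy taxonomy with a shortened primate lineage.
func sampleTaxa() []*Taxon {
	root := &Taxon{ID: 1, Name: "root", Rank: "no rank"}
	primates := &Taxon{ID: 9443, Name: "Primates", Rank: "order", Parent: root}
	homo := &Taxon{ID: 9605, Name: "Homo", Rank: "genus", Parent: primates}
	sapiens := &Taxon{ID: 9606, Name: "Homo sapiens", Rank: "species", Parent: homo}
	bacteria := &Taxon{ID: 2, Name: "Bacteria", Rank: "superkingdom", Parent: root}
	return []*Taxon{root, primates, homo, sapiens, bacteria}
}

func main() {
	// Clade restriction (Primates) combined with a rank restriction (species).
	for _, t := range restrict(sampleTaxa(), 9443, "species") {
		fmt.Println(t.ID, t.Name)
	}
}
```

In this sketch the clade test and the rank test are independent predicates, which is why they compose in either order.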
Subtree Navigation & Iteration
- CLISubTaxonomyIterator(): Returns an iterator over the subtree rooted at a user-specified taxon ID (via --dump / -D). If no root is provided, it exits with an error, which keeps CLI-driven subtree extraction safe.
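As a rough sketch of subtree extraction, the traversal below walks a parent-to-children index depth-first from a given root. The index layout, subtreeIDs, and sampleChildren are assumptions made for illustration; the sketch returns an error where the CLI function is described as exiting.

```go
package main

import (
	"errors"
	"fmt"
)

// subtreeIDs returns the taxon IDs of the subtree rooted at root in
// depth-first pre-order. kids maps a parent ID to its child IDs
// (a hypothetical layout, not the obitax data structure).
func subtreeIDs(kids map[int][]int, root int) ([]int, error) {
	if root == 0 {
		// Assumption: 0 stands for "no root supplied on the command line".
		return nil, errors.New("no root taxon given")
	}
	var out []int
	stack := []int{root}
	for len(stack) > 0 {
		id := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		out = append(out, id)
		// Push children in reverse so they are visited in listed order.
		c := kids[id]
		for i := len(c) - 1; i >= 0; i-- {
			stack = append(stack, c[i])
		}
	}
	return out, nil
}

// sampleChildren is a toy index with a shortened primate lineage.
func sampleChildren() map[int][]int {
	return map[int][]int{
		1:    {2, 9443},
		9443: {9605},
		9605: {9606},
	}
}

func main() {
	ids, err := subtreeIDs(sampleChildren(), 9443)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ids)
}
```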
CSV Export
- CLICSVTaxaIterator(): Transforms a taxonomy iterator into an ordered stream of CSV records. Configurable columns include:
  - Scientific name (--without-scientific-name to omit),
  - Taxonomic rank (omittable via -R),
  - Parent taxon ID (--without-parent / -W),
  - Full lineage path (via --path, -P),
  - Query source match (--with-query).
- CLICSVTaxaWriter(): Wraps CLICSVTaxaIterator(), handles the output destination (- for stdout, a file path otherwise), and integrates with CLI logging.
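The column toggles can be pictured as a small configuration struct that drives both the header and each record. The sketch below uses Go's standard encoding/csv package; the Taxon and Columns types, the column names (taxid, parent, rank, scientific_name), and the column order are assumptions, not the names obitaxonomy actually emits.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
)

// Taxon is a hypothetical flat record; the real obitax types differ.
type Taxon struct {
	ID       int
	ParentID int
	Rank     string
	Name     string
}

// Columns says which optional columns are written (illustrative field names).
type Columns struct {
	Parent, Rank, ScientificName bool
}

// header lists the column names selected by c; taxid is always present.
func header(c Columns) []string {
	h := []string{"taxid"}
	if c.Parent {
		h = append(h, "parent")
	}
	if c.Rank {
		h = append(h, "rank")
	}
	if c.ScientificName {
		h = append(h, "scientific_name")
	}
	return h
}

// record renders one taxon in the same column order as header.
func record(t Taxon, c Columns) []string {
	r := []string{strconv.Itoa(t.ID)}
	if c.Parent {
		r = append(r, strconv.Itoa(t.ParentID))
	}
	if c.Rank {
		r = append(r, t.Rank)
	}
	if c.ScientificName {
		r = append(r, t.Name)
	}
	return r
}

// writeCSV streams the header and one record per taxon to w.
func writeCSV(w io.Writer, taxa []Taxon, c Columns) error {
	cw := csv.NewWriter(w)
	if err := cw.Write(header(c)); err != nil {
		return err
	}
	for _, t := range taxa {
		if err := cw.Write(record(t, c)); err != nil {
			return err
		}
	}
	cw.Flush()
	return cw.Error()
}

// renderCSV is a convenience wrapper that returns the CSV as a string.
func renderCSV(taxa []Taxon, c Columns) (string, error) {
	var buf bytes.Buffer
	err := writeCSV(&buf, taxa, c)
	return buf.String(), err
}

func main() {
	taxa := []Taxon{{ID: 9606, ParentID: 9605, Rank: "species", Name: "Homo sapiens"}}
	out, err := renderCSV(taxa, Columns{Rank: true, ScientificName: true})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Because writeCSV takes an io.Writer, the same function could serve stdout or a file, which is the kind of split between iterator and writer the two functions above describe.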
Tree Export
- CLINewickWriter(): Exports a taxonomy subtree (from --dump) in Newick format. Supports:
  - Compression (gzip via -z),
  - Leaf labels (scientific name/rank/taxid toggles),
  - Root trimming (--trim-root),
  - Output to file or stdout.
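For readers unfamiliar with Newick, the format nests each node's children in parentheses, followed by the node's label, and ends the tree with a semicolon. The recursive sketch below shows that shape only; the Node type and function names are hypothetical, labels are reduced to the scientific name, and compression and root trimming are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical tree node holding only a scientific name.
type Node struct {
	Name     string
	Children []*Node
}

// label makes a name usable as an unquoted Newick label by replacing
// blanks with underscores.
func label(name string) string {
	return strings.ReplaceAll(name, " ", "_")
}

// newick renders the subtree under n without the terminating semicolon.
func newick(n *Node) string {
	if len(n.Children) == 0 {
		return label(n.Name)
	}
	parts := make([]string, len(n.Children))
	for i, c := range n.Children {
		parts[i] = newick(c)
	}
	return "(" + strings.Join(parts, ",") + ")" + label(n.Name)
}

// Newick renders a complete tree, including the final semicolon.
func Newick(root *Node) string {
	return newick(root) + ";"
}

// sampleTree is a toy subset of the Enterobacteriaceae.
func sampleTree() *Node {
	return &Node{
		Name: "Enterobacteriaceae",
		Children: []*Node{
			{Name: "Escherichia", Children: []*Node{{Name: "Escherichia coli"}}},
			{Name: "Salmonella"},
		},
	}
}

func main() {
	fmt.Println(Newick(sampleTree()))
	// prints ((Escherichia_coli)Escherichia,Salmonella)Enterobacteriaceae;
}
```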
Data Acquisition
- CLIDownloadNCBITaxdump(): Fetches the latest NCBI taxonomy dump (taxdump.tar.gz) and saves it as ncbitaxo_YYYYMMDD.tgz (or under a custom name). Designed for single-command taxonomy setup.
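The date-stamped default name follows the ncbitaxo_YYYYMMDD.tgz pattern given above. A minimal sketch of building such a name with Go's time package is shown here; the function names are hypothetical and the download itself is not shown.

```go
package main

import (
	"fmt"
	"time"
)

// defaultDumpName builds the ncbitaxo_YYYYMMDD.tgz name for a given moment.
// "20060102" is Go's reference layout for a compact year-month-day date.
func defaultDumpName(now time.Time) string {
	return "ncbitaxo_" + now.Format("20060102") + ".tgz"
}

// dumpNameForDate is a small helper taking plain integers, handy for examples.
func dumpNameForDate(year, month, day int) string {
	return defaultDumpName(time.Date(year, time.Month(month), day, 0, 0, 0, 0, time.UTC))
}

func main() {
	fmt.Println(defaultDumpName(time.Now()))
}
```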
Utility & Inspection Helpers
- CLIRankRestriction() / CLIWithScientificName(): Expose parsed CLI flags for use in custom processing pipelines.
- --rank-list (-l): Prints all available ranks in the loaded taxonomy (for introspection).
- Pattern matching: --fixed (-F) disables regex for taxon name queries, enabling literal string matching.
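The difference between regex and fixed matching can be illustrated with Go's regexp package, where regexp.QuoteMeta escapes every metacharacter so the pattern is matched literally. This is only a sketch of the idea: nameMatcher and matches are hypothetical helpers, and treating a fixed pattern as a literal substring (rather than a whole-name comparison) is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameMatcher compiles a name query. With fixed set, metacharacters are
// escaped so the query only matches itself literally.
func nameMatcher(pattern string, fixed bool) (*regexp.Regexp, error) {
	if fixed {
		pattern = regexp.QuoteMeta(pattern)
	}
	return regexp.Compile(pattern)
}

// matches reports whether name satisfies the query; an invalid regular
// expression is treated as "no match" in this sketch.
func matches(pattern, name string, fixed bool) bool {
	re, err := nameMatcher(pattern, fixed)
	if err != nil {
		return false
	}
	return re.MatchString(name)
}

func main() {
	// In regex mode "." matches any character; in fixed mode it is a literal dot.
	fmt.Println(matches("E. coli", "Ex coli", false)) // true
	fmt.Println(matches("E. coli", "Ex coli", true))  // false
}
```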
Integration & Design Principles
- Built on obitax for core taxonomy operations.
- Fully compatible with OBItools4's option parsing (getoptions) and iterator patterns.
- Designed for composition: integrates with obiconvert (output formatting) and other CLI modules.
- All functions respect the - placeholder for stdout, stdout/stderr conventions, logging levels (--verbose), and CLI flag parsing.
- No internal state mutation: functions are pure wrappers around iterator transformations.
Target Use Cases
- Filtering metagenomic assignments to a clade of interest (e.g., --restrict-to-taxon 9606 for Homo sapiens).
- Exporting species-level taxa to CSV/JSON for downstream analysis.
- Generating Newick trees from custom taxonomic subsets (e.g., all Enterobacteriaceae).
- Bootstrapping local taxonomy caches via --download-ncbi.