mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
obitax Package: Taxon String Parser
The obitax package provides a robust parser for structured taxonomic strings used in biodiversity data processing.
Core Functionality
- ParseTaxonString(taxonStr string)
  Parses strings in the format: code:taxid [scientific name]@rank.
- Input Format Requirements
  - code: Taxonomy identifier (e.g., "GBIF", "NCBI")
  - taxid: Numeric or alphanumeric taxonomic ID (e.g., "123456")
  - scientific name: Enclosed in square brackets (e.g., "[Homo sapiens]")
  - rank: Optional taxonomic rank after @ (e.g., "species", defaults to "no rank" if missing)
- Robustness Features
  - Trims whitespace around all components.
  - Handles multiple @ symbols (returns error).
  - Validates bracket pairing and ordering.
  - Ensures code:taxid contains exactly one colon separator.
- Error Handling
  Returns descriptive errors for:
  - Missing or malformed brackets
  - Invalid number of @ separators
  - Absent colon in code:taxid segment
  - Empty fields (code, taxid, or scientific name)
- Use Cases
  Ideal for parsing legacy biodiversity records (e.g., from OBIS, GBIF), where taxon strings are semi-structured and need reliable extraction before indexing or matching against reference databases.
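The format and validation rules listed above can be sketched in a few lines of Go. This is an illustrative stand-in, not the obitax source: the function name parseTaxonString, its five return values, the error messages, and the choice to treat an empty rank after @ as "no rank" are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTaxonString is a hypothetical stand-in for obitax.ParseTaxonString.
// It follows the rules described in this document; the real code may differ.
func parseTaxonString(s string) (code, taxid, name, rank string, err error) {
	fail := func(msg string) (string, string, string, string, error) {
		return "", "", "", "", fmt.Errorf("invalid taxon string %q: %s", s, msg)
	}
	s = strings.TrimSpace(s)

	// At most one '@' is allowed; the text after it is the optional rank.
	head := s
	rank = "no rank"
	switch strings.Count(s, "@") {
	case 0:
		// No rank given: keep the default.
	case 1:
		i := strings.IndexByte(s, '@')
		head = s[:i]
		// Assumption: an empty rank after '@' also falls back to the default.
		if r := strings.TrimSpace(s[i+1:]); r != "" {
			rank = r
		}
	default:
		return fail("more than one '@' separator")
	}

	// The scientific name must sit between one '[' and a later ']'.
	open := strings.IndexByte(head, '[')
	end := strings.LastIndexByte(head, ']')
	if open < 0 || end < open {
		return fail("missing or malformed brackets")
	}
	if strings.TrimSpace(head[end+1:]) != "" {
		return fail("unexpected text after closing bracket")
	}
	name = strings.TrimSpace(head[open+1 : end])
	if strings.ContainsAny(name, "[]") {
		return fail("nested brackets in scientific name")
	}

	// Everything before '[' must be exactly "code:taxid".
	ids := strings.Split(head[:open], ":")
	if len(ids) != 2 {
		return fail("code:taxid must contain exactly one colon")
	}
	code = strings.TrimSpace(ids[0])
	taxid = strings.TrimSpace(ids[1])
	if code == "" || taxid == "" || name == "" {
		return fail("empty code, taxid, or scientific name")
	}
	return code, taxid, name, rank, nil
}

func main() {
	code, taxid, name, rank, err := parseTaxonString("  NCBI : 9606 [ Homo sapiens ] ")
	fmt.Println(code, taxid, name, rank, err)

	_, _, _, _, err = parseTaxonString("NCBI:9606 [Homo sapiens]@species@extra")
	fmt.Println(err)
}
```

Splitting on @ first keeps the rank handling separate from the bracket and colon checks, so each failure can be reported with its own message.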
Example
Input: "GBIF:248093 [Homo sapiens]@species"
Output components:
- code = "GBIF"
- taxid = "248093"
- scientificName = "Homo sapiens"
- rank = "species"
Returns empty strings and an error for invalid inputs.
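
The example and the error contract can be tried out with a compact regular-expression sketch. The pattern taxonRe and the helper splitTaxon are hypothetical names introduced here for illustration; they are not part of obitax, and the package's own parser may be implemented differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// taxonRe is an assumed pattern for "code:taxid [scientific name]@rank";
// it is not taken from the obitax source.
var taxonRe = regexp.MustCompile(`^([^:\[\]@]+):([^:\[\]@]+)\[([^\[\]@]+)\]\s*(?:@([^@]*))?$`)

// splitTaxon is a hypothetical helper that mirrors the documented contract:
// four components on success, empty strings plus an error otherwise.
func splitTaxon(s string) (code, taxid, name, rank string, err error) {
	m := taxonRe.FindStringSubmatch(strings.TrimSpace(s))
	if m == nil {
		return "", "", "", "", fmt.Errorf("malformed taxon string %q", s)
	}
	// Trim whitespace around every captured component.
	for i := range m {
		m[i] = strings.TrimSpace(m[i])
	}
	if m[1] == "" || m[2] == "" || m[3] == "" {
		return "", "", "", "", fmt.Errorf("empty field in taxon string %q", s)
	}
	if m[4] == "" {
		m[4] = "no rank" // documented default when the rank is missing
	}
	return m[1], m[2], m[3], m[4], nil
}

func main() {
	inputs := []string{
		"GBIF:248093 [Homo sapiens]@species", // the documented example
		"GBIF:248093 [Homo sapiens]",         // rank omitted
		"GBIF:248093 Homo sapiens@species",   // brackets missing
	}
	for _, in := range inputs {
		code, taxid, name, rank, err := splitTaxon(in)
		if err != nil {
			// On error all four strings are empty, so only err is reported.
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("code=%q taxid=%q scientificName=%q rank=%q\n", code, taxid, name, rank)
	}
}
```

Callers should check the error before using any of the returned fields, since all four strings are empty when parsing fails.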