Files
obitools4/autodoc/docmd/pkg/obitax/string_parser.md
T
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

42 lines
1.5 KiB
Markdown

# `obitax` Package: Taxon String Parser
The `obitax` package provides a robust parser for structured taxonomic strings used in biodiversity data processing.
## Core Functionality
- **`ParseTaxonString(taxonStr string)`**
Parses strings in the format: `code:taxid [scientific name]@rank`.
- **Input Format Requirements**
- `code`: Taxonomy identifier (e.g., "GBIF", "NCBI")
- `taxid`: Numeric or alphanumeric taxonomic ID (e.g., "123456")
- `scientific name`: Enclosed in square brackets (e.g., "[Homo sapiens]")
- `rank`: Optional taxonomic rank after `@` (e.g., "species", defaults to `"no rank"` if missing)
- **Robustness Features**
- Trims whitespace around all components.
- Handles multiple `@` symbols (returns error).
- Validates bracket pairing and ordering.
- Ensures `code:taxid` contains exactly one colon separator.
- **Error Handling**
Returns descriptive errors for:
- Missing or malformed brackets
- Invalid number of `@` separators
- Absent colon in code:taxid segment
- Empty fields (code, taxid, or scientific name)
- **Use Cases**
Ideal for parsing legacy biodiversity records (e.g., from OBIS, GBIF), where taxon strings are semi-structured and need reliable extraction before indexing or matching against reference databases.
## Example
Input: `"GBIF:248093 [Homo sapiens]@species"`
Output components:
- `code = "GBIF"`
- `taxid = "248093"`
- `scientificName = "Homo sapiens"`
- `rank = "species"`
Returns empty strings and an error for invalid inputs.