mirror of https://github.com/metabarcoding/obitools4.git, synced 2026-04-30 12:00:39 +00:00
⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30 (automated by Makefile)
@@ -0,0 +1,41 @@
# `obitax` Package: Taxon String Parser

The `obitax` package provides a robust parser for structured taxonomic strings used in biodiversity data processing.

## Core Functionality

- **`ParseTaxonString(taxonStr string)`**
  Parses a taxon string of the form `code:taxid [scientific name]@rank` into its components.

- **Input Format Requirements**
  - `code`: Identifier of the source taxonomy (e.g., "GBIF", "NCBI")
  - `taxid`: Numeric or alphanumeric taxonomic ID (e.g., "123456")
  - `scientific name`: Enclosed in square brackets (e.g., "[Homo sapiens]")
  - `rank`: Optional taxonomic rank after `@` (e.g., "species"); defaults to `"no rank"` when omitted

- **Robustness Features**
  - Trims whitespace around all components.
  - Rejects input containing more than one `@` symbol by returning an error.
  - Validates bracket pairing and ordering.
  - Ensures the `code:taxid` segment contains exactly one colon separator.

- **Error Handling**
  Returns descriptive errors for:
  - Missing or malformed brackets
  - More than one `@` separator
  - A missing or extra colon in the `code:taxid` segment
  - Empty fields (code, taxid, or scientific name)

- **Use Cases**
  Suited to parsing legacy biodiversity records (e.g., from OBIS or GBIF), where taxon strings are semi-structured and must be split into fields reliably before indexing or matching against reference databases.

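
The rules listed above can be sketched in a few lines of Go. This is only an illustration of the described behaviour, not the package's actual source: `parseTaxonStringSketch` is a hypothetical name, the return signature and error messages are assumptions, and two details the description leaves open are decided here by assumption (an empty rank after `@` falls back to `"no rank"`, and any text after the closing bracket is rejected).

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseTaxonStringSketch is a hypothetical stand-in for ParseTaxonString.
// It splits "code:taxid [scientific name]@rank" into its four components
// and returns empty strings plus an error when the input is malformed.
func parseTaxonStringSketch(s string) (code, taxid, name, rank string, err error) {
	s = strings.TrimSpace(s)

	// The rank is optional, but at most one '@' may appear.
	rank = "no rank"
	switch strings.Count(s, "@") {
	case 0:
		// No rank given: keep the default.
	case 1:
		at := strings.Index(s, "@")
		// Assumption: an empty rank after '@' also falls back to the default.
		if r := strings.TrimSpace(s[at+1:]); r != "" {
			rank = r
		}
		s = strings.TrimSpace(s[:at])
	default:
		return "", "", "", "", errors.New("more than one '@' separator")
	}

	// Exactly one '[' and one ']' are required, and ']' must close the segment
	// (assumption: trailing text after the bracket is an error).
	if strings.Count(s, "[") != 1 || strings.Count(s, "]") != 1 || !strings.HasSuffix(s, "]") {
		return "", "", "", "", errors.New("missing or malformed brackets")
	}
	open := strings.Index(s, "[")
	name = strings.TrimSpace(s[open+1 : len(s)-1])

	// The part before '[' must be "code:taxid" with exactly one colon.
	prefix := s[:open]
	if strings.Count(prefix, ":") != 1 {
		return "", "", "", "", errors.New("expected exactly one ':' in code:taxid")
	}
	parts := strings.SplitN(prefix, ":", 2)
	code = strings.TrimSpace(parts[0])
	taxid = strings.TrimSpace(parts[1])

	if code == "" || taxid == "" || name == "" {
		return "", "", "", "", errors.New("empty code, taxid, or scientific name")
	}
	return code, taxid, name, rank, nil
}

func main() {
	inputs := []string{
		"GBIF:248093 [Homo sapiens]@species", // complete string
		"NCBI:9606 [Homo sapiens]",           // rank omitted
		"GBIF:248093 Homo sapiens@species",   // brackets missing
	}
	for _, in := range inputs {
		code, taxid, name, rank, err := parseTaxonStringSketch(in)
		fmt.Printf("%q -> code=%q taxid=%q name=%q rank=%q err=%v\n",
			in, code, taxid, name, rank, err)
	}
}
```

Counting the separators before slicing keeps each validation rule in one place, so every error case in the list maps to a single check in the sketch.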
## Example

Input: `"GBIF:248093 [Homo sapiens]@species"`
Output components:
- `code = "GBIF"`
- `taxid = "248093"`
- `scientificName = "Homo sapiens"`
- `rank = "species"`

For invalid input, every component is returned as an empty string, together with an error.