Files
obitools4/autodoc/docmd/pkg/obitax/string_parser.md
T
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

1.5 KiB

obitax Package: Taxon String Parser

The obitax package provides a robust parser for structured taxonomic strings used in biodiversity data processing.

Core Functionality

  • ParseTaxonString(taxonStr string)
    Parses strings in the format: code:taxid [scientific name]@rank.

  • Input Format Requirements

    • code: Taxonomy identifier (e.g., "GBIF", "NCBI")
    • taxid: Numeric or alphanumeric taxonomic ID (e.g., "123456")
    • scientific name: Enclosed in square brackets (e.g., "[Homo sapiens]")
    • rank: Optional taxonomic rank after @ (e.g., "species", defaults to "no rank" if missing)
  • Robustness Features

    • Trims whitespace around all components.
    • Handles multiple @ symbols (returns error).
    • Validates bracket pairing and ordering.
    • Ensures code:taxid contains exactly one colon separator.
  • Error Handling
    Returns descriptive errors for:

    • Missing or malformed brackets
    • Invalid number of @ separators
    • Absent colon in code:taxid segment
    • Empty fields (code, taxid, or scientific name)
  • Use Cases
    Ideal for parsing legacy biodiversity records (e.g., from OBIS, GBIF), where taxon strings are semi-structured and need reliable extraction before indexing or matching against reference databases.

Example

Input: "GBIF:248093 [Homo sapiens]@species"
Output components:

  • code = "GBIF"
  • taxid = "248093"
  • scientificName = "Homo sapiens"
  • rank = "species"

Returns empty strings and an error for invalid inputs.