feat: introduce obitaxonomy crate for hierarchical taxonomy parsing

Adds the `obitaxonomy` crate to parse and validate hierarchical taxonomy paths using a strict `taxonomy:/name@rank/...` syntax. Replaces generic string-based path matching in predicates with structured `TaxPath` and `TaxPattern` types, enforcing explicit anchor constraints and rank-aware semantics. Updates filtering documentation to clarify optional leading slashes and segment-boundary matching rules.
Eric Coissac
2026-06-21 10:37:50 +02:00
parent c694e1f2b0
commit 9356be4ec0
12 changed files with 464 additions and 18 deletions
+12 -6
@@ -32,14 +32,20 @@ Multiple values separated by `|` are always OR-ed within the predicate.
Metadata values can represent hierarchical concept paths such as
`/Eukaryota/Viridiplantae/Streptophyta/Betulaceae/Betula/nana`.
-**Both the stored metadata value and the pattern must start with `/`.**
-A pattern that does not start with `/` is rejected at parse time with an error.
+Stored taxonomy values always start with `/` (the root of the path).
+Query patterns do **not** need to start with `/` — a leading `/` is an optional
+start anchor, not a requirement.
-The value matches the pattern if it equals it exactly or starts with the pattern
-followed by `/` (segment-boundary prefix):
+| Pattern form | Semantics |
+|---|---|
+| `A/B` | contiguous sub-path A then B, anywhere in the value |
+| `/A/B` | value starts with A then B |
+| `A/B$` | value ends with A then B |
+| `/A/B$` | value is exactly A then B |
+| `A@x/B` | A with class `x` followed by B with any class |
-- `taxon~/Betulaceae/Betula` matches `/Betulaceae/Betula/nana` and
-  `/Betulaceae/Betula` but not `/Betulaceae/Betuloides/…`.
+- `taxon~/Betulaceae/Betula` matches any path that starts with `Betulaceae` then `Betula`.
+- `taxon~Betula` matches any path containing `Betula` as a segment, anywhere.
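The anchored pattern forms in the table above can be illustrated with a minimal Rust sketch. It is not the `obitaxonomy` implementation: `Segment`, `Pattern`, `parse_path`, `parse_pattern` and `matches` are hypothetical names chosen for this example, the `taxonomy:` prefix is left out, and it assumes that a class given after `@` in a pattern must equal the stored class exactly.

```rust
/// One path or pattern segment: a name plus an optional class after `@`.
/// Hypothetical type, not the crate's actual `TaxPath` / `TaxPattern`.
#[derive(Debug, Clone, PartialEq)]
struct Segment {
    name: String,
    class: Option<String>,
}

/// A parsed query pattern with its optional start (`/`) and end (`$`) anchors.
#[derive(Debug)]
struct Pattern {
    segments: Vec<Segment>,
    anchored_start: bool,
    anchored_end: bool,
}

/// Parses `name` or `name@class`; empty names or classes are rejected.
fn parse_segment(text: &str) -> Option<Segment> {
    let (name, class) = match text.split_once('@') {
        Some((n, c)) => (n, Some(c)),
        None => (text, None),
    };
    if name.is_empty() || class == Some("") {
        return None;
    }
    Some(Segment {
        name: name.to_string(),
        class: class.map(str::to_string),
    })
}

/// Splits on `/` and parses every segment; any invalid segment yields `None`.
fn parse_segments(text: &str) -> Option<Vec<Segment>> {
    text.split('/').map(parse_segment).collect()
}

/// Stored values must start with `/` (the root of the path).
fn parse_path(value: &str) -> Option<Vec<Segment>> {
    parse_segments(value.strip_prefix('/')?)
}

/// Patterns may start with `/` (start anchor) and end with `$` (end anchor).
fn parse_pattern(text: &str) -> Option<Pattern> {
    let (body, anchored_start) = match text.strip_prefix('/') {
        Some(rest) => (rest, true),
        None => (text, false),
    };
    let (body, anchored_end) = match body.strip_suffix('$') {
        Some(rest) => (rest, true),
        None => (body, false),
    };
    Some(Pattern {
        segments: parse_segments(body)?,
        anchored_start,
        anchored_end,
    })
}

/// Names must be equal; the class is compared only when the pattern gives one.
fn segment_matches(pat: &Segment, seg: &Segment) -> bool {
    pat.name == seg.name && (pat.class.is_none() || pat.class == seg.class)
}

/// Tries every window of whole segments that the anchors allow.
fn matches(pattern: &Pattern, path: &[Segment]) -> bool {
    let n = pattern.segments.len();
    if n > path.len() {
        return false;
    }
    let last = path.len() - n;
    (0..=last)
        .filter(|&start| !pattern.anchored_start || start == 0)
        .filter(|&start| !pattern.anchored_end || start == last)
        .any(|start| {
            pattern
                .segments
                .iter()
                .zip(&path[start..start + n])
                .all(|(p, s)| segment_matches(p, s))
        })
}
```

Because the comparison works on whole segments rather than on raw substrings, a pattern such as `Betul` does not match a path containing `Betula` in this sketch, which mirrors the segment-boundary rule described in the documentation.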
### Missing metadata key → NA