feat: introduce obitaxonomy crate for hierarchical taxonomy parsing
Adds the `obitaxonomy` crate to parse and validate hierarchical taxonomy paths using a strict `taxonomy:/name@rank/...` syntax. Replaces generic string-based path matching in predicates with structured `TaxPath` and `TaxPattern` types, enforcing explicit anchor constraints and rank-aware semantics. Updates filtering documentation to clarify optional leading slashes and segment-boundary matching rules.
@@ -32,14 +32,20 @@ Multiple values separated by `|` are always OR-ed within the predicate.
 Metadata values can represent hierarchical concept paths such as
 `/Eukaryota/Viridiplantae/Streptophyta/Betulaceae/Betula/nana`.
 
-**Both the stored metadata value and the pattern must start with `/`.**
-A pattern that does not start with `/` is rejected at parse time with an error.
+Stored taxonomy values always start with `/` (the root of the path).
+Query patterns do **not** need to start with `/` — a leading `/` is an optional
+start anchor, not a requirement.
 
-The value matches the pattern if it equals it exactly or starts with the pattern
-followed by `/` (segment-boundary prefix):
+| Pattern form | Semantics |
+|---|---|
+| `A/B` | contiguous sub-path A then B, anywhere in the value |
+| `/A/B` | value starts with A then B |
+| `A/B$` | value ends with A then B |
+| `/A/B$` | value is exactly A then B |
+| `A@x/B` | A with class `x` followed by B with any class |
 
-- `taxon~/Betulaceae/Betula` matches `/Betulaceae/Betula/nana` and
-`/Betulaceae/Betula` but not `/Betulaceae/Betuloides/…`.
+- `taxon~/Betulaceae/Betula` matches any path that starts with `Betulaceae` then `Betula`.
+- `taxon~Betula` matches any path containing `Betula` as a segment, anywhere.
 
 ### Missing metadata key → NA
 
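For readers of this change, the anchor rules in the new pattern table can be sketched in a few lines of Rust. This is only an illustration of the documented semantics, not the `obitaxonomy` implementation: the function names `split_segment`, `segment_matches`, and `matches_pattern` are hypothetical, the real `TaxPath` and `TaxPattern` types are not used, and the sketch assumes that a stored segment may carry its class as `name@rank` and that a pattern segment with a class does not match a stored segment without one.

```rust
/// Split a segment such as "Betula@genus" into (name, optional class).
fn split_segment(seg: &str) -> (&str, Option<&str>) {
    match seg.split_once('@') {
        Some((name, rank)) => (name, Some(rank)),
        None => (seg, None),
    }
}

/// A pattern segment matches a value segment when the names are equal and,
/// if the pattern names a class, the value carries that same class.
fn segment_matches(pat: &str, val: &str) -> bool {
    let (pname, prank) = split_segment(pat);
    let (vname, vrank) = split_segment(val);
    pname == vname && (prank.is_none() || prank == vrank)
}

/// Hypothetical matcher for the documented pattern forms (not the crate API).
/// A leading `/` anchors the match at the start of the value,
/// a trailing `$` anchors it at the end.
fn matches_pattern(pattern: &str, value: &str) -> bool {
    let (body, anchored_start) = match pattern.strip_prefix('/') {
        Some(rest) => (rest, true),
        None => (pattern, false),
    };
    let (body, anchored_end) = match body.strip_suffix('$') {
        Some(rest) => (rest, true),
        None => (body, false),
    };

    let pat: Vec<&str> = body.split('/').collect();
    // Stored values always start with `/`; drop it before splitting.
    let val: Vec<&str> = value.trim_start_matches('/').split('/').collect();
    if pat.len() > val.len() {
        return false;
    }

    // Try every window of the value that is as long as the pattern,
    // keeping only the windows allowed by the anchors.
    let last_start = val.len() - pat.len();
    (0..=last_start).any(|start| {
        (!anchored_start || start == 0)
            && (!anchored_end || start == last_start)
            && pat
                .iter()
                .zip(&val[start..])
                .all(|(p, v)| segment_matches(*p, *v))
    })
}
```

Under these assumptions, `matches_pattern("Betula", "/Betulaceae/Betula/nana")` is true because the pattern is unanchored, while `matches_pattern("/Betulaceae/Betula$", "/Betulaceae/Betula/nana")` is false because both anchors require an exact path. Matching whole segments rather than raw string prefixes is what keeps `Betula` from matching `Betuloides`.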