Push mtzqmmrlmzzx #34
@@ -32,14 +32,20 @@ Multiple values separated by `|` are always OR-ed within the predicate.
 Metadata values can represent hierarchical concept paths such as
 `/Eukaryota/Viridiplantae/Streptophyta/Betulaceae/Betula/nana`.
 
-**Both the stored metadata value and the pattern must start with `/`.**
-A pattern that does not start with `/` is rejected at parse time with an error.
+Stored taxonomy values always start with `/` (the root of the path).
+Query patterns do **not** need to start with `/`; a leading `/` is an optional
+start anchor, not a requirement.
 
-The value matches the pattern if it equals it exactly or starts with the pattern
-followed by `/` (segment-boundary prefix):
+| Pattern form | Semantics |
+|---|---|
+| `A/B` | contiguous sub-path A then B, anywhere in the value |
+| `/A/B` | value starts with A then B |
+| `A/B$` | value ends with A then B |
+| `/A/B$` | value is exactly A then B |
+| `A@x/B` | A with class `x` followed by B with any class |
 
-- `taxon~/Betulaceae/Betula` matches `/Betulaceae/Betula/nana` and
-  `/Betulaceae/Betula` but not `/Betulaceae/Betuloides/…`.
+- `taxon~/Betulaceae/Betula` matches any path that starts with `Betulaceae` then `Betula`.
+- `taxon~Betula` matches any path containing `Betula` as a segment, anywhere.
 
 ### Missing metadata key → NA
 
@@ -0,0 +1,143 @@
# `obitaxonomy`: taxonomy concept paths

`obitaxonomy` is a dependency-free crate that defines a typed representation
of hierarchical concept paths (taxonomic or otherwise) stored in genome metadata.

---

## Concept path syntax

A concept path is stored as a metadata value with the prefix `taxonomy:/`:

```
taxonomy:/Enterobacteriaceae@family/Escherichia@genus/Escherichia coli@species
```

Structure:

- The `taxonomy:/` prefix is the type discriminator. Any metadata value starting
  with it is parsed as a `TaxPath`; all others remain plain strings.
- The remainder is one or more `/`-separated segments.
- Each segment is `name` or `name@rank`, where `rank` is a label for the
  taxonomic level (e.g. `family`, `genus`, `species`).
- Rank annotations are **optional per segment** and can be mixed freely.
- Spaces are allowed in both names and ranks.

### Reserved character

`@` is reserved throughout the taxonomy system and may **not** appear in:

| Context | Constraint |
|---------|------------|
| Segment name | forbidden |
| Rank/class label | forbidden |
| Metadata key names | forbidden (used as `key@rank` in predicate syntax) |

`@` is freely allowed in plain-text metadata values (non-taxonomy).

### Parse errors

| Condition | Error |
|-----------|-------|
| Value does not start with `taxonomy:/` | `MissingPrefix` |
| No segments after the prefix | `EmptyPath` |
| Segment with empty name (consecutive `/`) | `EmptySegmentName` |
| Segment with trailing `@` and no rank (`name@`) | `EmptyRankName` |
| Segment with more than one `@` | `AmbiguousRank` |

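The syntax rules and error conditions above can be illustrated with a small stand-alone sketch. It does not use the crate: `Issue` and `split_concept_path` are hypothetical names chosen for illustration, and the variants only mirror the conditions listed in the table.

```rust
/// Hypothetical error type mirroring the conditions in the table above.
#[derive(Debug, PartialEq)]
pub enum Issue {
    MissingPrefix,
    EmptyPath,
    EmptySegmentName,
    EmptyRankName,
    AmbiguousRank,
}

/// Splits a stored value into `(name, optional rank)` pairs.
pub fn split_concept_path(value: &str) -> Result<Vec<(&str, Option<&str>)>, Issue> {
    let tail = value.strip_prefix("taxonomy:/").ok_or(Issue::MissingPrefix)?;
    if tail.is_empty() {
        return Err(Issue::EmptyPath);
    }
    tail.split('/')
        .map(|segment| {
            // At most one '@' is allowed: the text before it is the name,
            // the text after it is the rank.
            let (name, rank) = match segment.split_once('@') {
                None => (segment, None),
                Some((_, rank)) if rank.contains('@') => return Err(Issue::AmbiguousRank),
                Some((name, rank)) => (name, Some(rank)),
            };
            if name.is_empty() {
                return Err(Issue::EmptySegmentName);
            }
            if rank == Some("") {
                return Err(Issue::EmptyRankName);
            }
            Ok((name, rank))
        })
        .collect()
}
```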
---

## Public API

### `TaxSegment`

A single node: a name and an optional rank.

```rust
seg.name()            // &str
seg.rank()            // Option<&str>
seg.to_string()       // "name" or "name@rank"
TaxSegment::parse(s)  // Result<TaxSegment, TaxError>
```

### `TaxPath`

```rust
TaxPath::parse(s)                         // Result<TaxPath, TaxError>
path.segments()                           // &[TaxSegment]
path.depth()                              // usize: number of segments
path.is_ancestor_of(&other)               // bool: prefix match by name, ranks ignored
path.name_at_rank("genus")                // Option<&str>: name of the first segment with that rank
path.has_rank("genus")                    // bool: some segment carries that rank
path.matches_rank("genus", "Escherichia") // bool: some segment has both that rank and that name
path.to_string()                          // reconstructs "taxonomy:/…"
```

`is_ancestor_of` compares segment **names** only; rank annotations are
informational and do not affect the ancestry relation.

```rust
let a: TaxPath = "taxonomy:/Enterobacteriaceae@family/Escherichia@genus".parse()?;
let b: TaxPath = "taxonomy:/Enterobacteriaceae@family/Escherichia@genus/Escherichia coli@species".parse()?;

assert!(a.is_ancestor_of(&b));   // a is a prefix of b
assert!(!b.is_ancestor_of(&a));  // b is longer than a, so it is not an ancestor
assert!(a.is_ancestor_of(&a));   // equal ⇒ ancestor

assert_eq!(b.name_at_rank("species"), Some("Escherichia coli"));
assert_eq!(b.name_at_rank("genus"), Some("Escherichia"));
assert_eq!(b.name_at_rank("order"), None);
```

---

## Integration with `GenomeInfo`

At index load time, every metadata value is inspected once:

- Starts with `taxonomy:/` → parsed into `TaxPath`, stored in `genome.taxonomy`.
- Otherwise → kept as-is in `genome.meta`.

```rust
struct GenomeInfo {
    label: String,
    meta: HashMap<String, String>,      // plain text metadata
    taxonomy: HashMap<String, TaxPath>, // parsed taxonomy metadata
}
```

The raw string is not duplicated. `TaxPath::to_string()` reconstructs the
original value losslessly for serialisation.

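The load-time split described above can be sketched without the crate. `GenomeInfoSketch` and `route_metadata` are hypothetical names, and a plain `String` stands in for `TaxPath`. How the real loader treats a value that carries the prefix but fails to parse is not stated here, so this sketch only checks the prefix.

```rust
use std::collections::HashMap;

/// Stand-in for `GenomeInfo`. `taxonomy` holds the raw path text here,
/// whereas the real struct stores a parsed `TaxPath`.
#[derive(Debug, Default)]
pub struct GenomeInfoSketch {
    pub meta: HashMap<String, String>,
    pub taxonomy: HashMap<String, String>,
}

/// Routes each `(key, value)` pair once, based on the `taxonomy:/` prefix.
pub fn route_metadata<I>(pairs: I) -> GenomeInfoSketch
where
    I: IntoIterator<Item = (String, String)>,
{
    let mut info = GenomeInfoSketch::default();
    for (key, value) in pairs {
        if value.starts_with("taxonomy:/") {
            // Taxonomy value: the real code would parse it into a `TaxPath`.
            info.taxonomy.insert(key, value);
        } else {
            // Plain text value: kept as-is, `@` is allowed here.
            info.meta.insert(key, value);
        }
    }
    info
}
```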
---

## Predicate operators (in `filter` / `select`)

Path predicates use the `~` / `!~` operators. The **stored value** is always a
rooted path (a `/` follows the `taxonomy:` discriminator); the **query pattern**
does not need to start with `/`.

### Path pattern syntax

| Pattern | Semantics |
|---------|-----------|
| `A/B` | contiguous sub-path A then B, anywhere in the value |
| `/A/B` | value starts with A then B (start-anchored) |
| `A/B$` | value ends with A then B (end-anchored) |
| `/A/B$` | value is exactly A then B (fully anchored) |
| `A@x/B` | A with class `x` followed by B with any class |
| `A@x/B@y` | A with class `x` followed by B with class `y` |

A segment pattern without `@` matches the segment name regardless of its stored class.

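The anchoring rules in the table can be sketched as a sliding window over segment names. This is a simplified illustration rather than the crate's implementation: `pattern_matches` is a hypothetical helper, class annotations (`@x`) are left out, and the stored path is given as a slice of names.

```rust
/// Matches a pattern such as `A/B`, `/A/B`, `A/B$` or `/A/B$` against the
/// segment names of a stored path. Classes are ignored in this sketch.
pub fn pattern_matches(pattern: &str, names: &[&str]) -> bool {
    // A leading '/' anchors the match at the first segment,
    // a trailing '$' anchors it at the last one.
    let (start_anchored, rest) = match pattern.strip_prefix('/') {
        Some(rest) => (true, rest),
        None => (false, pattern),
    };
    let (end_anchored, rest) = match rest.strip_suffix('$') {
        Some(rest) => (true, rest),
        None => (false, rest),
    };
    let wanted: Vec<&str> = rest.split('/').collect();
    if rest.is_empty() || wanted.len() > names.len() {
        return false;
    }
    let last_start = names.len() - wanted.len();
    // Window start positions permitted by the anchors.
    let starts = match (start_anchored, end_anchored) {
        (true, true) if last_start != 0 => return false,
        (true, _) => 0..=0,
        (false, true) => last_start..=last_start,
        (false, false) => 0..=last_start,
    };
    starts
        .into_iter()
        .any(|i| names[i..i + wanted.len()] == wanted[..])
}
```

With the names `Betulaceae`, `Betula`, `nana`, the unanchored pattern `Betula` matches in the middle of the path, while `/Betula` does not because the path starts with `Betulaceae`.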
### Rank-aware queries

```
key@rank=value
```

| Predicate form | Semantics |
|----------------|-----------|
| `key@rank=value` | genome's `key` has `value` at rank `rank` |
| `key@rank!=value` | genome's `key` does not have `value` at rank `rank` |
| `key@rank=v1\|v2` | value at `rank` is `v1` or `v2` |

`~` combined with `@rank` on the key (e.g. `key@genus~pattern`) is not defined
and is rejected at parse time.
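A minimal sketch of how the `key@rank=v1|v2` form could be evaluated, assuming the stored path is available as `(name, optional rank)` pairs. `split_key` and `rank_equals` are hypothetical helpers, not part of the crate. Only the `=` form is shown, because the treatment of a path that lacks the requested rank under `!=` is not specified here.

```rust
/// Splits the left-hand side `key@rank` into `(key, Some(rank))`,
/// or returns `(key, None)` when no rank is given.
pub fn split_key(lhs: &str) -> (&str, Option<&str>) {
    match lhs.split_once('@') {
        Some((key, rank)) => (key, Some(rank)),
        None => (lhs, None),
    }
}

/// True if some segment carries `rank` and its name equals one of the
/// `|`-separated alternatives in `values`.
pub fn rank_equals(segments: &[(&str, Option<&str>)], rank: &str, values: &str) -> bool {
    segments.iter().any(|(name, seg_rank)| {
        *seg_rank == Some(rank) && values.split('|').any(|v| v == *name)
    })
}
```

Because `@` is forbidden in metadata key names, splitting the left-hand side at the first `@` is unambiguous.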
Generated
+1
@@ -1722,6 +1722,7 @@ dependencies = [
  "obiskbuilder",
  "obiskio",
  "obisys",
+ "obitaxonomy",
  "pprof",
  "rayon",
  "serde_json",
@@ -19,6 +19,7 @@ obikpartitionner = { path = "../obikpartitionner" }
 obisys = { path = "../obisys" }
 obiskio = { path = "../obiskio" }
 obikindex = { path = "../obikindex" }
+obitaxonomy = { path = "../obitaxonomy" }
 obilayeredmap = { path = "../obilayeredmap" }
 clap = { version = "4", features = ["derive"] }
 serde_json = "1"
@@ -3,6 +3,7 @@ use std::collections::HashMap;
 use clap::Args;
 use obikindex::GenomeInfo;
 use obikpartitionner::{GroupQuorumFilter, KmerFilter};
+use obitaxonomy::{TaxPath, TaxPattern};
 
 // ── Operator ──────────────────────────────────────────────────────────────────
 
@@ -49,12 +50,6 @@ impl MetaPred {
         if values.iter().any(|v| v.is_empty()) {
             return Err(format!("empty value in predicate: {s}"));
         }
-        if matches!(op, PredOp::Matches | PredOp::NotMatches) {
-            if let Some(v) = values.iter().find(|v| !v.starts_with('/')) {
-                return Err(format!("path predicate value must start with '/': {v:?} in predicate: {s}"));
-            }
-        }
-
         Ok(Self { key, op, values })
     }
 
@@ -75,14 +70,15 @@ impl MetaPred {
 
 // ── Path matching ─────────────────────────────────────────────────────────────
 
-/// True if `value` is equal to `pattern` or is a descendant of it in a `/`-separated hierarchy.
+/// True if the stored taxonomy `value` matches `pattern`.
 ///
-/// Both `value` and `pattern` must start with `/`.
-/// `value` matches if it equals `pattern` exactly or starts with `pattern` followed by `/`.
+/// `value` must be a valid `TaxPath` (starts with `taxonomy:/`).
+/// `pattern` is a `TaxPattern` query (see `obitaxonomy::TaxPattern` for syntax).
+/// Returns `false` if either fails to parse.
 fn path_matches(value: &str, pattern: &str) -> bool {
-    value == pattern
-        || (value.starts_with(pattern)
-            && value[pattern.len()..].starts_with('/'))
+    let Ok(path) = TaxPath::parse(value) else { return false };
+    let Ok(pat) = TaxPattern::parse(pattern) else { return false };
+    pat.matches(&path)
 }
 
 // ── Three-value group evaluation ──────────────────────────────────────────────
@@ -0,0 +1,6 @@
[package]
name = "obitaxonomy"
version = "0.1.0"
edition = "2024"

[dependencies]
@@ -0,0 +1,38 @@
use std::fmt;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TaxError {
    /// Stored value does not start with the `taxonomy:/` prefix.
    MissingPrefix,
    /// Stored path contains no segments after the prefix.
    EmptyPath,
    /// Query pattern contains no segments (after stripping anchors).
    EmptyPattern,
    /// A segment has an empty name (e.g. consecutive `/`).
    EmptySegmentName,
    /// A segment has a trailing `@` with no rank name.
    EmptyRankName { segment: String },
    /// A segment contains more than one `@`.
    AmbiguousRank { segment: String },
}

impl fmt::Display for TaxError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TaxError::MissingPrefix =>
                write!(f, "taxonomy path must start with \"taxonomy:/\""),
            TaxError::EmptyPath =>
                write!(f, "taxonomy path has no segments"),
            TaxError::EmptyPattern =>
                write!(f, "taxonomy query pattern has no segments"),
            TaxError::EmptySegmentName =>
                write!(f, "segment has an empty name"),
            TaxError::EmptyRankName { segment } =>
                write!(f, "segment has '@' with no rank name: {segment:?}"),
            TaxError::AmbiguousRank { segment } =>
                write!(f, "segment contains more than one '@': {segment:?}"),
        }
    }
}

impl std::error::Error for TaxError {}
@@ -0,0 +1,11 @@
mod error;
mod segment;
mod segment_pattern;
mod path;
mod pattern;

pub use error::TaxError;
pub use segment::TaxSegment;
pub use segment_pattern::SegmentPattern;
pub use path::{TaxPath, PREFIX};
pub use pattern::TaxPattern;
@@ -0,0 +1,82 @@
use std::fmt;
use std::str::FromStr;

use crate::error::TaxError;
use crate::segment::TaxSegment;

/// The prefix that marks a metadata value as a taxonomy path.
pub const PREFIX: &str = "taxonomy:/";

/// A rooted, `/`-separated taxonomy path with optional per-segment rank annotations.
///
/// Stored form: `taxonomy:/seg1@rank1/seg2/seg3@rank3`
/// The leading `taxonomy:/` is the discriminator; the remainder is one or more
/// `/`-separated segments, each of the form `name` or `name@rank`.
///
/// `@` is reserved and may not appear in segment names or rank names.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TaxPath {
    segments: Vec<TaxSegment>,
}

impl TaxPath {
    pub fn parse(s: &str) -> Result<Self, TaxError> {
        let tail = s.strip_prefix(PREFIX).ok_or(TaxError::MissingPrefix)?;
        if tail.is_empty() {
            return Err(TaxError::EmptyPath);
        }
        let segments = tail.split('/')
            .map(TaxSegment::parse)
            .collect::<Result<Vec<_>, _>>()?;
        Ok(Self { segments })
    }

    /// True if `self` is an ancestor of, or equal to, `other`.
    ///
    /// Comparison is by segment name only; rank annotations are ignored.
    /// `self` must be a prefix of `other` at segment granularity.
    pub fn is_ancestor_of(&self, other: &TaxPath) -> bool {
        self.segments.len() <= other.segments.len()
            && self.segments.iter().zip(other.segments.iter())
                .all(|(a, b)| a.name() == b.name())
    }

    /// Returns the name of the first segment whose rank equals `rank`, if any.
    pub fn name_at_rank(&self, rank: &str) -> Option<&str> {
        self.segments.iter()
            .find(|s| s.rank() == Some(rank))
            .map(|s| s.name())
    }

    /// True if any segment has the given rank.
    pub fn has_rank(&self, rank: &str) -> bool {
        self.segments.iter().any(|s| s.rank() == Some(rank))
    }

    /// True if the path contains a segment with both the given rank and name.
    pub fn matches_rank(&self, rank: &str, name: &str) -> bool {
        self.segments.iter().any(|s| s.rank() == Some(rank) && s.name() == name)
    }

    pub fn segments(&self) -> &[TaxSegment] { &self.segments }
    pub fn depth(&self) -> usize { self.segments.len() }
    pub fn is_empty(&self) -> bool { self.segments.is_empty() }
}

impl fmt::Display for TaxPath {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", PREFIX)?;
        let mut first = true;
        for seg in &self.segments {
            if !first { write!(f, "/")?; }
            write!(f, "{seg}")?;
            first = false;
        }
        Ok(())
    }
}

impl FromStr for TaxPath {
    type Err = TaxError;
    fn from_str(s: &str) -> Result<Self, Self::Err> { Self::parse(s) }
}
@@ -0,0 +1,72 @@
use crate::error::TaxError;
use crate::path::TaxPath;
use crate::segment::TaxSegment;
use crate::segment_pattern::SegmentPattern;

/// A query pattern for matching against stored `TaxPath` values.
///
/// Syntax:
///
/// | Form     | Semantics |
/// |----------|-----------|
/// | `A/B`    | A then B as a contiguous sub-path, anywhere in the value |
/// | `/A/B`   | value starts with A then B (start-anchored) |
/// | `A/B$`   | value ends with A then B (end-anchored) |
/// | `/A/B$`  | value is exactly A then B (fully anchored) |
/// | `A@x/B`  | A with rank `x`, followed by B with any rank |
///
/// A segment pattern without `@` matches any segment with that name regardless
/// of its stored rank.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TaxPattern {
    start_anchored: bool,
    end_anchored: bool,
    segments: Vec<SegmentPattern>,
}

impl TaxPattern {
    pub fn parse(s: &str) -> Result<Self, TaxError> {
        let s = s.trim();

        let start_anchored = s.starts_with('/');
        let s = if start_anchored { &s[1..] } else { s };

        let end_anchored = s.ends_with('$');
        let s = if end_anchored { &s[..s.len() - 1] } else { s };

        if s.is_empty() {
            return Err(TaxError::EmptyPattern);
        }

        let segments = s.split('/')
            .map(SegmentPattern::parse)
            .collect::<Result<Vec<_>, _>>()?;

        Ok(Self { start_anchored, end_anchored, segments })
    }

    /// True if this pattern matches `path` according to the anchor flags.
    ///
    /// The pattern must match a contiguous run of segments in the path.
    /// Start/end anchors restrict where that run may begin or end.
    pub fn matches(&self, path: &TaxPath) -> bool {
        let n = self.segments.len();
        let m = path.depth();

        if n > m { return false; }

        let segs = path.segments();
        match (self.start_anchored, self.end_anchored) {
            (true, true) => n == m && self.window_matches(segs, 0),
            (true, false) => self.window_matches(segs, 0),
            (false, true) => self.window_matches(segs, m - n),
            (false, false) => (0..=(m - n)).any(|i| self.window_matches(segs, i)),
        }
    }

    fn window_matches(&self, segs: &[TaxSegment], start: usize) -> bool {
        self.segments.iter()
            .zip(segs[start..start + self.segments.len()].iter())
            .all(|(pat, seg)| pat.matches(seg))
    }
}
@@ -0,0 +1,49 @@
use std::fmt;

use crate::error::TaxError;

/// A single node in a taxonomy path: a name and an optional rank.
///
/// Neither `name` nor `rank` may contain `@` (reserved separator).
/// Serialised form: `name` or `name@rank`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TaxSegment {
    name: String,
    rank: Option<String>,
}

impl TaxSegment {
    pub fn parse(raw: &str) -> Result<Self, TaxError> {
        let parts: Vec<&str> = raw.splitn(3, '@').collect();

        let (name_raw, rank_raw) = match parts.as_slice() {
            [name] => (*name, None),
            [name, rank] => (*name, Some(*rank)),
            _ => return Err(TaxError::AmbiguousRank { segment: raw.to_string() }),
        };

        if name_raw.is_empty() {
            return Err(TaxError::EmptySegmentName);
        }

        let rank = match rank_raw {
            None => None,
            Some("") => return Err(TaxError::EmptyRankName { segment: raw.to_string() }),
            Some(r) => Some(r.to_string()),
        };

        Ok(Self { name: name_raw.to_string(), rank })
    }

    pub fn name(&self) -> &str { &self.name }
    pub fn rank(&self) -> Option<&str> { self.rank.as_deref() }
}

impl fmt::Display for TaxSegment {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.rank {
            None => write!(f, "{}", self.name),
            Some(r) => write!(f, "{}@{}", self.name, r),
        }
    }
}
@@ -0,0 +1,41 @@
use crate::error::TaxError;
use crate::segment::TaxSegment;

/// A single segment in a query pattern: a required name and an optional rank filter.
///
/// If `rank` is `None`, the pattern matches any segment with the given name,
/// regardless of its stored rank. If `rank` is `Some(r)`, both name and rank
/// must match exactly.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SegmentPattern {
    name: String,
    rank: Option<String>,
}

impl SegmentPattern {
    pub fn parse(raw: &str) -> Result<Self, TaxError> {
        let parts: Vec<&str> = raw.splitn(3, '@').collect();
        let (name_raw, rank_raw) = match parts.as_slice() {
            [name] => (*name, None),
            [name, rank] => (*name, Some(*rank)),
            _ => return Err(TaxError::AmbiguousRank { segment: raw.to_string() }),
        };
        if name_raw.is_empty() {
            return Err(TaxError::EmptySegmentName);
        }
        let rank = match rank_raw {
            None => None,
            Some("") => return Err(TaxError::EmptyRankName { segment: raw.to_string() }),
            Some(r) => Some(r.to_string()),
        };
        Ok(Self { name: name_raw.to_string(), rank })
    }

    /// True if this pattern matches `seg`.
    /// Name must match exactly. If a rank is specified in the pattern, the
    /// segment's rank must match; otherwise any rank (or no rank) is accepted.
    pub fn matches(&self, seg: &TaxSegment) -> bool {
        self.name == seg.name()
            && self.rank.as_deref().map_or(true, |r| seg.rank() == Some(r))
    }
}