feat: add benchmark pipeline, expose APIs, and enforce strict paths

Introduces a Make-based orchestration for simulating, indexing, merging, filtering, and verifying k-mer counts and presence. Exposes internal builder and iterator APIs publicly, enforces mandatory leading slashes for predicate patterns, registers the `obitaxonomy` crate, and updates tooling configurations alongside documentation.
Eric Coissac
2026-06-19 09:55:41 +02:00
parent 280ca1f5a3
commit c694e1f2b0
42 changed files with 2585 additions and 84 deletions
+8 -7
@@ -29,16 +29,17 @@ Multiple values separated by `|` are always OR-ed within the predicate.
### Path matching (`~` and `!~`)
-Metadata values can represent hierarchical taxonomic paths such as
+Metadata values can represent hierarchical concept paths such as
`/Eukaryota/Viridiplantae/Streptophyta/Betulaceae/Betula/nana`.
-- **Absolute pattern** (starts with `/`): the value must start with the pattern
-at a segment boundary.
-`taxon~/Betulaceae/Betula` matches `/Betulaceae/Betula/nana` and
+**Both the stored metadata value and the pattern must start with `/`.**
+A pattern that does not start with `/` is rejected at parse time with an error.
+The value matches the pattern if it equals it exactly or starts with the pattern
+followed by `/` (segment-boundary prefix):
+- `taxon~/Betulaceae/Betula` matches `/Betulaceae/Betula/nana` and
`/Betulaceae/Betula` but not `/Betulaceae/Betuloides/…`.
-- **Bare segment** (no leading `/`): the value must contain the pattern as an
-exact path component anywhere.
-`taxon~Betula` matches any path that has `Betula` as one of its segments.
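For review purposes, the rule introduced by this hunk (mandatory leading `/`, then exact match or segment-boundary prefix) can be sketched in Rust roughly as follows. This is a minimal illustration and not the code from this commit: `PathPattern`, `PatternError`, `parse`, and `matches` are hypothetical names, and the error message is invented.

```rust
/// Hypothetical error type for this sketch; the crate's real error type
/// is not shown in the diff.
#[derive(Debug, PartialEq)]
pub struct PatternError(pub String);

/// Hypothetical wrapper around a validated path pattern.
#[derive(Debug, PartialEq)]
pub struct PathPattern(String);

impl PathPattern {
    /// Rejects any pattern without a leading `/`, mirroring the
    /// parse-time check described in the documentation change.
    pub fn parse(raw: &str) -> Result<Self, PatternError> {
        if raw.starts_with('/') {
            Ok(PathPattern(raw.to_string()))
        } else {
            Err(PatternError(format!(
                "path pattern `{raw}` must start with `/`"
            )))
        }
    }

    /// Returns true if `value` equals the pattern, or starts with the
    /// pattern immediately followed by `/` (segment-boundary prefix).
    /// Because the pattern itself starts with `/`, a value without a
    /// leading `/` can never match.
    pub fn matches(&self, value: &str) -> bool {
        match value.strip_prefix(self.0.as_str()) {
            Some(rest) => rest.is_empty() || rest.starts_with('/'),
            None => false,
        }
    }
}
```

Under these assumptions, `PathPattern::parse("Betula")` returns an error, while a pattern parsed from `/Betulaceae/Betula` accepts `/Betulaceae/Betula/nana` and rejects `/Betulaceae/Betuloides/…`, because the character following the shared prefix is not a `/`.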
### Missing metadata key → NA