feat: enforce runtime validation for kmer and minimizer parameters

Introduces `CommonArgs::validate()` to enforce strict constraints on `--kmer-size` (odd, 11–31), `--minimizer-size` (odd, 3–k−1), and `z` (strictly less than k). This validation is applied at the entry point of the `superkmer` and `index` commands to prevent invalid configurations, avoid palindromes, prevent u64 overflow, and ensure positive effective indexing sizes. Documentation is updated to reflect these runtime checks and immediate termination on invalid input.
Eric Coissac
2026-05-26 22:55:05 +02:00
parent 82ec6aa1cf
commit 0d9be53d1f
5 changed files with 51 additions and 1 deletion
+12
@@ -27,6 +27,18 @@
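- Canonical form: `min(kmer, revcomp(kmer))` identifies each k-mer with its reverse complement; with odd k no k-mer is its own reverse complement, so this exactly halves the k-mer space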
- Input formats: FASTA, FASTQ, gzip, streaming stdin; `index` reads from stdin automatically when no input files are provided (`-` can also be passed explicitly among other paths)
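The canonical form above can be sketched on k-mers packed at 2 bits/base in a `u64`. This is a minimal illustration, not the crate's code: the encoding A=0, C=1, G=2, T=3 (so that complementing a base is `3 - x`) and the helper names `encode`, `revcomp` and `canonical` are assumptions made for this example.

```rust
/// Hypothetical 2-bit encoding: A=0, C=1, G=2, T=3 (complement(x) = 3 - x).
/// Packs `seq` into the low 2*len bits; the caller must keep len <= 32.
/// Returns None on any non-ACGT byte.
fn encode(seq: &[u8]) -> Option<u64> {
    let mut code = 0u64;
    for &b in seq {
        let x = match b {
            b'A' | b'a' => 0,
            b'C' | b'c' => 1,
            b'G' | b'g' => 2,
            b'T' | b't' => 3,
            _ => return None,
        };
        code = (code << 2) | x;
    }
    Some(code)
}

/// Reverse complement of a k-mer stored in the low 2k bits:
/// pops bases from the 3' end, complements them, and pushes them onto `rc`.
fn revcomp(mut code: u64, k: usize) -> u64 {
    let mut rc = 0u64;
    for _ in 0..k {
        rc = (rc << 2) | (3 - (code & 3));
        code >>= 2;
    }
    rc
}

/// Canonical form: the smaller of the k-mer and its reverse complement.
fn canonical(code: u64, k: usize) -> u64 {
    code.min(revcomp(code, k))
}
```

With this encoding, `ACGT` (k = 4) is its own reverse complement, whereas a brute-force loop over all 4^k codes for k = 3 or k = 5 finds no such palindrome. This matches the odd-k requirement in the table below.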
## Parameter constraints (enforced by the CLI)
All constraints below are checked by `CommonArgs::validate()` at the start of the `superkmer` and `index` commands. If any value is invalid, the program exits immediately with an error.
| Parameter | Constraint | Reason |
|-----------|-----------|--------|
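| k (`--kmer-size`) | odd | even k allows palindromic k-mers (kmer == revcomp(kmer), e.g. `ACGT`), breaking the canonical form invariant; for odd k the middle base would have to be its own complement, which is impossible |
| k (`--kmer-size`) | k ∈ [11, 31] | at 2 bits/base a u64 holds at most 32 bases, so 31 is the largest odd k that fits; k < 11 gives insufficient specificity |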
| m (`--minimizer-size`) | odd | same palindrome argument as k |
| m (`--minimizer-size`) | 3 ≤ m ≤ k−1 | minimizer must be strictly shorter than the kmer (with m and k both odd, this means m ≤ k−2 in practice) |
| z (`-z`, Findere, `index --approx` only) | z ≤ k−1 | effective indexed kmer size is k−z+1; z ≤ k−1 keeps it ≥ 2, while z = k would reduce it to 1 and z > k would make it ≤ 0 |
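The table can be read as a sequence of early-return checks. The sketch below is hedged and only mirrors the documented rules. The struct fields (`kmer_size`, `minimizer_size`, `z` as an `Option` that is `Some` only for `index --approx`), the `ParamError` enum, the `validate_or_exit` helper, and the exit code 1 are all hypothetical. The real `CommonArgs::validate()` may report errors and terminate differently.

```rust
use std::fmt;
use std::process;

/// Hypothetical subset of the shared CLI arguments; field names are assumptions.
pub struct CommonArgs {
    pub kmer_size: usize,      // --kmer-size (k)
    pub minimizer_size: usize, // --minimizer-size (m)
    pub z: Option<usize>,      // -z, Some(_) only for `index --approx` (assumed)
}

/// Hypothetical error type: one variant per row of the constraint table.
#[derive(Debug, PartialEq)]
pub enum ParamError {
    KmerEven(usize),
    KmerOutOfRange(usize),
    MinimizerEven(usize),
    MinimizerOutOfRange { m: usize, k: usize },
    ZTooLarge { z: usize, k: usize },
}

impl fmt::Display for ParamError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParamError::KmerEven(k) => write!(f, "--kmer-size must be odd (got {k})"),
            ParamError::KmerOutOfRange(k) => write!(f, "--kmer-size must be in [11, 31] (got {k})"),
            ParamError::MinimizerEven(m) => write!(f, "--minimizer-size must be odd (got {m})"),
            ParamError::MinimizerOutOfRange { m, k } => {
                write!(f, "--minimizer-size must satisfy 3 <= m <= k-1 (got m={m}, k={k})")
            }
            ParamError::ZTooLarge { z, k } => write!(f, "-z must satisfy z <= k-1 (got z={z}, k={k})"),
        }
    }
}

impl CommonArgs {
    /// Checks the table's constraints in order and returns the first violation.
    pub fn validate(&self) -> Result<(), ParamError> {
        let (k, m) = (self.kmer_size, self.minimizer_size);
        if k % 2 == 0 {
            return Err(ParamError::KmerEven(k));
        }
        if !(11..=31).contains(&k) {
            return Err(ParamError::KmerOutOfRange(k));
        }
        if m % 2 == 0 {
            return Err(ParamError::MinimizerEven(m));
        }
        if m < 3 || m >= k {
            return Err(ParamError::MinimizerOutOfRange { m, k });
        }
        if let Some(z) = self.z {
            // z <= k - 1, i.e. z strictly less than k.
            if z >= k {
                return Err(ParamError::ZTooLarge { z, k });
            }
        }
        Ok(())
    }

    /// Hypothetical entry-point helper: print the error and terminate at once.
    pub fn validate_or_exit(&self) {
        if let Err(e) = self.validate() {
            eprintln!("error: {e}");
            process::exit(1);
        }
    }
}
```

Checking parity before range means an even out-of-range value such as k = 32 is reported as "must be odd". Returning a `Result` from `validate()` keeps the checks testable, and only the thin entry-point wrapper calls `process::exit`.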
## Genome label constraints
Genome labels are arbitrary Unicode strings with the following restrictions: