📖 Update super-kmer theory and implementation to prefer non-degenerate m-mers

- Update super-kmer definition in `kmERS.md` to specify that non-degenerate m-mers are preferred over degenerate ones (degeneracy = homopolymer).
- Refactor `superkmer.rs`: change `.canonical()` to mutate in-place and return bool.
- Add `m` field and canonical-aware minimizer position calculation to `SuperKmerIter` in obiskbuilder.
- Add helper function `is_degenerate` and minimizer comparison logic to `rolling_stat.rs` for consistent tie-breaking.
- Minor formatting cleanup in superkmer command and chunk processing.
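
To illustrate the preference rule described above, here is a minimal sketch, not the code from `rolling_stat.rs`. It assumes m-mers are ASCII byte slices and that candidates are otherwise ranked lexicographically (the real implementation may rank by hash or use a packed encoding). The signature of `is_degenerate` and the names `cmp_minimizer` and `minimizer_pos` are hypothetical.

```rust
use std::cmp::Ordering;

/// Hypothetical signature: an m-mer is "degenerate" when it is a homopolymer,
/// i.e. every nucleotide equals the first one.
fn is_degenerate(mmer: &[u8]) -> bool {
    match mmer.split_first() {
        Some((first, rest)) => rest.iter().all(|b| b == first),
        None => true, // assumption: an empty m-mer counts as degenerate
    }
}

/// Hypothetical ordering: non-degenerate m-mers sort before degenerate ones;
/// within the same class, fall back to lexicographic order (an assumption).
fn cmp_minimizer(a: &[u8], b: &[u8]) -> Ordering {
    is_degenerate(a)
        .cmp(&is_degenerate(b)) // false < true, so non-degenerate wins
        .then_with(|| a.cmp(b))
}

/// Start position of the preferred m-mer in `seq`; the leftmost one on ties.
fn minimizer_pos(seq: &[u8], m: usize) -> Option<usize> {
    if m == 0 || seq.len() < m {
        return None;
    }
    (0..=seq.len() - m).min_by(|&i, &j| cmp_minimizer(&seq[i..i + m], &seq[j..j + m]))
}
```

With `seq = b"AAAACGT"` and `m = 3`, plain lexicographic order would select the homopolymer `AAA` at position 0, whereas this ordering selects `AAC` at position 2.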
Eric Coissac
2026-04-20 17:49:52 +02:00
parent b534c693ac
commit 380b5a6f94
5 changed files with 43 additions and 22 deletions
+7 -5
@@ -286,21 +286,23 @@ impl SuperKmer {
         Ok(self.kmer(i, k)?.canonical(k))
     }
-    /// Return this super-kmer in canonical form (lexicographic minimum of forward and revcomp).
-    pub fn canonical(mut self) -> Self {
+    /// Put this super-kmer in canonical form (lexicographic minimum of forward and revcomp).
+    ///
+    /// Returns `true` if already canonical (no change), `false` if revcomp was applied.
+    pub fn canonical(&mut self) -> bool {
         let seql = self.seql();
         for i in 0..seql {
             let fwd = self.nucleotide(i);
             let rev = complement(self.nucleotide(seql - 1 - i));
             if fwd < rev {
-                return self;
+                return true;
             }
             if fwd > rev {
                 self.revcomp();
-                return self;
+                return false;
             }
         }
-        self
+        true
     }
 }
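
The in-place contract of the new `canonical()` can be tried in isolation with the sketch below. It is an illustration under stated assumptions, not the project's code: `Seq` is a hypothetical stand-in for `SuperKmer` that stores nucleotides as ASCII bytes, and `complement` is redefined here for A/C/G/T only.

```rust
/// Watson-Crick complement of an ASCII nucleotide (assumption: A/C/G/T only;
/// any other byte is returned unchanged).
fn complement(b: u8) -> u8 {
    match b {
        b'A' => b'T',
        b'C' => b'G',
        b'G' => b'C',
        b'T' => b'A',
        other => other,
    }
}

/// Hypothetical stand-in for `SuperKmer`, holding the sequence as ASCII bytes.
struct Seq(Vec<u8>);

impl Seq {
    /// Reverse-complement the sequence in place.
    fn revcomp(&mut self) {
        self.0.reverse();
        for b in self.0.iter_mut() {
            *b = complement(*b);
        }
    }

    /// Put the sequence in canonical form in place.
    /// Returns `true` if it was already canonical, `false` if revcomp was applied.
    fn canonical(&mut self) -> bool {
        let n = self.0.len();
        for i in 0..n {
            let fwd = self.0[i];
            let rev = complement(self.0[n - 1 - i]);
            if fwd < rev {
                return true; // forward strand is the smaller one
            }
            if fwd > rev {
                self.revcomp(); // reverse complement is the smaller one
                return false;
            }
        }
        true // forward and reverse complement are identical
    }
}
```

Because the method now borrows `&mut self` instead of consuming `self`, a caller can keep the returned flag to remember which strand the stored sequence came from, for example `let was_forward = s.canonical();`.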