feat: implement partition-based merge command for k-mer indices

Implements a new `merge` command that aggregates k-mer counts and presence/absence matrices from multiple source indices using a parallelized, partition-based algorithm. Adds CLI progress bars and execution timing across the bootstrap, spectrum rebuild, and merge phases. Updates logging to report the aggregate genome count and introduces a bounds check in the perfect hash layer to safely return `None` for unknown k-mers, preventing out-of-bounds access in downstream operations.
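
The bounds check described above can be illustrated with a minimal sketch. It assumes a toy stand-in for the perfect hash: `ToyMphf`, `ToyLayer`, and the modulo fallback for unknown keys are hypothetical names and behavior chosen for illustration, not the PtrHash API or this repository's types. The point is only that a slot computed for a key outside the build set may be out of range, so it must be checked before it is used as an index.

```rust
use std::collections::HashMap;

/// Toy stand-in for a minimal perfect hash (hypothetical, not PtrHash):
/// exact for the keys it was built from, arbitrary for anything else.
struct ToyMphf {
    slots: HashMap<u64, usize>,
    n: usize,
}

impl ToyMphf {
    fn new(keys: &[u64]) -> Self {
        let slots = keys.iter().copied().enumerate().map(|(i, k)| (k, i)).collect();
        ToyMphf { slots, n: keys.len() }
    }

    /// Known keys map to a unique slot in 0..n. Unknown keys map to an
    /// arbitrary value in 0..2n, so the result may be out of range.
    fn index(&self, key: u64) -> usize {
        match self.slots.get(&key) {
            Some(&slot) => slot,
            None => (key as usize) % (2 * self.n).max(1),
        }
    }
}

/// Toy lookup layer: keys[slot] is the key that owns that slot.
struct ToyLayer {
    mphf: ToyMphf,
    keys: Vec<u64>,
    n: usize,
}

impl ToyLayer {
    fn new(keys: Vec<u64>) -> Self {
        let mphf = ToyMphf::new(&keys);
        let n = keys.len();
        ToyLayer { mphf, keys, n }
    }

    fn find(&self, key: u64) -> Option<usize> {
        let slot = self.mphf.index(key);
        // Reject out-of-range slots before indexing into `keys`.
        if slot >= self.n {
            return None;
        }
        // An in-range slot can still belong to a different key, so verify it.
        if self.keys[slot] == key {
            Some(slot)
        } else {
            None
        }
    }
}

fn main() {
    let layer = ToyLayer::new(vec![10, 20, 30]);
    println!("{:?}", layer.find(20)); // known key
    println!("{:?}", layer.find(5)); // unknown key, slot out of range
    println!("{:?}", layer.find(7)); // unknown key, slot in range but owned by another key
}
```

Without the early return, the lookup for an unknown key whose slot lands at or beyond `n` would index past the end of `keys` and panic. The verification step alone cannot catch that case because it runs after the index is used.
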
Eric Coissac
2026-05-21 11:16:00 +02:00
parent 11182005a2
commit 9e1d6f2f25
4 changed files with 64 additions and 20 deletions
+2
@@ -41,6 +41,8 @@ impl MphfLayer {
    #[inline]
    pub fn find(&self, kmer: CanonicalKmer) -> Option<usize> {
        let slot = self.mphf.index(&kmer.raw());
        // PtrHash guarantees slot < n only for its key set; arbitrary queries may exceed bounds.
        if slot >= self.n { return None; }
        let (chunk_id, rank) = self.evidence.decode(slot);
        if self.unitigs.verify_canonical_kmer(chunk_id as usize, rank as usize, kmer) {
            Some(slot)