feat(obikmer): add index subcommand for kmer counting pipeline

Introduce the `index` CLI subcommand, implementing a resumable, multi-stage pipeline to partition, dereplicate, and count kmers from input sequences. The command builds a layered de Bruijn graph index per partition, applies optional abundance filtering, and persists unitigs alongside an MPHF-based count matrix. Update `Cargo.toml` and `Cargo.lock` to include new dependencies (`epserde`, `ptr_hash`, `cacheline-ef`, `obicompactvec`, `obilayeredmap`) required for the index builder, and refresh the profiling data files.
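
For illustration only, the partition, dereplicate, and count stages can be sketched in a few lines of Rust. This is not the obikmer implementation: the function names (`canonical`, `partition_of`, `count_kmers`), the hash-based partitioning, and the in-memory `HashMap` counts are all assumptions made for the example. The sketch assumes `k > 0` and `n_parts > 0`, and it does not cover resumability, unitig construction, or the MPHF-based count matrix.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Reverse complement over A/C/G/T; any other byte maps to b'N'.
fn revcomp(kmer: &[u8]) -> Vec<u8> {
    kmer.iter()
        .rev()
        .map(|b| match b {
            b'A' => b'T',
            b'C' => b'G',
            b'G' => b'C',
            b'T' => b'A',
            _ => b'N',
        })
        .collect()
}

/// Canonical form: the lexicographically smaller of a kmer and its
/// reverse complement, so both strands dereplicate to one key.
fn canonical(kmer: &[u8]) -> Vec<u8> {
    let rc = revcomp(kmer);
    if rc.as_slice() < kmer { rc } else { kmer.to_vec() }
}

/// Hypothetical partitioner: hash of the canonical kmer modulo `n_parts`.
fn partition_of(kmer: &[u8], n_parts: usize) -> usize {
    let mut h = DefaultHasher::new();
    kmer.hash(&mut h);
    (h.finish() as usize) % n_parts
}

/// Count canonical kmers per partition, then drop those whose count
/// is below `min_abundance` (the optional abundance filter).
fn count_kmers(
    seqs: &[&str],
    k: usize,
    n_parts: usize,
    min_abundance: u32,
) -> Vec<HashMap<Vec<u8>, u32>> {
    let mut parts = vec![HashMap::new(); n_parts];
    for seq in seqs {
        for window in seq.as_bytes().windows(k) {
            let c = canonical(window);
            let p = partition_of(&c, n_parts);
            *parts[p].entry(c).or_insert(0) += 1;
        }
    }
    for part in parts.iter_mut() {
        part.retain(|_, n| *n >= min_abundance);
    }
    parts
}
```

Calling `count_kmers(&["ACGTAC"], 3, 4, 2)` keeps two canonical kmers, `ACG` and `GTA`, each seen twice once both strands are merged.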
Eric Coissac
2026-05-20 14:37:30 +02:00
parent c20a1ed465
commit e66c4d81ef
9 changed files with 231 additions and 1 deletions
+1 -1
@@ -15,7 +15,7 @@ use std::sync::{Arc, Mutex, OnceLock};
pub const MAX_POOL_SIZE: usize = 512;
/// Default pending buffer threshold (bytes) before draining to the physical fd.
-pub const DEFAULT_FLUSH_THRESHOLD: usize = 8 * 1024;
+pub const DEFAULT_FLUSH_THRESHOLD: usize = 64 * 1024;
// Convenient alias for the per-entry physical writer slot.
type PhysWriter = Option<Box<dyn Write + Send>>;