Push ywwwypqxrtmy #14

Merged
coissac merged 6 commits from push-ywwwypqxrtmy into main 2026-06-03 13:18:41 +00:00

6 Commits

Author SHA1 Message Date
Eric Coissac bba5147f0f fix: account for k-mer overlap in total_bases calculation
Introduces a `kmer_overlap` variable (`k-1`) and modifies the `total_bases` accumulation to subtract this overlap from each sequence's length. This ensures the base count accurately reflects only valid k-mer starting positions rather than raw sequence length.
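The arithmetic described above can be sketched as follows. This is a minimal illustration, not the commit's actual code: the function names and the use of saturating subtraction for sequences shorter than `k` are assumptions.

```rust
/// Number of k-mer starting positions in one sequence of length `seq_len`.
/// Hypothetical helper; saturating subtraction (an assumption) makes
/// sequences shorter than k contribute zero instead of underflowing.
fn kmer_positions(seq_len: usize, k: usize) -> usize {
    let kmer_overlap = k.saturating_sub(1); // k - 1, as named in the commit message
    seq_len.saturating_sub(kmer_overlap)
}

/// Accumulate the overlap-corrected count over all sequences.
fn total_bases(seq_lens: &[usize], k: usize) -> usize {
    seq_lens.iter().map(|&len| kmer_positions(len, k)).sum()
}
```

With `k = 4`, sequences of length 10, 5 and 2 contribute 7, 2 and 0 positions respectively, where a raw length sum would have counted 17.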
2026-06-03 15:11:48 +02:00
Eric Coissac bfe0cb4b82 feat: integrate obikseq to configure global k-mer and minimizer sizes
This change adds the `obikseq` crate as a local dependency and inserts `set_k` and `set_m` calls across index creation and command modules. These calls synchronize the runtime's global k-mer and minimizer sizes with the parameters of the loaded index, so downstream sequence processing and partitioning operations consistently use the same k and m as the index.
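One way such process-wide settings can work is sketched below. The `set_k` and `set_m` defined here are local stand-ins written for illustration; the real `obikseq` signatures are not shown in this PR. The `IndexParams` struct, the getters, and the default values are likewise hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Process-wide k-mer and minimizer sizes (default values are arbitrary placeholders).
static K: AtomicUsize = AtomicUsize::new(31);
static M: AtomicUsize = AtomicUsize::new(11);

// Local stand-ins, NOT the obikseq API.
fn set_k(k: usize) { K.store(k, Ordering::Relaxed); }
fn set_m(m: usize) { M.store(m, Ordering::Relaxed); }
fn k() -> usize { K.load(Ordering::Relaxed) }
fn m() -> usize { M.load(Ordering::Relaxed) }

/// Hypothetical parameters read from an index on disk.
struct IndexParams { k: usize, m: usize }

/// Call once after loading an index so later code sees matching sizes.
fn sync_globals(params: &IndexParams) {
    set_k(params.k);
    set_m(params.m);
}
```

Calling `sync_globals` right after an index is opened means any later code reading `k()` or `m()` sees the index's values rather than stale defaults.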
2026-06-03 14:31:14 +02:00
Eric Coissac 173ac9fb42 feat: introduce packed matrix storage and layer metadata
Unifies bit and integer matrix storage into `PersistentBitMatrix` and `PersistentCompactIntMatrix` enums, supporting both columnar and memory-mapped single-file layouts. Introduces `LayerMeta` to persist layer dimensions as `layer_meta.json`, enabling correct initialization of implicit presence matrices. Adds CLI commands (`pack` and `--upgrade-index`) to convert existing columnar indices to the compact format and backfill missing metadata. Updates partitionner and layered map logic to use the new persistent builders, optimized memory allocation, and auto-detected storage backends.
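The idea of one enum covering two storage layouts, plus a conversion from columnar to packed, can be sketched as below. This is a simplified illustration under stated assumptions: `BitStorage` and its variants are hypothetical, in-memory vectors stand in for files and memory maps, and all columns are assumed to have the same length. It does not reproduce `PersistentBitMatrix`.

```rust
/// Hypothetical two-layout bit matrix (not the crate's real type).
enum BitStorage {
    /// One vector per column: columns[col][row].
    Columnar { columns: Vec<Vec<bool>> },
    /// Row-major bits packed into a single byte buffer.
    Packed { bytes: Vec<u8>, n_cols: usize },
}

impl BitStorage {
    /// Read one cell; callers do not need to know the layout.
    fn get(&self, row: usize, col: usize) -> bool {
        match self {
            BitStorage::Columnar { columns } => columns[col][row],
            BitStorage::Packed { bytes, n_cols } => {
                let idx = row * n_cols + col;
                (bytes[idx / 8] >> (idx % 8)) & 1 == 1
            }
        }
    }

    /// Convert to the packed layout (a copy if already packed).
    fn pack(&self) -> BitStorage {
        match self {
            BitStorage::Packed { bytes, n_cols } => BitStorage::Packed {
                bytes: bytes.clone(),
                n_cols: *n_cols,
            },
            BitStorage::Columnar { columns } => {
                let n_cols = columns.len();
                let n_rows = columns.first().map_or(0, |c| c.len());
                let mut bytes = vec![0u8; (n_rows * n_cols + 7) / 8];
                for (col, column) in columns.iter().enumerate() {
                    for (row, &bit) in column.iter().enumerate() {
                        if bit {
                            let idx = row * n_cols + col;
                            bytes[idx / 8] |= 1u8 << (idx % 8);
                        }
                    }
                }
                BitStorage::Packed { bytes, n_cols }
            }
        }
    }
}
```

Because `get` dispatches on the variant, query code can stay unchanged while an index is migrated from one layout to the other.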
2026-06-03 14:16:04 +02:00
Eric Coissac de1a41810a perf: enable zero-allocation queries and memory-mapped indexes
Introduce zero-allocation row extraction and query result buffers across `obicompactvec` and `obikpartitionner` to eliminate per-kmer heap allocations. Replace in-memory MPHF deserialization with memory-mapped, zero-copy views to reduce runtime memory footprint. Add configurable I/O chunking, a RAM-aware `--chunk-size` parameter, and system memory monitoring via the new `sysinfo` dependency. Re-export `PreloadedIndex` for external consumers.
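A common way to avoid per-query heap allocations is to let the caller own a reusable output buffer. The sketch below illustrates that pattern only; `RowMajorMatrix` and `row_into` are hypothetical names and do not reflect the actual `obicompactvec` API.

```rust
/// Hypothetical dense matrix stored row by row.
struct RowMajorMatrix {
    data: Vec<u32>,
    n_cols: usize,
}

impl RowMajorMatrix {
    /// Copy one row into `out`, reusing its existing capacity.
    /// Once `out` has grown to `n_cols`, later calls do not allocate.
    fn row_into(&self, row: usize, out: &mut Vec<u32>) {
        out.clear(); // keeps the allocation, drops the old contents
        let start = row * self.n_cols;
        out.extend_from_slice(&self.data[start..start + self.n_cols]);
    }
}
```

In a query loop, the buffer would be created once outside the loop and passed to `row_into` for every k-mer, so the allocation cost is paid a single time.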
2026-06-03 10:24:12 +02:00
Eric Coissac 1661dd6b1c feat: introduce preloaded index cache and thread-safe progress tracker
Introduce `PreloadedIndex` to cache partition indices and eliminate redundant I/O during repeated queries. Refactor the query pipeline to route through this pre-loaded index, and expose it publicly in `obikpartitionner`. Additionally, add a thread-safe, lazily-initialized `MultiProgress` singleton for improved progress tracking.
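A thread-safe, lazily-initialized singleton can be built with the standard library's `OnceLock`. The sketch below shows the pattern with a hypothetical `Progress` counter standing in for the real `MultiProgress` object. Whether the commit uses `OnceLock` or another mechanism is not stated, so treat this as one possible approach.

```rust
use std::sync::{Mutex, OnceLock};

/// Hypothetical stand-in for a shared progress tracker.
struct Progress {
    completed: Mutex<u64>,
}

impl Progress {
    fn inc(&self, n: u64) {
        *self.completed.lock().unwrap() += n;
    }

    fn completed(&self) -> u64 {
        *self.completed.lock().unwrap()
    }
}

/// Global accessor: the instance is created on first use, exactly once,
/// even if several threads call this at the same time.
fn progress() -> &'static Progress {
    static INSTANCE: OnceLock<Progress> = OnceLock::new();
    INSTANCE.get_or_init(|| Progress { completed: Mutex::new(0) })
}
```

Every caller of `progress()` gets a reference to the same instance, so worker threads can report progress without passing a handle through each function signature.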
2026-06-03 09:42:18 +02:00
Eric Coissac 2ebc5f0d75 chore: add logging infrastructure to merge routine
Adds comprehensive logging for source metadata, merge modes, and forced approximation detection. Introduces `format_evidence` and `is_trivial` helpers to format `IndexMode` variants and identify single-genome presence indices. The core merge algorithm remains unmodified, with all changes focused on enhanced runtime observability.
2026-06-01 15:23:37 +02:00