Push vqmnuyrkpxot #9

Merged
coissac merged 4 commits from push-vqmnuyrkpxot into main 2026-05-26 09:05:51 +00:00

4 Commits

Author SHA1 Message Date
Eric Coissac 1d880fdc5f refactor: optimize MPHF construction and update legacy guidelines
Replaces parallel random-access unitig iteration with a sequential mmap-based iterator for MPHF construction, eliminating the build-time `.idx` dependency by deferring index generation until after persistence. Updates `CLAUDE.md` to treat existing code as a hypothesis, mandating proactive removal of obsolete legacy constructs rather than preserving them out of inertia.
2026-05-26 10:54:59 +02:00
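A minimal sketch of why a sequential pass lets index generation wait until after persistence. The length-prefixed record layout and the names `write_record`, `for_each_record`, and `build_index` are assumptions made for illustration; the repository's actual unitig format and `.idx` layout are not shown in this PR summary. A plain byte buffer stands in for the memory-mapped file.

```rust
/// Appends one record as a little-endian u32 length followed by the payload.
/// (Hypothetical layout, chosen only for this sketch.)
pub fn write_record(buf: &mut Vec<u8>, payload: &[u8]) {
    buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    buf.extend_from_slice(payload);
}

/// Sequential pass: visits every record in order without any offset table.
/// The callback receives the record's byte offset and its payload.
pub fn for_each_record<'a>(buf: &'a [u8], mut f: impl FnMut(usize, &'a [u8])) {
    let mut pos = 0;
    while pos + 4 <= buf.len() {
        let len =
            u32::from_le_bytes([buf[pos], buf[pos + 1], buf[pos + 2], buf[pos + 3]]) as usize;
        let start = pos + 4;
        let end = start + len;
        if end > buf.len() {
            break; // truncated trailing record: stop rather than read past the end
        }
        f(pos, &buf[start..end]);
        pos = end;
    }
}

/// Builds the offset table afterwards, reusing the same sequential pass.
pub fn build_index(buf: &[u8]) -> Vec<usize> {
    let mut offsets = Vec::new();
    for_each_record(buf, |offset, _| offsets.push(offset));
    offsets
}
```

In this sketch the consumer that only needs every record once (here, the MPHF builder) calls `for_each_record` directly, and `build_index` can run later for readers that need random access.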
Eric Coissac 009a328c58 refactor: handle kmer deduplication and layer initialization concurrently
Introduce a secondary layer iteration that opens `SrcLayerData` alongside the unitig reader for concurrent metadata access. This refactors the merge routine to handle kmer deduplication and per-layer data initialization in the same pass. Also fixes the `openopen_sequential` typo in `rebuild_layer.rs`, renaming it to `open_sequential`.
2026-05-26 10:52:08 +02:00
Eric Coissac 9d46400898 feat: support exact and approximate evidence in layer construction
Refactored `MphfLayer::build` to accept an `EvidenceKind` parameter, routing to exact (index-based, parallel MPHF, writes `evidence.bin`) or approximate (sequential mmap iterator, writes `fingerprint.bin`) pipelines. Introduced `CanonicalKmerIter` for memory-mapped, chunked k-mer iteration with O(1) resets via `Arc<Mmap>`. Updated layer and map APIs to forward evidence kind, added `push_layer` for count matrices, and adjusted tests and public exports accordingly.
2026-05-26 10:23:43 +02:00
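The routing and the O(1) reset described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: `EvidenceKind` here is a stand-in enum with the two variants implied by the message, `evidence_file` and `ChunkIter` are hypothetical names, and `Arc<[u8]>` replaces `Arc<Mmap>` so the example needs only the standard library. The output file names are the ones given in the commit message.

```rust
use std::sync::Arc;

/// Stand-in for the evidence kind named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EvidenceKind {
    Exact,
    Approximate,
}

/// Maps each pipeline to the output file the commit message associates with it.
pub fn evidence_file(kind: EvidenceKind) -> &'static str {
    match kind {
        EvidenceKind::Exact => "evidence.bin",
        EvidenceKind::Approximate => "fingerprint.bin",
    }
}

/// Iterates over fixed-width records in a shared, immutable byte buffer.
pub struct ChunkIter {
    data: Arc<[u8]>, // assumption: stands in for Arc<Mmap>
    record_len: usize,
    pos: usize,
}

impl ChunkIter {
    pub fn new(data: Arc<[u8]>, record_len: usize) -> Self {
        assert!(record_len > 0, "record_len must be positive");
        Self { data, record_len, pos: 0 }
    }

    /// O(1) reset: only the cursor moves; the buffer is neither copied nor reopened.
    pub fn reset(&mut self) {
        self.pos = 0;
    }
}

impl Iterator for ChunkIter {
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Vec<u8>> {
        let end = self.pos.checked_add(self.record_len)?;
        if end > self.data.len() {
            return None; // incomplete trailing bytes are ignored
        }
        let record = self.data[self.pos..end].to_vec();
        self.pos = end;
        Some(record)
    }
}
```

A cheap reset matters when the consumer makes several passes over the same keys, which MPHF builders often do. Because the buffer sits behind an `Arc`, each pass reuses the same mapping.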
Eric Coissac 036d044291 refactor: update core types and add approximate evidence support
Refactor `Kmer`, `SuperKmer`, and the chunk reader into optimized, generic representations with compile-time length parameters and bitwise operations. Update the pipeline and scheduler to support batch processing, 1→N flat transformations, and multi-source merging. Introduce an approximate evidence mode using b-bit fingerprints and `.idx` files, alongside the existing exact mode. Update CLI documentation, minimizer selection, and query output schema accordingly.
2026-05-26 10:04:25 +02:00
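To make the ideas in this commit concrete, here is a small sketch of a k-mer type with a compile-time length, a bitwise reverse complement, and a b-bit fingerprint. All of it is assumed for illustration: the 2-bit encoding (A=0, C=1, G=2, T=3), the `u64` storage, the method names, and the SplitMix64-style mixer used for `fingerprint` are not taken from the repository.

```rust
/// A k-mer packed 2 bits per base into a u64; this sketch supports 1 <= K <= 32.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Kmer<const K: usize>(pub u64);

impl<const K: usize> Kmer<K> {
    /// Mask covering the low 2*K bits.
    const MASK: u64 = if K >= 32 { u64::MAX } else { (1u64 << ((2 * K) as u32)) - 1 };

    /// Encodes an ASCII sequence of exactly K bases; returns None on any other input.
    pub fn from_ascii(seq: &[u8]) -> Option<Self> {
        if seq.len() != K || K == 0 || K > 32 {
            return None;
        }
        let mut bits = 0u64;
        for &base in seq {
            let code = match base {
                b'A' | b'a' => 0,
                b'C' | b'c' => 1,
                b'G' | b'g' => 2,
                b'T' | b't' => 3,
                _ => return None,
            };
            bits = (bits << 2) | code;
        }
        Some(Kmer(bits))
    }

    /// Reverse complement: with this encoding, complementing a base is a 2-bit NOT.
    pub fn reverse_complement(self) -> Self {
        let mut fwd = !self.0 & Self::MASK;
        let mut rc = 0u64;
        for _ in 0..K {
            rc = (rc << 2) | (fwd & 3); // move the last base to the front
            fwd >>= 2;
        }
        Kmer(rc)
    }

    /// Canonical form: the smaller of the k-mer and its reverse complement.
    pub fn canonical(self) -> Self {
        self.min(self.reverse_complement())
    }
}

/// Keeps the low `b` bits of a mixed 64-bit hash of `key` (1 <= b <= 64).
pub fn fingerprint(key: u64, b: u32) -> u64 {
    assert!((1..=64).contains(&b), "b must be in 1..=64");
    let mut z = key.wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^= z >> 31;
    if b == 64 { z } else { z & ((1u64 << b) - 1) }
}
```

The trade-off between the two evidence modes, as generally understood for MPHF-backed lookups: an MPHF returns a slot even for keys that were never inserted, so each slot needs some evidence to reject them. Storing the key itself gives exact answers. Storing only a b-bit fingerprint saves space, and an absent key with a well-mixed hash is wrongly accepted with probability of roughly 2^-b.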