Restructure architecture documentation to reflect the decoupled `MphfLayer` design wrapped by `LayeredStore<S>` and enforce strict multi-genome column invariants. Introduce the approximate index architecture, replacing exact `evidence.bin` with compact `fingerprint.bin` using B-bit fingerprints and z-consecutive k-mer matching. Update CLI flags, add `reindex`/`estimate` workflows, and refactor APIs to support separate exact/approximate evidence handling. Finally, provide a comprehensive on-disk layout specification, including the pipeline state machine, JSON schemas, binary formats, and refined Strategy B unitig evidence details.
These files catalog test coverage for Rust modules across architecture, implementation, and theory sections. They track recent structural changes, flag areas prone to documentation drift, and mandate verification of key parameters and routing logic to maintain alignment with the active codebase.
Bypasses bitwise shift and mask operations when `block_bits == 0`, directly indexing `self.block_offsets[i]` instead. This eliminates unnecessary arithmetic overhead for single-block cases while preserving the original block-based offset calculation for larger block sizes.
Propagate configurable block size (`block_bits`) through index and layer construction to control unitig chunking and optimize memory/performance trade-offs. Introduce an in-place `reindex` command and library method to convert indices between exact and approximate evidence formats. Add validation to reject merging indexes with mismatched evidence types, and update parallel kmer counting to use `AtomicUsize` for thread-safe aggregation. Includes CLI argument parsing, metadata persistence, and updated tests.
Introduces a new `estimate` CLI subcommand to calculate bloom filter size, evidence bits, and false-positive rates for approximate indexing. Updates the index building and querying pipelines to support both exact and approximate evidence types via a unified `EvidenceKind` abstraction. Refactors `MphfLayer` and partition index builders to route operations based on the selected evidence mode, and adds the required `obilayeredmap` dependency.
Introduces `EvidenceKind` and `LayerMeta` structs to manage per-layer evidence configuration and false-positive parameters. Adds JSON serialization for layer metadata persistence and updates `build_approx_evidence` to accept a `z` parameter for consecutive k-mer thresholds. Exposes these types publicly and documents a future `aggregate` command for merging index matrix columns.
Introduce a new `fingerprint` module that stores packed B-bit vectors via memory-mapped files. Expose the module publicly and add `build_approx_evidence` to `Layer` and `MphfLayer` for generating compact `fingerprint.bin` files. Implement `find_approx` for fast, probabilistic kmer lookups using configurable bit-widths. Update dependencies to `bitvec` v1 and add `cacheline-ef`, `epserde`, and `memmap2` to support the new storage and serialization logic.
Decouple index generation by introducing `build_unitig_idx()` for retroactive `.idx` creation and optional immediate writing on close. Add `open_sequential()` for index-less iteration while enforcing index requirements for random access. Refactor the MPHF layer to pre-generate the unitig index for parallel random access, integrate `rayon` for k-mer processing, and enforce mapping integrity via duplicate slot validation. Additionally, implement `build_exact_evidence()` to reconstruct evidence from existing artifacts, and update tests to leverage the new index generation and simplified k-mer iteration helpers.
Replaces the hardcoded BLOCK_SIZE constant with a configurable block_bits parameter, enabling variable index granularity to balance index size and sequential scan cost. Both the reader and writer now store block_bits and a precomputed mask for branchless offset arithmetic, while the index file format is upgraded to UIX3 to persist the configuration. Comprehensive unit tests verify serialization, chunk offset indexing, random access consistency, and kmer count accuracy across various block sizes.
Replace the dense per-chunk offset index with a sparse block-sampled structure (64 chunks per block), reducing the index file size by approximately 300× while preserving O(1) k-mer extraction. Introduce a design document for eliminating the `evidence.bin` file, which accounts for ~66% of the lookup layer, by transitioning to fingerprint-based approximate indexing and value-based MPHF lookups. Update MkDocs navigation to include the new documentation and add a file count tracker to the scatter step progress bar for improved observability.