A given canonical kmer belongs to **exactly one partition** and **exactly one layer** within that partition. This property makes all aggregation operations decomposable and parallelisable without coordination.
**KmerIndex**: root entry point. Owns `IndexMeta` (written to `index.meta`) and a `KmerPartition` that routes canonical kmers to partition directories. All partition-level operations are dispatched in parallel via rayon.
**Partition directory**: one directory per minimiser bucket. `PartitionMeta` (stored as `meta.json`) records `n_layers`. Layers within a partition cover disjoint kmer sets.
**Layer directory**: one `MphfLayer` plus optional data stores. `LayerMeta` (stored as `layer_meta.json`) records which `EvidenceKind` was used. The MPHF and `unitigs.bin` are immutable once built; evidence files are the only part replaced by `reindex`.
`IndexMeta` is serialised as `index.meta` (JSON). It is the authority for the ordered list of genomes and for the parameters that govern all subsequent operations on the index.
`EvidenceKind` is stored both in `IndexConfig` (index-wide default, updated by `reindex`) and in each `LayerMeta` (per-layer record of what was actually built).
`MphfLayer::find(kmer)` dispatches transparently to `find_exact` or `find_approx` based on the evidence loaded at `open` time (read from `layer_meta.json`). Returns `Some(slot)` only if the kmer is confirmed present; `None` for absent or out-of-range.
`block_bits` controls the `.idx` file written alongside `evidence.bin`. At `block_bits = 0`, every unitig chunk has an index entry, giving O(1) random access; larger values trade access time for a smaller `.idx`.
`Layer::query(kmer)` delegates to `MphfLayer::find`, then calls `data.read(slot)` if a slot is returned. Both exact and approximate evidence are handled transparently; the caller sees only `Option<Hit<D::Item>>`.
This makes `LayeredStore<LayeredStore<PersistentCompactIntMatrix>>` automatically implement `CountPartials` — no separate `PartitionedStore` type is needed:
PersistentCompactIntMatrix::col_weights() — one (partition, layer)
↓ Σ across layers
LayeredStore<PersistentCompactIntMatrix>::col_weights() — one partition
↓ Σ across partitions
LayeredStore<LayeredStore<…>>::col_weights() — global
```
The same cascade applies to every partial method.
---
## Multi-genome column invariant
After any merge, every layer in every partition has exactly `n_genomes` columns, where `n_genomes` is the current total in `index.meta`. This holds for both `PersistentCompactIntMatrix` and `PersistentBitMatrix`.
Maintained by three coordinated operations:
**Existing layers — column append.**`Layer::append_genome_column` appends one column to each existing layer. Slots matching the incoming genome receive its count or `true`; all other slots receive 0 or `false`.
**New layers — absent columns prepended.** When a new layer is created for kmers unique to the incoming genome, `n_existing_genomes` absent columns are prepended before the incoming genome's column, so the new layer immediately has the same column count as all other layers.
**First merge, Presence mode — `init_presence_matrix`.** The initial single-genome index has no `presence/` directory (presence is implicit). On the first merge, `Layer<()>::init_presence_matrix` materialises genome 0's presence column (all `true`) retroactively, raising the column count from 0 to 1 before appending column 1.
This invariant is the precondition for correct progressive aggregation: every level can blindly sum matrices from below because all matrices have the same shape.
On success, `IndexConfig::evidence` and `IndexConfig::block_bits` are updated in `index.meta`. Each layer's `layer_meta.json` is also rewritten with the new `EvidenceKind`.
`estimate` resolves approximate-evidence parameters (`z`, `b`, target FP rate) and prints the resulting effective kmer size and per-kmer / per-z-window false-positive rates without touching any index. Used to calibrate `Approx { b, z }` before building or reindexing.