`PersistentBitVec` stores a dense bit vector (presence/absence per slot) backed by a single mmap'd file. It is the binary counterpart of `PersistentCompactIntVec` and shares the same lifecycle pattern (builder → close → reader). All bulk operations work on u64 words rather than bytes, giving 8× fewer iterations and enabling the compiler to emit POPCNT and SIMD instructions.
Typical use: converting k-mer count vectors to presence/absence vectors (with optional threshold), then computing set-theoretic distances (Jaccard) or edit distances (Hamming) between samples.
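As an illustration of that workflow, here is a minimal sketch over plain `Vec<u64>` words rather than the mmap-backed type. The helper names `threshold_to_bits` and `hamming_words`, and the `u32` count type, are assumptions made for this example and are not part of the documented API.

```rust
/// Hypothetical helper: set bit `i` when `counts[i] >= threshold`.
/// Bits are packed LSB-first into u64 words; unused high bits stay zero.
fn threshold_to_bits(counts: &[u32], threshold: u32) -> Vec<u64> {
    let mut words = vec![0u64; (counts.len() + 63) / 64];
    for (i, &c) in counts.iter().enumerate() {
        if c >= threshold {
            words[i >> 6] |= 1u64 << (i & 63);
        }
    }
    words
}

/// Hypothetical helper: Hamming distance, i.e. the number of slots
/// where the two presence vectors differ.
fn hamming_words(a: &[u64], b: &[u64]) -> u64 {
    assert_eq!(a.len(), b.len(), "vectors must have the same word count");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x ^ y).count_ones() as u64)
        .sum()
}
```

Raising the threshold only ever clears bits, so a stricter presence vector is always a subset of a looser one built from the same counts.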
`PersistentBitMatrix` wraps multiple `PersistentBitVec` columns in a directory, exposing a column-major binary matrix with a row-access API. A single-column bit matrix is a vector at the API level.
```
data: [u64; ⌈n/64⌉]  bit words, LSB-first, zero-padded
```
**Header is 16 bytes**, so data starts at an offset divisible by 8. Since `mmap` returns page-aligned memory (≥ 4096-byte aligned), the data slice is u64-aligned, enabling a zero-copy `&[u8] → &[u64]` reinterpretation.
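The reinterpretation can be sketched with the standard `slice::align_to`. The function name `words_of` and the constant `HEADER_LEN` are hypothetical, and the sketch assumes the file's word byte order matches the host's; it is not the library's actual code.

```rust
/// Header size in bytes, as stated in the text.
const HEADER_LEN: usize = 16;

/// Hypothetical helper: view the data region of a mapped file as u64 words
/// without copying. Returns `None` if the buffer is shorter than the header,
/// the data region is not 8-byte aligned, or its length is not a multiple of 8.
fn words_of(file_bytes: &[u8]) -> Option<&[u64]> {
    let data = file_bytes.get(HEADER_LEN..)?;
    // SAFETY: every bit pattern is a valid u64, and `align_to` only places
    // correctly aligned, whole words in the middle slice.
    let (prefix, words, suffix) = unsafe { data.align_to::<u64>() };
    if prefix.is_empty() && suffix.is_empty() {
        Some(words)
    } else {
        None
    }
}
```

Checking that the prefix and suffix are empty turns an alignment or length violation into a recoverable `None` instead of undefined behavior.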
**Bit layout**: bit `i` is in `data[i >> 6]` at bit position `i & 63` (LSB-first). Bits `[n, ⌈n/64⌉×64)` are **always zero** (padding). This invariant is maintained by all write operations and must be restored by `not()` after flipping.
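A minimal sketch of this indexing over a plain word slice follows. The free functions `words_for`, `get_bit`, and `set_bit` are illustrative names, not the library's methods.

```rust
/// Number of u64 words needed to hold `n` bits: ⌈n/64⌉.
fn words_for(n: usize) -> usize {
    (n + 63) / 64
}

/// Read bit `i`: word `i >> 6`, bit position `i & 63` (LSB-first).
fn get_bit(words: &[u64], i: usize) -> bool {
    (words[i >> 6] >> (i & 63)) & 1 == 1
}

/// Write bit `i`. The caller must keep `i < n` so padding bits stay zero.
fn set_bit(words: &mut [u64], i: usize, value: bool) {
    let mask = 1u64 << (i & 63);
    if value {
        words[i >> 6] |= mask;
    } else {
        words[i >> 6] &= !mask;
    }
}
```

For example, with `n = 70` there are two words, and bit 65 lives in the second word at position 1.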
The file and mmap are created immediately at construction. The header is written once at `new()` or copied from the source at `build_from*()`. `close()` is a single flush — there is no tail to append, unlike `PersistentCompactIntVec`.
All operate on `⌈n/64⌉` u64 words. O(n/64) per call.
```rust
builder.and(&other); // self[i] &= other[i] for all i
builder.or(&other);  // self[i] |= other[i]
builder.xor(&other); // self[i] ^= other[i]
builder.not();       // self[i] = !self[i], then re-zero padding bits
```
`and`/`or`/`xor` read `other`'s word slice directly (no allocation). `not()` flips all words then masks the last word's padding bits to restore the invariant.
Implementing `not()` without masking the last word would corrupt `count_ones()`, `hamming_dist()`, and `jaccard_dist()`, because the flipped padding bits would be counted as set. The mask applied after flipping is `(1u64 << (n % 64)) - 1`; it is skipped when `n % 64 == 0`, since the last word then has no padding. The other operations (`and`, `or`, `xor`) preserve zero padding on their own: both operands have zero padding, and `0 & 0`, `0 | 0`, and `0 ^ 0` are all `0`.
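The flip-then-mask step can be sketched as follows on a plain word slice. `not_in_place` and `count_ones_words` are hypothetical stand-ins for the builder's `not()` and the reader's `count_ones()`.

```rust
/// Flip every bit of an `n`-bit vector, then re-zero the padding bits
/// in the last word so that only bits `[0, n)` can be set.
fn not_in_place(words: &mut [u64], n: usize) {
    for w in words.iter_mut() {
        *w = !*w;
    }
    let tail = n % 64;
    // When `tail == 0` the last word is fully used and must not be masked.
    if tail != 0 {
        if let Some(last) = words.last_mut() {
            *last &= (1u64 << tail) - 1;
        }
    }
}

/// Population count over all words; correct only if padding bits are zero.
fn count_ones_words(words: &[u64]) -> u64 {
    words.iter().map(|w| w.count_ones() as u64).sum()
}
```

Negating an all-zero vector of 70 bits should give exactly 70 set bits; without the mask the count would be 128.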
A directory containing `meta.json` and N column files `col_000000.pbiv`, `col_000001.pbiv`, …, each a `PersistentBitVec`. Used for presence/absence matrices: one column per genome, one bit per MPHF slot.
```
presence/
meta.json {"n": <n_slots>, "n_cols": <G>}
col_000000.pbiv genome 0
col_000001.pbiv genome 1
...
```
Column-major layout makes per-genome set operations (Jaccard, Hamming, AND/OR) cache-friendly, since each genome is a contiguous file. Row access (which genomes contain a given k-mer) requires one O(1) read per column, so O(G) in total.
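The file naming and `meta.json` shape shown in the directory listing can be sketched with two small helpers. `col_path` and `meta_json` are hypothetical names, and the hand-formatted JSON is only an illustration; the actual implementation may use a serialization crate.

```rust
use std::path::{Path, PathBuf};

/// Path of column `c` inside the matrix directory: `col_NNNNNN.pbiv`,
/// with the index zero-padded to six digits.
fn col_path(dir: &Path, c: usize) -> PathBuf {
    dir.join(format!("col_{:06}.pbiv", c))
}

/// Contents of `meta.json` for a matrix with `n` slots and `n_cols` columns.
fn meta_json(n: usize, n_cols: usize) -> String {
    format!("{{\"n\": {}, \"n_cols\": {}}}", n, n_cols)
}
```

Zero-padding keeps the column files in index order under a plain lexicographic directory listing.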
Creates `col_NNNNNN.pbiv` for the next column and returns its builder. The caller fills the column and calls `builder.close()` before calling `add_col` again.
**`close(self) -> io::Result<()>`**
Writes `meta.json` with the final `n` and `n_cols`.
### Reader (`PersistentBitMatrix`)
```rust
struct PersistentBitMatrix {
    cols: Vec<PersistentBitVec>,
    n: usize,
}
```
**`open(dir: &Path) -> io::Result<Self>`**
Reads `meta.json`, opens all `col_NNNNNN.pbiv` files.
**`row(&self, slot: usize) -> Box<[bool]>`**
Returns the presence vector: `[col_0[slot], col_1[slot], …, col_{G-1}[slot]]`. One byte read per column. O(G).
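A rough sketch of row access, using in-memory `Vec<u64>` columns in place of `PersistentBitVec` (an assumption made so the example is self-contained). `row_of` is a hypothetical free function, not the reader's method.

```rust
/// Gather bit `slot` from every column, in column order.
/// Each column is a slice of LSB-first u64 words of equal length.
fn row_of(cols: &[Vec<u64>], slot: usize) -> Box<[bool]> {
    cols.iter()
        .map(|words| (words[slot >> 6] >> (slot & 63)) & 1 == 1)
        .collect()
}
```

Each column contributes a single indexed load, so the cost grows linearly with the number of columns and is independent of `n`.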
**`col(&self, c: usize) -> &PersistentBitVec`**
Direct access to a single column for column-oriented operations.
`partial_jaccard` returns `(inter, union)` as a pair because `union` is not reconstructible from per-column `count_ones()` alone; it depends on both columns simultaneously. Both components are additively decomposable across `(partition, layer)` pairs; the final `jaccard_dist_matrix()` is computed from their element-wise sums.
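The additive decomposition can be illustrated for a single pair of columns. The sketch below works on raw word slices; `partial_jaccard_words` and `jaccard_from_parts` are hypothetical names, and returning a distance of `0.0` for two empty sets is an assumed convention, not one stated in the source.

```rust
/// Intersection and union sizes for one chunk of two columns.
fn partial_jaccard_words(a: &[u64], b: &[u64]) -> (u64, u64) {
    assert_eq!(a.len(), b.len(), "chunks must have the same word count");
    a.iter().zip(b).fold((0u64, 0u64), |(inter, union), (x, y)| {
        (
            inter + (x & y).count_ones() as u64,
            union + (x | y).count_ones() as u64,
        )
    })
}

/// Combine per-chunk pairs by summing each component, then compute
/// the Jaccard distance 1 - |A ∩ B| / |A ∪ B|.
fn jaccard_from_parts(parts: &[(u64, u64)]) -> f64 {
    let (inter, union) = parts
        .iter()
        .fold((0u64, 0u64), |(i, u), &(pi, pu)| (i + pi, u + pu));
    if union == 0 {
        0.0 // assumed convention: two empty sets are at distance 0
    } else {
        1.0 - inter as f64 / union as f64
    }
}
```

Summing the pairs from separate chunks gives the same result as a single pass over the concatenated words, which is what makes per-partition computation possible.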