docmd/implementation/obicompactvec.md

# obicompactvec — Complete Reference

## Module structure

```
src/obicompactvec/src/
  lib.rs            public re-exports
  views.rs          BitSliceView<'a>, IntSliceView<'a> — zero-copy read views
  traits.rs         ColumnWeights, CountPartials, BitPartials (matrix aggregation)
  bitvec.rs         PersistentBitVec, PersistentBitVecBuilder, BitIter
  reader.rs         PersistentCompactIntVec (read-only)
  builder.rs        PersistentCompactIntVecBuilder (read-write)
  tempintvec.rs     TempCompactIntVec, TempCompactIntVecBuilder (temp-file-backed)
  tempbitvec.rs     TempBitVec, TempBitVecBuilder (temp-file-backed)
  bitmatrix.rs      PersistentBitMatrix, PersistentBitMatrixBuilder
  intmatrix.rs      PersistentCompactIntMatrix, PersistentCompactIntMatrixBuilder
  colgroup.rs       ColGroup, MatrixGroupOps trait
  format.rs         file format constants, encode/decode helpers
  layer_meta.rs     LayerMeta (column metadata)
  meta.rs           matrix metadata
```

```mermaid
graph TD
    views --> bitvec
    views --> builder
    views --> tempbitvec
    views --> tempintvec
    views --> bitmatrix
    views --> intmatrix
    format --> reader
    format --> builder
    reader --> intmatrix
    reader --> tempintvec
    builder --> intmatrix
    builder --> tempintvec
    bitvec --> tempbitvec
    bitvec --> bitmatrix
    tempintvec --> intmatrix
    tempintvec --> bitmatrix
    tempbitvec --> intmatrix
    tempbitvec --> bitmatrix
    colgroup --> intmatrix
    colgroup --> bitmatrix
    layer_meta --> bitmatrix
    layer_meta --> intmatrix
    meta --> bitmatrix
    meta --> intmatrix
```

---

## Compact int encoding

All integer vectors use the same two-tier encoding regardless of storage backend.

**Primary array** — one `u8` per slot:

- Values **0–254** are stored directly. No overhead.
- Value **255 is a sentinel**: the slot's actual value is ≥ 255 and lives in the overflow store.

**Overflow store** — maps slot index to a `u32` value ≥ 255:

- In `PersistentCompactIntVecBuilder`: a `HashMap<usize, u32>` in RAM.
- In `PersistentCompactIntVec` (reader): a sorted `[(slot: u64, value: u32)]` array in the mmap, with a sparse L1-resident index for binary search.

```mermaid
flowchart LR
    slot --> P["primary[slot]: u8"]
    P -->|"< 255"| V["value = byte (0–254)"]
    P -->|"= 255 sentinel"| OV["overflow store"]
    OV -->|"Builder"| HM["HashMap&lt;usize, u32&gt;\nin RAM"]
    OV -->|"PersistentCompactIntVec"| SA["sorted [(slot,value)] in mmap\n+ sparse L1 index"]
```

**Key property — sentinel 255 = +∞ on `u8`:**

- `min(a, 255) = a` for all `a ≤ 254` → correct when only one side is overflow
- `max(a, 255) = 255` → correct sentinel when either side is overflow
- Only the **both-overflow** case requires reading actual values from the overflow store.

In practice, k (overflow count) ≪ n (total slots). Observed genomic data: ~0.07% of kmer slots are in overflow.

---

## View types

The previous trait hierarchy (`BitSlice`, `BitSliceMut`, `IntSlice`, `IntSliceMut`) has been replaced by two concrete zero-copy view structs with inherent methods. Views are **`Copy`** — passing them is free. All read operations live on these two types.

### `BitSliceView<'a>`

```rust
#[derive(Clone, Copy)]
pub struct BitSliceView<'a> { pub(crate) words: &'a [u64], pub(crate) n: usize }
```

Bit `i` is at `words[i >> 6]` bit `i & 63` (LSB-first). Padding bits in the last word are zero.

| Method | Cost |
|---|---|
| `len()`, `is_empty()` | O(1) |
| `get(slot)` | O(1) |
| `count_ones()` | POPCNT per word, O(n/64) |
| `count_zeros()` | `n − count_ones()`, O(n/64) |
| `iter() -> BitSliceIter<'a>` | O(1) setup, O(n) iteration |
| `partial_jaccard_dist(other: BitSliceView)` | `(a&b).popcount`, `(a\|b).popcount` per word, O(n/64) |
| `jaccard_dist(other: BitSliceView)` | from partial, O(n/64) |
| `hamming_dist(other: BitSliceView)` | `(a^b).popcount` per word, O(n/64) |

`BitSliceIter<'a>`: word-level scan; one word per 64 iterations.

### `IntSliceView<'a>`

```rust
#[derive(Clone, Copy)]
pub struct IntSliceView<'a> {
    pub(crate) primary:      &'a [u8],
    pub(crate) overflow_raw: &'a [u8],   // sorted [(slot:u64, value:u32)] entries
    pub(crate) n_overflow:   usize,
    pub(crate) n:            usize,
}
```

`overflow_raw` contains `n_overflow` entries of `OVERFLOW_ENTRY_SIZE` bytes each, sorted by slot. The sort invariant is established at `close()`/`freeze()` time.

| Method | Cost |
|---|---|
| `len()`, `is_empty()` | O(1) |
| `primary_bytes()` | O(1) |
| `overflow_entries() -> impl Iterator<(usize,u32)>` | O(n_overflow) iteration |
| `get(slot)` | O(1) primary; binary search O(log k) for overflow slots |
| `iter() -> IntSliceViewIter<'a>` | merge scan, O(n + k) |
| `sum()` | byte scan + overflow, O(n + k) |
| `count_nonzero()` | byte scan, O(n) |
| Distance methods (`bray_dist`, `euclidean_dist`, `jaccard_dist`, …) | O(n + k) |

`IntSliceViewIter<'a>`: merge scan using `overflow_pos` index. Requires sorted overflow — guaranteed by the construction lifecycle.

**Builder `view()` vs reader `view()`:** `PersistentCompactIntVecBuilder` stores overflow as an unsorted `HashMap`, not raw bytes. Its `view()` returns an `IntSliceView` with `overflow_raw = &[]` and `n_overflow = 0`. This is intentional — the view is primarily useful after `freeze()`. During building, callers that need overflow use `overflow_entries()` directly.

---

## Concrete types

```mermaid
classDiagram
    class BitSliceView {
        +words: &[u64]
        +n: usize
        +get(slot) bool
        +count_ones() u64
        +iter() BitSliceIter
        +jaccard_dist/hamming_dist(other: BitSliceView)
    }
    class IntSliceView {
        +primary: &[u8]
        +overflow_raw: &[u8]
        +n_overflow: usize
        +n: usize
        +get(slot) u32
        +iter() IntSliceViewIter
        +overflow_entries() Iterator
        +bray_dist/euclidean_dist/…(other: IntSliceView)
    }
    class PersistentBitVec {
        -mmap: Mmap
        -n: usize
        +view() BitSliceView
        +get(slot) bool
        +count_ones/zeros() u64
        +iter() BitIter
        +partial_jaccard_dist(&Self) (u64,u64)
        +jaccard_dist/hamming_dist(&Self) …
    }
    class PersistentBitVecBuilder {
        -mmap: MmapMut
        -n: usize
        +view() BitSliceView
        +set(slot, bool)
        +or/and/xor/not(BitSliceView)
        +copy_from(BitSliceView)
        +close() / finish() → PersistentBitVec
    }
    class PersistentCompactIntVec {
        -mmap: Mmap
        -n: usize
        -n_overflow: usize
        -step: usize
        -index: Vec~(usize,usize)~
        +view() IntSliceView
        +get(slot) u32
        +iter() Iter
        +sum/count_nonzero() u64
        +bray_dist/euclidean_dist/… (&Self)
    }
    class PersistentCompactIntVecBuilder {
        -mmap: MmapMut
        -n: usize
        -overflow: HashMap~usize,u32~
        +view() IntSliceView
        +set(slot, u32) / get(slot) u32
        +inc / inc_present / inc_present_fast
        +inc_predicate / inc_predicate_fast
        +add/min/max/diff/mask_with(…View)
        +primary_bytes/primary_bytes_mut()
        +close() / finish() → PersistentCompactIntVec
    }

    PersistentBitVec --> BitSliceView : view()
    PersistentBitVecBuilder --> BitSliceView : view()
    PersistentCompactIntVec --> IntSliceView : view()
    PersistentCompactIntVecBuilder --> IntSliceView : view() (primary only)
    PersistentBitVecBuilder --> PersistentBitVec : close() then open()
    PersistentCompactIntVecBuilder --> PersistentCompactIntVec : close() then open()
```

### `PersistentBitVec` / `PersistentBitVecBuilder`

`PersistentBitVec` is the read-only type. `view()` returns a `BitSliceView<'_>` over the mmap word array. Direct inherent methods delegate to the view: `count_ones()`, `count_zeros()`, `partial_jaccard_dist(&Self)`, `jaccard_dist(&Self)`, `hamming_dist(&Self)`.

`BitIter<'a>` — exported iterator for `PersistentBitVec::iter()`:

```rust
pub struct BitIter<'a> { pub(crate) words: &'a [u64], pub(crate) slot: usize, pub(crate) n: usize }
```

`PersistentBitVecBuilder` is the read-write type. Mutation operations accept `BitSliceView<'_>`:

| Method | Cost |
|---|---|
| `set(slot, bool)` | O(1) |
| `view() -> BitSliceView<'_>` | O(1) |
| `or/and/xor(BitSliceView)` | word-level, O(n/64), SIMD-friendly |
| `not()` | `w ^= u64::MAX` per word, re-masks last word | O(n/64) |
| `copy_from(BitSliceView)` | `copy_from_slice` | O(n/64) |

### `PersistentCompactIntVec` / `PersistentCompactIntVecBuilder`

`PersistentCompactIntVec` is the read-only type. `view()` returns an `IntSliceView<'_>` over the mmap primary and overflow arrays. Inherent `iter()` is a merge scan (`Iter` struct). Inherent `sum()` and `count_nonzero()` use fast byte-scan helpers.

`PersistentCompactIntVecBuilder` is the read-write type. Mutation methods on the builder fall into two categories:

**Point mutations:**

| Method | Note |
|---|---|
| `set(slot, u32)` | writes primary[slot] or 255+overflow |
| `get(slot) -> u32` | reads primary byte or HashMap |
| `inc(slot)` | `get` + `set`, O(1) |

**Bulk computation methods** — accept view arguments:

| Method | Semantics | Overflow |
|---|---|---|
| `inc_present(BitSliceView)` | `+= 1` at each 1-bit | via `inc`, safe for any group size |
| `inc_present_fast(BitSliceView)` | same, raw u8 `+= 1` | `debug_assert` no 255 reached |
| `inc_predicate(IntSliceView, pred)` | `+= 1` where `pred(col[s])` | two-pass, safe |
| `inc_predicate_fast(IntSliceView, pred)` | same, raw u8 | `debug_assert` no 255 reached |
| `add(IntSliceView)` | `self[s] += other[s]` | primary fast path + overflow fallback |
| `min(IntSliceView)` | byte min + both-overflow fixup | see algorithm below |
| `max(IntSliceView)` | pre-pass + byte max | see algorithm below |
| `diff(IntSliceView)` | saturating sub | self<255 hot path |
| `mask_with(BitSliceView)` | zeros slots where mask bit = 0 | O(n_zeros) |

**`inc_present_fast` / `inc_predicate_fast` invariant:** caller guarantees no counter reaches 255 during the operation (group size < 255 for `inc_present_fast`, or chunk size < 255 for `inc_predicate_fast`). Violation is caught by `debug_assert` in dev builds.

**`min` algorithm:**

Exploits 255 = +∞: byte-level min is correct unless both sides are overflow.

```
snapshot self_ov: Vec<(slot,val)>
snapshot other_ov: HashMap<slot,val>
clear_overflow()
Pass 1 — byte min, SIMD-vectorizable, O(n)
Pass 2 — both-overflow fixup, O(k_self):
  for (slot, self_val) in self_ov:
    if slot ∈ other_ov: set(slot, min(self_val, other_ov[slot]))
```

**`max` algorithm:**

Cannot do byte max first — `max(255, b<255)=255` overwrites self's original overflow value. Pre-pass reads self's value at other's overflow slots before the byte pass.

```
Pre-pass O(k_other): for (slot, other_val) in other.overflow_entries():
  set(slot, max(self.get(slot), other_val))
Pass 1 — byte max, SIMD-vectorizable, O(n)
```

---

## Matrix types

Four matrix types, two encodings × two formats:

| | Columnar format | Packed format |
|---|---|---|
| **Bit** | `PersistentBitMatrix` (Columnar variant) | `PersistentBitMatrix` (Packed variant) |
| **Int** | `PersistentCompactIntMatrix` (Columnar variant) | `PersistentCompactIntMatrix` (Packed variant) |

Both matrix types are enums (`Columnar` / `Packed` / `Implicit` for bit) behind a transparent API. `col_view(c)` returns the appropriate view directly:

```rust
// PersistentBitMatrix
pub fn col_view(&self, c: usize) -> BitSliceView<'_>

// PersistentCompactIntMatrix
pub fn col_view(&self, c: usize) -> IntSliceView<'_>
```

No wrapper enums (`BitColView`, `IntColView`): the caller receives a `Copy` view struct immediately usable with any view method or bulk builder method.

`pack_compact_int_matrix` and `pack_bit_matrix` convert columnar → packed format.

---

## Aggregation traits (matrix level)

### ColumnWeights

```rust
trait ColumnWeights: Send + Sync {
    fn col_weights(&self) -> Array1<u64>;         // sum per column
    fn partial_kmer_counts(&self) -> Array1<u64>; // default = col_weights()
}
```

`partial_kmer_counts` is overridden for count matrices to return `count_nonzero` per column (distinct kmers) rather than total count.

### CountPartials

Abstract required methods: `partial_bray`, `partial_euclidean`, `partial_threshold_jaccard`, `partial_relfreq_bray`, `partial_relfreq_euclidean`, `partial_hellinger`.

**Additivity rule:** self-contained partials (`partial_bray`, `partial_euclidean`, `partial_threshold_jaccard`) can be element-wise summed across all `(partition, layer)` pairs. Normalised partials (`partial_relfreq_*`, `partial_hellinger`) require the **global** `col_weights` (accumulated across all layers and all partitions) as parameter.

**`partial_threshold_jaccard` returns `(inter, union)`** because `union[i,j]` depends on both columns simultaneously.

Provided finalisations:

| Finalisation | Formula |
|---|---|
| `bray_dist_matrix()` | `1 − 2·partial_bray[i,j] / (w[i] + w[j])` |
| `euclidean_dist_matrix()` | `√partial_euclidean[i,j]` |
| `threshold_jaccard_dist_matrix(t)` | `1 − inter[i,j] / union[i,j]` |
| `relfreq_bray_dist_matrix()` | `1 − partial_relfreq_bray[i,j]` |
| `relfreq_euclidean_dist_matrix()` | `√partial_relfreq_euclidean[i,j]` |
| `hellinger_dist_matrix()` | `√partial_hellinger[i,j] / √2` |
| `hellinger_euclidean_dist_matrix()` | `√partial_hellinger[i,j]` |

### BitPartials

Required: `partial_jaccard() -> (Array2<u64>, Array2<u64>)`, `partial_hamming() -> Array2<u64>`. Both additive across layers and partitions.

---

## Temp-file-backed types

**All inter-function results use temp-file-backed types** so the OS can page them out under memory pressure. This matters in practice: processing dozens of layers × hundreds of partitions in parallel would otherwise accumulate gigabytes of live anonymous memory.

### Lifecycle

```
TempCompactIntVecBuilder::new(n)   →  writable mmap in TempDir
     ↓  (inc_present_fast / inc_predicate_fast / add / mask_with / …)
 .freeze()                          →  TempCompactIntVec  (read-only mmap + TempDir)
     ↓  (optional)
 .make_persistent(path)             →  PersistentCompactIntVec  (permanent file)
```

Same pattern for `TempBitVecBuilder` → `TempBitVec` → `PersistentBitVec`.

**Drop order**: `TempCompactIntVec { vec: PersistentCompactIntVec, _temp: TempDir }` — Rust drops fields in declaration order. `vec` (mmap) released before `_temp` (directory deleted). No explicit `drop()` needed.

### TempCompactIntVec / TempCompactIntVecBuilder

```rust
pub struct TempCompactIntVec {
    vec:   PersistentCompactIntVec,
    _temp: TempDir,        // dropped after vec
}

pub(crate) struct TempCompactIntVecBuilder {
    builder: PersistentCompactIntVecBuilder,
    temp:    TempDir,
}
```

`TempCompactIntVec`: read access via `get(slot)`, `sum()`, `iter()`, `view() -> IntSliceView<'_>`.

`TempCompactIntVecBuilder`: full delegation to inner `PersistentCompactIntVecBuilder` — all bulk computation methods (`inc_present_fast`, `inc_predicate_fast`, `add`, `min`, `max`, `diff`, `mask_with`) are exposed as `pub(crate)`.

### TempBitVec / TempBitVecBuilder

```rust
pub struct TempBitVec {
    vec:   PersistentBitVec,
    _temp: TempDir,
}

pub(crate) struct TempBitVecBuilder {
    builder: PersistentBitVecBuilder,
    temp:    TempDir,
}
```

`TempBitVec`: read access via `get(slot)`, `count_ones()`, `view() -> BitSliceView<'_>`, `iter()`.

`TempBitVecBuilder`: exposes `set(slot, bool)`, `or(BitSliceView)`, and:

```rust
pub(crate) fn or_where(&mut self, col: IntSliceView<'_>, pred: impl Fn(u32) -> bool)
```

`or_where` — two passes, no intermediate allocation:

```
Pass 1 — primary bytes, O(n):
  for slot in 0..n:
    b = col.primary_bytes()[slot]
    if b < 255 AND pred(b as u32): self.set(slot, true)

Pass 2 — overflow, O(k):
  for (slot, val) in col.overflow_entries():
    if pred(val): self.set(slot, true)
```

---

## Filter / Select API

### ColGroup

```rust
pub struct ColGroup { pub name: String, pub indices: Vec<usize> }
```

Defined **once at the index level** from column metadata. Valid in all matrices of all layers and partitions — column structure is identical across the entire hierarchy; only rows (kmer slots) are partitioned.

### Composition axis

- **Across partitions**: kmer space is partitioned → partial results **concatenated** (disjoint kmer ranges).
- **Across layers**: same kmer space, different counts → partial results **aggregated** (add, OR, etc.).

### MatrixGroupOps

Five required primitives + two default methods derived from them. All return temp-file-backed types.

```rust
pub trait MatrixGroupOps {
    // required
    fn partial_group_presence_count(&self, g: &ColGroup, threshold: u32)
        -> io::Result<TempCompactIntVec>;
    fn partial_group_sum(&self, g: &ColGroup)
        -> io::Result<TempCompactIntVec>;
    fn partial_group_any(&self, g: &ColGroup, threshold: u32)
        -> io::Result<TempBitVec>;
    fn partial_group_min(&self, g: &ColGroup)
        -> io::Result<TempCompactIntVec>;
    fn partial_group_max(&self, g: &ColGroup)
        -> io::Result<TempCompactIntVec>;

    // defaults derived from partial_group_presence_count
    fn partial_group_all(&self, g: &ColGroup, threshold: u32)
        -> io::Result<TempBitVec>;   // slot=1 iff count == g.indices.len()
    fn partial_group_none(&self, g: &ColGroup, threshold: u32)
        -> io::Result<TempBitVec>;   // slot=1 iff count == 0
}
```

Implemented for both `PersistentCompactIntMatrix` and `PersistentBitMatrix`.

For **bit matrices**: values are 0/1, so `partial_group_sum` = `partial_group_presence_count(g, 1)`; `partial_group_min` is AND (set first column then mask-with remaining); `partial_group_max` is OR via `partial_group_any` + `inc_present`.

**`partial_group_presence_count` — chunking for large groups:**

When `g.indices.len() < 255`: per-slot counts stay within `u8` range. Use `inc_present_fast` (bit) or `inc_predicate_fast(col_view(c), |v| v >= threshold)` (int) — raw u8 increment, no overflow entry written.

When `g.indices.len() ≥ 255`: process in chunks of 254 columns, accumulate via `.add(chunk_frozen.view())`.

**`partial_group_min` (int matrix)**: copy first column via `.add(col_view(first))` (start from 0 ⇒ copy), then `.min(col_view(c))` for remaining.

**`partial_group_max` (int matrix)**: `.max(col_view(c))` for all columns (start from 0 ⇒ first column acts as copy).

**`partial_group_any`** uses `or_where` on `TempBitVecBuilder` (two-pass: primary bytes then overflow entries).

**`partial_group_all` / `partial_group_none`** (default): call `partial_group_presence_count`, then iterate slots to produce the bit result. O(n) extra pass, not chunked.

### add_col_from — matrix builder integration

Both matrix builders accept temp-file results directly:

```rust
// PersistentBitMatrixBuilder
fn add_col_from(&mut self, src: &TempBitVec)         -> io::Result<()>
fn add_col_from_int(&mut self, src: &TempCompactIntVec) -> io::Result<()>  // nonzero → 1

// PersistentCompactIntMatrixBuilder
fn add_col_from(&mut self, src: &TempCompactIntVec)  -> io::Result<()>
fn add_col_from_bit(&mut self, src: &TempBitVec)     -> io::Result<()>  // bit → 0/1 u32
```

`add_col_from` copies the temp file to the matrix directory and increments `n_cols`; `close()` writes `meta.json` with the final column count. No separate `write_meta` step needed.

### mask_with

Direct method on `PersistentCompactIntVecBuilder` (and delegation via `TempCompactIntVecBuilder`). Zeros every slot where the corresponding mask bit is 0. Iterates only zero bits — O(n_zeros), O(1) when mask is all-ones.

```
for (w_idx, word) in mask.words():
  if word == u64::MAX: continue   // skip all-ones words
  zeros = !word
  while zeros != 0:
    bit = trailing_zeros(zeros)
    s = w_idx * 64 + bit
    if primary[s] != 0: set(s, 0)   // clears overflow entry too
    zeros &= zeros − 1
```

Terminal operation for Filter (retain only selected kmer slots in a count vector) and Select (positional selection without MPHF).