Add Mermaid diagrams to visualize the trait hierarchy, compact int storage layout, and SIMD-vectorizable arithmetic operations for MemoryIntVec and PersistentCompactIntVec. Also document concrete type structures and planned layer/partition composition rules to improve documentation clarity.
26 KiB
obicompactvec — Complete Reference
Module structure
src/obicompactvec/src/
lib.rs public re-exports
traits.rs BitSlice, BitSliceMut, IntSlice, IntSliceMut + conversion traits
bitvec.rs PersistentBitVec, PersistentBitVecBuilder, BitIter
memoryvec.rs MemoryBitVec
reader.rs PersistentCompactIntVec (read-only)
builder.rs PersistentCompactIntVecBuilder (read-write)
memoryintvec.rs MemoryIntVec
bitmatrix.rs PersistentBitMatrix, PersistentBitMatrixBuilder
intmatrix.rs PersistentCompactIntMatrix, PersistentCompactIntMatrixBuilder
format.rs file format constants, encode/decode helpers
layer_meta.rs LayerMeta (column metadata)
meta.rs matrix metadata
graph TD
traits --> memoryvec
traits --> memoryintvec
bitvec --> memoryvec
bitvec --> bitmatrix
format --> reader
format --> builder
reader --> intmatrix
builder --> intmatrix
builder --> memoryintvec
memoryvec --> traits
memoryintvec --> traits
layer_meta --> bitmatrix
layer_meta --> intmatrix
meta --> bitmatrix
meta --> intmatrix
Compact int encoding
All integer vectors use the same two-tier encoding regardless of storage backend.
Primary array — one u8 per slot:
- Values 0–254 are stored directly. No overhead.
- Value 255 is a sentinel: the slot's actual value is ≥ 255 and lives in the overflow store.
Overflow store — maps slot index to a u32 value ≥ 255:
- In
MemoryIntVecandPersistentCompactIntVecBuilder: aHashMap<usize, u32>in RAM. - In
PersistentCompactIntVec(reader): a sorted[(slot: u64, value: u32)]array in the mmap, with a sparse L1-resident index for binary search.
flowchart LR
slot --> P["primary[slot]: u8"]
P -->|"< 255"| V["value = byte (0–254)"]
P -->|"= 255 sentinel"| OV["overflow store"]
OV -->|"MemoryIntVec / Builder"| HM["HashMap<usize, u32>\nin RAM"]
OV -->|"PersistentCompactIntVec"| SA["sorted [(slot,value)] in mmap\n+ sparse L1 index"]
Key property — sentinel 255 = +∞ on u8:
This is exploited throughout the binary operations. On a u8 comparison, 255 behaves as positive infinity:
min(a, 255) = afor alla ≤ 254→ correct when only one side is overflowmax(a, 255) = 255→ correct sentinel when either side is overflow- Only the both-overflow case requires reading actual values from the overflow store.
In practice, k (overflow count) ≪ n (total slots). Observed genomic data: ~0.07% of kmer slots are in overflow.
Trait hierarchy
classDiagram
class BitSlice {
<<trait>>
+len() usize
+words() &[u64]
+get(slot) bool
+count_ones() u64
+count_zeros() u64
+partial_jaccard_dist(other) (u64,u64)
+jaccard_dist(other) f64
+hamming_dist(other) u64
}
class BitSliceMut {
<<trait>>
+words_mut() &mut [u64]
+set(slot, value)
+copy_from(src)
+and(other)
+or(other)
+xor(other)
+not()
}
class IntSlice {
<<trait>>
+len() usize
+get(slot) u32
+primary_bytes() &[u8]
+overflow_entries() Iterator
+iter() Iterator
+sum() u64
+count_nonzero() u64
+cmp_scalar(pred) MemoryBitVec
+lt/leq/gt/geq(t) MemoryBitVec
}
class IntSliceMut {
<<trait>>
+set(slot, value)
+primary_bytes_mut() &mut [u8]
+clear_overflow()
+inc/dec/add_at(slot)
+copy_from(src)
+min/max/add/diff(other)
+count_bits(bits)
}
class IntToBit {
<<trait blanket>>
+to_bitvec(threshold) MemoryBitVec
+to_presence() MemoryBitVec
}
class BitToInt {
<<trait blanket>>
+to_intvec() MemoryIntVec
}
BitSliceMut --|> BitSlice : extends
IntSliceMut --|> IntSlice : extends
IntToBit --|> IntSlice : blanket T:IntSlice
BitToInt --|> BitSlice : blanket T:BitSlice
BitSlice (read-only)
Required: len(), words() -> &[u64].
Bit i is at words()[i >> 6] bit i & 63 (LSB-first). Padding bits in the last word are always zero — this invariant must be maintained by all implementors.
| Provided method | Implementation | Cost |
|---|---|---|
is_empty() |
len() == 0 |
O(1) |
get(slot) |
word extract | O(1) |
count_ones() |
POPCNT per word | O(n/64) |
count_zeros() |
n − count_ones() |
O(n/64) |
partial_jaccard_dist(other) |
(a&b).popcount, (a|b).popcount per word |
O(n/64) |
jaccard_dist(other) |
from partial | O(n/64) |
hamming_dist(other) |
(a^b).popcount per word |
O(n/64) |
BitSliceMut: BitSlice (mutable)
Required: words_mut() -> &mut [u64].
All bulk operations work at the word level (64 bits/iteration). The compiler auto-vectorizes these loops to AVX2/AVX-512. The zero-padding invariant is maintained: not() re-masks the last word after flipping.
| Provided method | Implementation | Cost |
|---|---|---|
set(slot, value) |
OR / AND-NOT on one word | O(1) |
copy_from(src) |
copy_from_slice = memcpy |
O(n/64) |
and(other) |
w &= o per word |
O(n/64) |
or(other) |
w |= o per word |
O(n/64) |
xor(other) |
w ^= o per word |
O(n/64) |
not() |
w ^= u64::MAX per word, then mask last |
O(n/64) |
No overflow complexity here. The packed u64 representation is already the natural unit for SIMD operations. No sentinel, no HashMap — just bitwise word ops.
IntSlice (read-only)
Required:
len() -> usizeget(slot) -> u32— handles sentinel transparently (binary search into overflow for persistent, HashMap for memory)primary_bytes() -> &[u8]— raw primary array including 255 sentinelsoverflow_entries() -> impl Iterator<Item = (usize, u32)>— (slot, true_value) pairs for all overflow slots
| Provided method | Default implementation | Note |
|---|---|---|
is_empty() |
len() == 0 |
|
iter() |
(0..n).map(|i| self.get(i)) |
Overridden in all concrete types |
sum() |
iter().map(|v| v as u64).sum() |
Overridden in concrete types |
count_nonzero() |
iter().filter(|v| *v > 0).count() |
Overridden in concrete types |
lt(t) |
cmp_scalar(|v| v < t) |
|
leq(t) |
cmp_scalar(|v| v <= t) |
|
gt(t) |
cmp_scalar(|v| v > t) |
|
geq(t) |
cmp_scalar(|v| v >= t) |
|
cmp_scalar(pred) |
two-pass (see below) |
cmp_scalar algorithm — two passes:
Pass 1 — byte scan, O(n):
for s in 0..n:
b = primary[s]
if b < 255 AND pred(b as u32):
set bit s in result word
Pass 2 — overflow fixup, O(k):
for (s, val) in overflow_entries():
if pred(val): set bit s in result word
Pass 1 reads only the primary byte array — no HashMap access. For simple predicates (geq, lt, etc.) the compiler inlines pred and can auto-vectorize the byte comparison loop. Pass 2 handles the O(k) overflow slots that were left as 0 in pass 1.
Previous implementation: pred(self.get(s)) for every slot → O(n log k) due to binary search in overflow. New: O(n) + O(k).
IntSliceMut: IntSlice (mutable)
Required:
set(slot, value: u32)— writes primary byte (or 255 + overflow entry if value ≥ 255); removes stale overflow entry if value drops below 255primary_bytes_mut() -> &mut [u8]— direct mutable access to the primary arrayclear_overflow()— empties the entire overflow store
The required methods expose the encoding internals. All provided methods are implemented in terms of these three + the IntSlice required methods.
| Provided method | Hot path | Overflow case | Cost |
|---|---|---|---|
inc(slot) |
get + set |
— | O(1) or O(log k) |
dec(slot) |
get + set (saturating) |
— | O(1) or O(log k) |
add_at(slot, delta) |
get + set (saturating) |
— | O(1) or O(log k) |
copy_from(src) |
copy_from_slice + clear_overflow + replay overflows |
— | O(n) + O(k) |
min(other) |
byte-level min, O(n) | both-overflow fixup, O(k) | O(n) |
max(other) |
byte-level max, O(n) | pre-pass on other's overflows, O(k) | O(n) |
add(other) |
byte add when both < 255, O(n) | get + + when either = 255 |
O(n) |
diff(other) |
byte saturating_sub when self < 255, O(n) | get + saturating_sub when self = 255 |
O(n) |
count_bits(bits) |
iterate set bits via word scan | — | O(n_ones) |
cmp_scalar |
inherited from IntSlice | — | O(n) + O(k) |
min algorithm:
Exploits 255 = +∞: u8::min(a, 255) = a and u8::min(255, b) = b. Only the case where both sides are ≥ 255 needs actual overflow values.
flowchart TD
A["min(self, other)"] --> B["snapshot self_ov: Vec<(slot,val)>\nsnapshot other_ov: HashMap<slot,val>"]
B --> C["clear_overflow()"]
C --> D["Pass 1 — byte min, SIMD-vectorizable\nprimary[s] = min(self[s], other[s]) ∀s"]
D --> E["Pass 2 — both-overflow fixup\nfor (slot, self_val) in self_ov"]
E --> F{"slot ∈ other_ov?"}
F -->|yes| G["set(slot, min(self_val, other_ov[slot]))"]
F -->|no| H["byte pass wrote other.primary < 255\nclear_overflow removed stale entry\nno action"]
G --> I[done]
H --> I
Overflow entries where only self was overflow are correctly handled: after clear_overflow + byte pass, self.primary[slot] = min(255, other.primary[slot]) = other.primary[slot] (which is < 255). No overflow entry — correct.
max algorithm:
Exploits 255 = +∞: u8::max(a, 255) = 255 → any slot where either side is overflow will have sentinel 255 in the primary after the byte pass. The byte pass cannot distinguish "self had overflow and other did not" from "self was just written to 255 by the byte pass".
Solution: read and update self's original value at other's overflow slots before the byte pass overwrites them.
flowchart TD
A["max(self, other)"] --> B["Pre-pass O(k_other)\nfor (slot, other_val) in other.overflow_entries()"]
B --> C["self_val = self.get(slot)\nself.set(slot, max(self_val, other_val))"]
C --> D["Pass 1 — byte max, SIMD-vectorizable\nprimary[s] = max(self[s], other[s]) ∀s"]
D --> E["Overflow slots: max(255,255)=255\nprimary unchanged\noverflow entry from pre-pass preserved"]
E --> F[done]
After the pre-pass, self.primary[slot] = 255 for all slots in other's overflow. The byte pass leaves those 255s intact. Self's own overflow slots not in other's overflow are also 255 in primary — byte max(255, b < 255) = 255, unchanged. Correct in all cases.
add algorithm:
No sentinel property useful for add: any pair (sb, ob) with sb + ob ≥ 255 creates a new overflow entry, even when neither input was overflow. Cannot simplify via byte arithmetic.
for s in 0..n:
sb = self.primary[s]
ob = other.primary[s]
if sb < 255 AND ob < 255: // hot path: no HashMap
sum = sb as u32 + ob as u32
if sum < 255: self.primary[s] = sum as u8 // direct byte write
else: self.set(s, sum) // creates overflow if needed
else: // at least one is overflow
self.set(s, self.get(s) + other.get(s))
flowchart TD
A["add(self, other)"] --> B{"sb < 255\nAND ob < 255"}
B -->|"yes — hot path\nno HashMap"| C{"sb + ob < 255"}
C -->|yes| D["primary[s] = sum as u8\nsingle byte write"]
C -->|no| E["set(s, sum)\ncreates overflow entry"]
B -->|"no — ≥1 side is overflow"| F["self_val = self.get(s)\nother_val = other.get(s)\nset(s, self_val + other_val)"]
D --> Z[next slot]
E --> Z
F --> Z
The + on u32 values is exact (no saturating_add). Overflow at u32 level panics in debug — not a real risk for kmer counts. The hot path (both < 255, sum < 255) is a single byte write with no HashMap access.
diff (saturating sub) algorithm:
saturating_sub(a, b) = a − min(a, b) = max(0, a − b). Key insight: if self's primary byte < 255, the result is always < 255 (result ≤ a), so no new overflow entries are created and no overflow lookup is needed for self. Only self's overflow slots (primary = 255) need get().
| sb | ob | result | get() needed |
|---|---|---|---|
| < 255 | < 255 | sb.saturating_sub(ob) < 255 |
none |
| < 255 | 255 | 0 (b ≥ 255 > a) | none |
| 255 | < 255 | self.get(s) − ob |
self only |
| 255 | 255 | self.get(s) − other.get(s) |
both |
flowchart TD
A["diff(self, other)"] --> B{"sb < 255\nself not overflow"}
B -->|"yes — hot path O(n)"| C{"ob < 255"}
C -->|yes| D["primary[s] = sb.saturating_sub(ob)\nbyte write, no HashMap"]
C -->|"no: b ≥ 255 > a"| E["primary[s] = 0"]
B -->|"no — cold path O(k_self)"| F["self_val = self.get(s)"]
F --> G{"ob < 255"}
G -->|yes| H["other_val = ob as u32"]
G -->|no| I["other_val = other.get(s)"]
H --> J["set(s, self_val.saturating_sub(other_val))"]
I --> J
D --> Z[next slot]
E --> Z
J --> Z
Overflow entries that drop below 255 (case sb=255, result < 255) are removed by set(). Overflow entries that remain ≥ 255 are updated. Correct in all four cases.
count_bits algorithm:
Increments self at each slot where the corresponding bit in bits is set. Iterates bits.words() and skips zero words entirely — O(n_ones) rather than O(n).
for (w_idx, word) in bits.words():
if word == 0: continue
base = w_idx * 64
while word != 0:
bit = trailing_zeros(word)
self.inc(base + bit)
word &= word − 1 // clear lowest set bit
Concrete types
classDiagram
class MemoryBitVec {
-words: Vec~u64~
-n: usize
+iter() BitIter
+ones(n) Self
+persist(path) Builder
}
class MemoryIntVec {
-primary: Vec~u8~
-overflow: HashMap~usize,u32~
-n: usize
+iter() MemoryIntIter
+filled(n, value) Self
+persist(path) Builder
}
class PersistentBitVec {
-mmap: Mmap
-n: usize
+iter() BitIter
+count_ones() u64
}
class PersistentBitVecBuilder {
-mmap: MmapMut
-n: usize
+close()
+build_from(src, path)
+build_from_counts(src, t, path)
}
class PersistentCompactIntVec {
-mmap: Mmap
-n usize
-n_overflow usize
-step usize
-index: Vec~(usize,usize)~
+iter() Iter
+get(slot) u32
+sum() u64
}
class PersistentCompactIntVecBuilder {
-mmap: MmapMut
-n: usize
-overflow: HashMap~usize,u32~
+set(slot, value)
+close()
+build_from(src, path)
}
MemoryBitVec ..|> BitSlice
MemoryBitVec ..|> BitSliceMut
PersistentBitVec ..|> BitSlice
PersistentBitVecBuilder ..|> BitSlice
PersistentBitVecBuilder ..|> BitSliceMut
MemoryIntVec ..|> IntSlice
MemoryIntVec ..|> IntSliceMut
PersistentCompactIntVec ..|> IntSlice
PersistentCompactIntVecBuilder ..|> IntSlice
PersistentCompactIntVecBuilder ..|> IntSliceMut
PersistentBitVecBuilder --> PersistentBitVec : close() then open()
PersistentCompactIntVecBuilder --> PersistentCompactIntVec : close() then open()
Memory types
MemoryBitVec
struct MemoryBitVec { words: Vec<u64>, n: usize }
Implements BitSlice + BitSliceMut. Owns its word array. Used as the result type of cmp_scalar / filter operations and as an intermediate for bit-level computations.
Std ops: BitAnd, BitOr, BitXor, Not (owned and borrowed), BitAndAssign, BitOrAssign, BitXorAssign — all delegate to BitSliceMut methods.
iter() returns a BitIter<'_> (word-level, see below).
MemoryIntVec
struct MemoryIntVec {
primary: Vec<u8>,
overflow: HashMap<usize, u32>,
n: usize,
}
Implements IntSlice + IntSliceMut. Overrides: iter() → inherent iter() (merge-scan), sum(), count_nonzero().
IntSlice required impls: primary_bytes() → &self.primary; overflow_entries() → self.overflow.iter().map(...).
IntSliceMut required impls: set() writes to self.primary[slot] and inserts/removes from self.overflow; primary_bytes_mut() → &mut self.primary; clear_overflow() → self.overflow.clear().
Std ops: Add<&B>, Sub<&B> (owned and borrowed), AddAssign<&B>, SubAssign<&B> — delegate to IntSliceMut::add / diff.
From<&S: IntSlice>: copies primary bytes + overflow entries. O(n) + O(k).
Persistent types
PersistentBitVec / PersistentBitVecBuilder
See persistent_bit_vec.md. PersistentBitVec is read-only (implements BitSlice). PersistentBitVecBuilder is read-write (implements BitSlice + BitSliceMut).
BitIter<'a> — shared iterator type for both MemoryBitVec and PersistentBitVec:
pub struct BitIter<'a> { pub(crate) words: &'a [u64], pub(crate) slot: usize, pub(crate) n: usize }
Word-level scan: (words[slot >> 6] >> (slot & 63)) & 1 != 0. One word serves 64 iterations. pub type MemoryBitIter<'a> = BitIter<'a> preserves the public API name.
PersistentCompactIntVec / PersistentCompactIntVecBuilder
See persistent_compact_int_vec.md for file format and lifecycle.
PersistentCompactIntVec implements IntSlice. Overrides: iter() → inherent merge-scan Iter; sum(); count_nonzero(). overflow_entries() returns a sequential scan (0..n_overflow).map(|i| (data_slot(i), data_value(i))) — no binary search since entries are stored sorted.
PersistentCompactIntVecBuilder implements IntSlice + IntSliceMut. iter() is NOT overridden (default get-per-slot) because the overflow HashMap is unsorted. sum() and count_nonzero() are overridden using byte_sum / byte_count_nonzero on the mmap primary slice — avoids per-slot overhead.
Override rationale: the default iter(), sum(), count_nonzero() on IntSlice call self.get(s) per slot, which is O(log k) binary search for PersistentCompactIntVec. Overrides provide O(n + k) merge-scan or O(n) byte scan instead.
IntSlice implementors — override summary
| Type | iter() |
sum() |
count_nonzero() |
|---|---|---|---|
MemoryIntVec |
inherent merge-scan ✓ | byte_sum ✓ |
byte_count_nonzero ✓ |
PersistentCompactIntVecBuilder |
default (get-per-slot) | byte_sum on mmap ✓ |
byte_count_nonzero on mmap ✓ |
PersistentCompactIntVec |
inherent merge-scan Iter ✓ | inherent sum() ✓ |
inherent count_nonzero() ✓ |
PackedIntCol<'a> |
inherent PackedIntColIter ✓ | byte_sum ✓ | byte_count_nonzero ✓ |
PackedIntCol is used internally by PersistentCompactIntMatrix (packed format) for column views.
Matrix types
Four matrix types, two encodings × two formats:
| Columnar format | Packed format | |
|---|---|---|
| Bit | PersistentBitMatrix |
— |
| Int | PersistentCompactIntMatrix (columnar) |
PersistentCompactIntMatrix (packed) |
PersistentCompactIntMatrix is an enum behind a transparent API — the caller does not see whether the on-disk format is columnar (one .pciv per column) or packed (one .pcmx file interleaving all columns). col(c) and col_slice(c) return column views that implement IntSlice.
pack_compact_int_matrix and pack_bit_matrix convert a columnar matrix to packed format.
For details see persistent_compact_int_vec.md and persistent_bit_vec.md.
Conversion traits
Four blanket-impl traits on top of BitSlice / IntSlice:
IntToBit: IntSlice
to_bitvec(threshold: u32) -> MemoryBitVec— bit set iff value ≥ threshold (delegates togeq)to_presence() -> MemoryBitVec— bit set iff value ≥ 1 (delegates togeq(1))
BitToInt: BitSlice
to_intvec() -> MemoryIntVec— expands each bit to au8(0 or 1) in a new primary array- Uses a
static EXPAND_BYTE: [[u8; 8]; 256]lookup table — 8 bits expanded per byte, word-level outer loop
Both IntToBit and BitToInt are implemented for all T: IntSlice / T: BitSlice via blanket impls.
Aggregation traits (matrix level)
ColumnWeights
trait ColumnWeights: Send + Sync {
fn col_weights(&self) -> Array1<u64>; // sum per column
fn partial_kmer_counts(&self) -> Array1<u64>; // default = col_weights()
}
partial_kmer_counts is overridden for count matrices to return count_nonzero per column (distinct kmers) rather than total count.
CountPartials
Abstract required methods: partial_bray, partial_euclidean, partial_threshold_jaccard, partial_relfreq_bray, partial_relfreq_euclidean, partial_hellinger.
Additivity rule: self-contained partials (partial_bray, partial_euclidean, partial_threshold_jaccard) can be element-wise summed across all (partition, layer) pairs before applying the finalisation. Normalised partials (partial_relfreq_*, partial_hellinger) require the global col_weights (accumulated across all layers and all partitions) as parameter — not per-layer or per-partition weights.
partial_threshold_jaccard returns (inter, union), not a single matrix, because union[i,j] depends on both columns simultaneously and cannot be reconstructed from per-column statistics.
Provided finalisations (default implementations):
| Finalisation | Formula |
|---|---|
bray_dist_matrix() |
1 − 2·partial_bray[i,j] / (w[i] + w[j]) |
euclidean_dist_matrix() |
√partial_euclidean[i,j] |
threshold_jaccard_dist_matrix(t) |
1 − inter[i,j] / union[i,j] |
relfreq_bray_dist_matrix() |
1 − partial_relfreq_bray[i,j] (two-pass: col_weights then partial) |
relfreq_euclidean_dist_matrix() |
√partial_relfreq_euclidean[i,j] |
hellinger_dist_matrix() |
√partial_hellinger[i,j] / √2 |
hellinger_euclidean_dist_matrix() |
√partial_hellinger[i,j] |
BitPartials
Required: partial_jaccard() -> (Array2<u64>, Array2<u64>) (inter, union), partial_hamming() -> Array2<u64>. Both additive across layers and partitions.
Planned — Filter / Select API
Composition across layers and partitions
graph TD
subgraph Index
CG["ColGroup\nVec<usize> — valid everywhere"]
ACC["MemoryIntVec\nglobal accumulator"]
PRED["geq / leq / and / or\n→ MemoryBitVec mask"]
end
subgraph "Layer 1"
subgraph "Partition A kmers 0..k/2"
MA["Matrix A\npartial_group_presence_count"]
end
subgraph "Partition B kmers k/2..k"
MB["Matrix B\npartial_group_presence_count"]
end
CONCAT1["concat → MemoryIntVec\[0..k\]"]
end
subgraph "Layer 2"
CONCAT2["concat → MemoryIntVec\[0..k\]"]
end
CG -->|"same indices"| MA
CG -->|"same indices"| MB
MA -->|"kmer range A"| CONCAT1
MB -->|"kmer range B"| CONCAT1
CONCAT1 -->|"IntSliceMut::add"| ACC
CONCAT2 -->|"IntSliceMut::add"| ACC
ACC --> PRED
ColGroup
struct ColGroup { name: String, indices: Vec<usize> }
Defined once at the index level from column metadata. Valid in all matrices of all layers and partitions because column structure is identical across the entire hierarchy (same samples/genomes everywhere; only rows = kmer slots are partitioned).
ColGroup is passed by reference unchanged to any matrix — no index translation.
Composition axis
- Across partitions: kmer space is partitioned → partial results are concatenated (disjoint kmer ranges).
- Across layers: same kmer space, different counts → partial results are aggregated (add, OR, etc.).
Additivity rules
flowchart LR
subgraph "Matrix level — returns MemoryIntVec"
PGP["partial_group_presence_count\npartial_group_sum\npartial_group_any → MemoryBitVec"]
end
subgraph "Index level — applies predicate"
GA["group_at_least(k)\n= accumulate.geq(k)"]
GALL["group_all\n= accumulate.geq(n_cols)"]
GANY["group_any\n= OR of partial_group_any"]
end
PGP -->|"concat across partitions\nadd across layers"| GA
PGP --> GALL
PGP --> GANY
Non-additive predicates (group_all, group_at_least) do not exist at matrix level — they require the global accumulated count.
MatrixGroupOps (planned trait)
Group operations live on the matrix and expose only additive intermediates (MemoryIntVec). Predicates (final thresholds → MemoryBitVec) are applied at the index level after accumulation.
trait MatrixGroupOps {
// How many columns in group have value >= threshold, per kmer slot
fn partial_group_presence_count(&self, g: &ColGroup, threshold: u32) -> MemoryIntVec;
// Sum of values across group columns, per kmer slot
fn partial_group_sum(&self, g: &ColGroup) -> MemoryIntVec;
// Kmer present (value >= threshold) in at least one column of group
fn partial_group_any(&self, g: &ColGroup, threshold: u32) -> MemoryBitVec;
}
Non-additive predicates (group_all, group_at_least(k)) are not on the matrix — they are composed at the index level from the additive intermediates:
// "present in >= 2 ingroup columns with count >= 3, absent from all outgroup"
let presence = layers.map(|l| l.partial_group_presence_count(&ingroup, 3)).sum();
let in_mask = presence.geq(2); // MemoryBitVec
let out_sum = layers.map(|l| l.partial_group_sum(&outgroup)).sum();
let out_mask = out_sum.leq(0); // MemoryBitVec
let mask = in_mask.and(&out_mask); // BitSliceMut::and — O(n/64)
mask_with (planned IntSliceMut method)
Apply a bit mask to a count vector: zero slots where the mask bit is 0. Iterates only zero bits — O(n_zeros), O(1) when mask is all-ones.
for (w_idx, word) in mask.words():
if word == u64::MAX: continue
zeros = !word
while zeros != 0:
bit = trailing_zeros(zeros)
s = w_idx * 64 + bit
self.set(s, 0)
zeros &= zeros − 1
This is the terminal operation for both Filter (zero non-selected kmer slots in a count matrix) and Select (positional selection without MPHF).