obicompactvec — Complete Reference
Module structure
src/obicompactvec/src/
lib.rs public re-exports
views.rs BitSliceView<'a>, IntSliceView<'a> — zero-copy read views
traits.rs ColumnWeights, CountPartials, BitPartials (matrix aggregation)
bitvec.rs PersistentBitVec, PersistentBitVecBuilder, BitIter
reader.rs PersistentCompactIntVec (read-only)
builder.rs PersistentCompactIntVecBuilder (read-write)
tempintvec.rs TempCompactIntVec, TempCompactIntVecBuilder (temp-file-backed)
tempbitvec.rs TempBitVec, TempBitVecBuilder (temp-file-backed)
bitmatrix.rs PersistentBitMatrix, PersistentBitMatrixBuilder
intmatrix.rs PersistentCompactIntMatrix, PersistentCompactIntMatrixBuilder
colgroup.rs ColGroup, MatrixGroupOps trait
format.rs file format constants, encode/decode helpers
layer_meta.rs LayerMeta (column metadata)
meta.rs matrix metadata
graph TD
views --> bitvec
views --> builder
views --> tempbitvec
views --> tempintvec
views --> bitmatrix
views --> intmatrix
format --> reader
format --> builder
reader --> intmatrix
reader --> tempintvec
builder --> intmatrix
builder --> tempintvec
bitvec --> tempbitvec
bitvec --> bitmatrix
tempintvec --> intmatrix
tempintvec --> bitmatrix
tempbitvec --> intmatrix
tempbitvec --> bitmatrix
colgroup --> intmatrix
colgroup --> bitmatrix
layer_meta --> bitmatrix
layer_meta --> intmatrix
meta --> bitmatrix
meta --> intmatrix
Compact int encoding
All integer vectors use the same two-tier encoding regardless of storage backend.
Primary array — one u8 per slot:
- Values 0–254 are stored directly. No overhead.
- Value 255 is a sentinel: the slot's actual value is ≥ 255 and lives in the overflow store.
Overflow store — maps slot index to a u32 value ≥ 255:
- In PersistentCompactIntVecBuilder: a HashMap<usize, u32> in RAM.
- In PersistentCompactIntVec (reader): a sorted [(slot: u64, value: u32)] array in the mmap, with a sparse L1-resident index for binary search.
flowchart LR
slot --> P["primary[slot]: u8"]
P -->|"< 255"| V["value = byte (0–254)"]
P -->|"= 255 sentinel"| OV["overflow store"]
OV -->|"Builder"| HM["HashMap<usize, u32>\nin RAM"]
OV -->|"PersistentCompactIntVec"| SA["sorted [(slot,value)] in mmap\n+ sparse L1 index"]
Key property — sentinel 255 = +∞ on u8:
- min(a, 255) = a for all a ≤ 254 → correct when only one side is overflow.
- max(a, 255) = 255 → correct sentinel when either side is overflow.
- Only the both-overflow case requires reading actual values from the overflow store.
In practice, k (overflow count) ≪ n (total slots). Observed genomic data: ~0.07% of kmer slots are in overflow.
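The two-tier encoding can be sketched with a plain in-memory stand-in. `CompactInts` below is a hypothetical type written for illustration only; it mirrors the builder's described layout (a `u8` primary array plus a `HashMap` overflow store) but is not the crate's implementation.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the two-tier layout (not the crate's type).
struct CompactInts {
    primary: Vec<u8>,              // one byte per slot; 255 is the sentinel
    overflow: HashMap<usize, u32>, // slot -> value, only for values >= 255
}

impl CompactInts {
    fn new(n: usize) -> Self {
        Self { primary: vec![0; n], overflow: HashMap::new() }
    }

    fn set(&mut self, slot: usize, value: u32) {
        if value < 255 {
            // Small values live directly in the primary byte.
            self.primary[slot] = value as u8;
            self.overflow.remove(&slot); // drop a stale overflow entry, if any
        } else {
            // Large values: write the sentinel and park the value in overflow.
            self.primary[slot] = 255;
            self.overflow.insert(slot, value);
        }
    }

    fn get(&self, slot: usize) -> u32 {
        match self.primary[slot] {
            255 => self.overflow[&slot],
            b => u32::from(b),
        }
    }
}
```

Note that the value 255 itself goes to the overflow store, since the byte 255 is reserved as the sentinel.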
View types
The previous trait hierarchy (BitSlice, BitSliceMut, IntSlice, IntSliceMut) has been replaced by two concrete zero-copy view structs with inherent methods. Views are Copy — passing them is free. All read operations live on these two types.
BitSliceView<'a>
#[derive(Clone, Copy)]
pub struct BitSliceView<'a> { pub(crate) words: &'a [u64], pub(crate) n: usize }
Bit i is at words[i >> 6] bit i & 63 (LSB-first). Padding bits in the last word are zero.
| Method | Cost |
|---|---|
| len(), is_empty() | O(1) |
| get(slot) | O(1) |
| count_ones() | POPCNT per word, O(n/64) |
| count_zeros() | n − count_ones(), O(n/64) |
| iter() -> BitSliceIter<'a> | O(1) setup, O(n) iteration |
| partial_jaccard_dist(other: BitSliceView) | (a&b).popcount, (a\|b).popcount per word, O(n/64) |
| jaccard_dist(other: BitSliceView) | from partial, O(n/64) |
| hamming_dist(other: BitSliceView) | (a^b).popcount per word, O(n/64) |
BitSliceIter<'a>: word-level scan; one word per 64 iterations.
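As a rough illustration of the bit layout and the word-level costs listed above, the following sketch operates on bare `&[u64]` slices. The free functions `get_bit`, `partial_jaccard` and `hamming` are hypothetical names chosen here; they are not the view's actual methods.

```rust
/// Read bit `slot` from an LSB-first packed word array.
fn get_bit(words: &[u64], slot: usize) -> bool {
    (words[slot >> 6] >> (slot & 63)) & 1 == 1
}

/// Returns (intersection, union) popcounts, one AND and one OR per word.
fn partial_jaccard(a: &[u64], b: &[u64]) -> (u64, u64) {
    a.iter().zip(b).fold((0, 0), |(inter, union), (x, y)| {
        (
            inter + u64::from((x & y).count_ones()),
            union + u64::from((x | y).count_ones()),
        )
    })
}

/// Number of differing bits, one XOR per word.
fn hamming(a: &[u64], b: &[u64]) -> u64 {
    a.iter().zip(b).map(|(x, y)| u64::from((x ^ y).count_ones())).sum()
}
```

Because padding bits in the last word are zero on both sides, they contribute nothing to any of the three popcounts.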
IntSliceView<'a>
#[derive(Clone, Copy)]
pub struct IntSliceView<'a> {
pub(crate) primary: &'a [u8],
pub(crate) overflow_raw: &'a [u8], // sorted [(slot:u64, value:u32)] entries
pub(crate) n_overflow: usize,
pub(crate) n: usize,
}
overflow_raw contains n_overflow entries of OVERFLOW_ENTRY_SIZE bytes each, sorted by slot. The sort invariant is established at close()/freeze() time.
| Method | Cost |
|---|---|
| len(), is_empty() | O(1) |
| primary_bytes() | O(1) |
| overflow_entries() -> impl Iterator<(usize,u32)> | O(n_overflow) iteration |
| get(slot) | O(1) primary; binary search O(log k) for overflow slots |
| iter() -> IntSliceViewIter<'a> | merge scan, O(n + k) |
| sum() | byte scan + overflow, O(n + k) |
| count_nonzero() | byte scan, O(n) |
| Distance methods (bray_dist, euclidean_dist, jaccard_dist, …) | O(n + k) |
IntSliceViewIter<'a>: merge scan using overflow_pos index. Requires sorted overflow — guaranteed by the construction lifecycle.
Builder view() vs reader view(): PersistentCompactIntVecBuilder stores overflow as an unsorted HashMap, not raw bytes. Its view() returns an IntSliceView with overflow_raw = &[] and n_overflow = 0. This is intentional — the view is primarily useful after freeze(). During building, callers that need overflow use overflow_entries() directly.
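The lookup and merge-scan costs above can be illustrated on raw slices. This is a sketch under a loud assumption: each overflow entry is taken to be 12 bytes, a little-endian u64 slot followed by a little-endian u32 value. The real OVERFLOW_ENTRY_SIZE and byte order are defined in format.rs and may differ. The function names are hypothetical.

```rust
use std::cmp::Ordering;
use std::convert::TryInto;

// ASSUMPTION: 8-byte LE slot + 4-byte LE value; the crate's real layout may differ.
const ENTRY: usize = 12;

/// Decode the i-th overflow entry from the raw byte slice.
fn entry(raw: &[u8], i: usize) -> (u64, u32) {
    let e = &raw[i * ENTRY..(i + 1) * ENTRY];
    let slot = u64::from_le_bytes(e[..8].try_into().unwrap());
    let value = u32::from_le_bytes(e[8..].try_into().unwrap());
    (slot, value)
}

/// O(1) for primary values, binary search over sorted entries otherwise.
fn get(primary: &[u8], raw: &[u8], slot: usize) -> u32 {
    let b = primary[slot];
    if b < 255 {
        return u32::from(b);
    }
    let (mut lo, mut hi) = (0, raw.len() / ENTRY);
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        let (s, v) = entry(raw, mid);
        match s.cmp(&(slot as u64)) {
            Ordering::Less => lo = mid + 1,
            Ordering::Greater => hi = mid,
            Ordering::Equal => return v,
        }
    }
    panic!("sentinel byte without a matching overflow entry");
}

/// Merge scan: one cursor over the sorted overflow entries, O(n + k).
fn values(primary: &[u8], raw: &[u8]) -> Vec<u32> {
    let mut pos = 0;
    primary
        .iter()
        .map(|&b| {
            if b < 255 {
                u32::from(b)
            } else {
                let (_, v) = entry(raw, pos);
                pos += 1; // entries are sorted by slot, so the next sentinel maps to the next entry
                v
            }
        })
        .collect()
}
```

The merge scan never searches: it relies entirely on the sort invariant, which is why an unsorted overflow store cannot back a view.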
Concrete types
classDiagram
class BitSliceView {
+words: &[u64]
+n: usize
+get(slot) bool
+count_ones() u64
+iter() BitSliceIter
+jaccard_dist/hamming_dist(other: BitSliceView)
}
class IntSliceView {
+primary: &[u8]
+overflow_raw: &[u8]
+n_overflow: usize
+n: usize
+get(slot) u32
+iter() IntSliceViewIter
+overflow_entries() Iterator
+bray_dist/euclidean_dist/…(other: IntSliceView)
}
class PersistentBitVec {
-mmap: Mmap
-n: usize
+view() BitSliceView
+get(slot) bool
+count_ones/zeros() u64
+iter() BitIter
+partial_jaccard_dist(&Self) (u64,u64)
+jaccard_dist/hamming_dist(&Self) …
}
class PersistentBitVecBuilder {
-mmap: MmapMut
-n: usize
+view() BitSliceView
+set(slot, bool)
+or/and/xor/not(BitSliceView)
+copy_from(BitSliceView)
+close() / finish() → PersistentBitVec
}
class PersistentCompactIntVec {
-mmap: Mmap
-n: usize
-n_overflow: usize
-step: usize
-index: Vec~(usize,usize)~
+view() IntSliceView
+get(slot) u32
+iter() Iter
+sum/count_nonzero() u64
+bray_dist/euclidean_dist/… (&Self)
}
class PersistentCompactIntVecBuilder {
-mmap: MmapMut
-n: usize
-overflow: HashMap~usize,u32~
+view() IntSliceView
+set(slot, u32) / get(slot) u32
+inc / inc_present / inc_present_fast
+inc_predicate / inc_predicate_fast
+add/min/max/diff/mask_with(…View)
+primary_bytes/primary_bytes_mut()
+close() / finish() → PersistentCompactIntVec
}
PersistentBitVec --> BitSliceView : view()
PersistentBitVecBuilder --> BitSliceView : view()
PersistentCompactIntVec --> IntSliceView : view()
PersistentCompactIntVecBuilder --> IntSliceView : view() (primary only)
PersistentBitVecBuilder --> PersistentBitVec : close() then open()
PersistentCompactIntVecBuilder --> PersistentCompactIntVec : close() then open()
PersistentBitVec / PersistentBitVecBuilder
PersistentBitVec is the read-only type. view() returns a BitSliceView<'_> over the mmap word array. Direct inherent methods delegate to the view: count_ones(), count_zeros(), partial_jaccard_dist(&Self), jaccard_dist(&Self), hamming_dist(&Self).
BitIter<'a> — exported iterator for PersistentBitVec::iter():
pub struct BitIter<'a> { pub(crate) words: &'a [u64], pub(crate) slot: usize, pub(crate) n: usize }
PersistentBitVecBuilder is the read-write type. Mutation operations accept BitSliceView<'_>:
| Method | Cost |
|---|---|
| set(slot, bool) | O(1) |
| view() -> BitSliceView<'_> | O(1) |
| or/and/xor(BitSliceView) | word-level, O(n/64), SIMD-friendly |
| not() | w ^= u64::MAX per word, re-masks last word |
| copy_from(BitSliceView) | copy_from_slice |
PersistentCompactIntVec / PersistentCompactIntVecBuilder
PersistentCompactIntVec is the read-only type. view() returns an IntSliceView<'_> over the mmap primary and overflow arrays. Inherent iter() is a merge scan (Iter struct). Inherent sum() and count_nonzero() use fast byte-scan helpers.
PersistentCompactIntVecBuilder is the read-write type. Mutation methods on the builder fall into two categories:
Point mutations:
| Method | Note |
|---|---|
| set(slot, u32) | writes primary[slot] or 255+overflow |
| get(slot) -> u32 | reads primary byte or HashMap |
| inc(slot) | get + set, O(1) |
Bulk computation methods — accept view arguments:
| Method | Semantics | Overflow |
|---|---|---|
| inc_present(BitSliceView) | += 1 at each 1-bit | via inc, safe for any group size |
| inc_present_fast(BitSliceView) | same, raw u8 += 1 | debug_assert no 255 reached |
| inc_predicate(IntSliceView, pred) | += 1 where pred(col[s]) | two-pass, safe |
| inc_predicate_fast(IntSliceView, pred) | same, raw u8 | debug_assert no 255 reached |
| add(IntSliceView) | self[s] += other[s] | primary fast path + overflow fallback |
| min(IntSliceView) | byte min + both-overflow fixup | see algorithm below |
| max(IntSliceView) | pre-pass + byte max | see algorithm below |
| diff(IntSliceView) | saturating sub | self<255 hot path |
| mask_with(BitSliceView) | zeros slots where mask bit = 0 | O(n_zeros) |
inc_present_fast / inc_predicate_fast invariant: caller guarantees no counter reaches 255 during the operation (group size < 255 for inc_present_fast, or chunk size < 255 for inc_predicate_fast). Violation is caught by debug_assert in dev builds.
min algorithm:
Exploits 255 = +∞: byte-level min is correct unless both sides are overflow.
snapshot self_ov: Vec<(slot,val)>
snapshot other_ov: HashMap<slot,val>
clear_overflow()
Pass 1: byte min, SIMD-vectorizable, O(n)
Pass 2: both-overflow fixup, O(k_self):
    for (slot, self_val) in self_ov:
        if slot ∈ other_ov: set(slot, min(self_val, other_ov[slot]))
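The two passes can be sketched on an in-memory stand-in. `Ints` and `min_with` are hypothetical names used only for this illustration; the real builder works on mmap-backed storage and takes an IntSliceView.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the builder (not the crate's type).
struct Ints {
    primary: Vec<u8>,
    overflow: HashMap<usize, u32>,
}

impl Ints {
    fn from_values(vals: &[u32]) -> Self {
        let mut v = Ints { primary: vec![0; vals.len()], overflow: HashMap::new() };
        for (slot, &x) in vals.iter().enumerate() {
            v.set(slot, x);
        }
        v
    }

    fn set(&mut self, slot: usize, value: u32) {
        if value < 255 {
            self.primary[slot] = value as u8;
            self.overflow.remove(&slot);
        } else {
            self.primary[slot] = 255;
            self.overflow.insert(slot, value);
        }
    }

    fn get(&self, slot: usize) -> u32 {
        match self.primary[slot] {
            255 => self.overflow[&slot],
            b => u32::from(b),
        }
    }

    fn min_with(&mut self, other: &Ints) {
        // Snapshot self's overflow entries and clear the store in one step.
        let self_ov: Vec<(usize, u32)> = self.overflow.drain().collect();
        // Pass 1: byte min. Correct everywhere except both-overflow slots.
        for (a, &b) in self.primary.iter_mut().zip(&other.primary) {
            *a = (*a).min(b);
        }
        // Pass 2: both-overflow fixup using the real values.
        for (slot, self_val) in self_ov {
            if let Some(&other_val) = other.overflow.get(&slot) {
                self.set(slot, self_val.min(other_val));
            }
        }
    }
}
```

Slots where only self was in overflow end up with other's small byte after pass 1, and their overflow entries are already gone, so no cleanup pass is needed.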
max algorithm:
Cannot do byte max first — max(255, b<255)=255 overwrites self's original overflow value. Pre-pass reads self's value at other's overflow slots before the byte pass.
Pre-pass O(k_other):
    for (slot, other_val) in other.overflow_entries():
        set(slot, max(self.get(slot), other_val))
Pass 1: byte max, SIMD-vectorizable, O(n)
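A minimal sketch of the pre-pass followed by the byte pass, again on a hypothetical in-memory stand-in (`Ints` and `max_with` are illustrative names, not the crate's API):

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the builder (not the crate's type).
struct Ints {
    primary: Vec<u8>,
    overflow: HashMap<usize, u32>,
}

impl Ints {
    fn from_values(vals: &[u32]) -> Self {
        let mut v = Ints { primary: vec![0; vals.len()], overflow: HashMap::new() };
        for (slot, &x) in vals.iter().enumerate() {
            v.set(slot, x);
        }
        v
    }

    fn set(&mut self, slot: usize, value: u32) {
        if value < 255 {
            self.primary[slot] = value as u8;
            self.overflow.remove(&slot);
        } else {
            self.primary[slot] = 255;
            self.overflow.insert(slot, value);
        }
    }

    fn get(&self, slot: usize) -> u32 {
        match self.primary[slot] {
            255 => self.overflow[&slot],
            b => u32::from(b),
        }
    }

    fn max_with(&mut self, other: &Ints) {
        // Pre-pass: resolve every slot where other is in overflow while
        // self's primary bytes and overflow store are still consistent.
        for (&slot, &other_val) in &other.overflow {
            let current = self.get(slot);
            self.set(slot, current.max(other_val));
        }
        // Pass 1: byte max. Slots handled above are 255 on both sides,
        // so the byte pass leaves them untouched.
        for (a, &b) in self.primary.iter_mut().zip(&other.primary) {
            *a = (*a).max(b);
        }
    }
}
```

After the pre-pass, every slot holding the sentinel in self has a matching overflow entry, so the byte pass cannot create a dangling 255.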
Matrix types
Four matrix configurations, from two encodings (bit, int) × two storage formats (columnar, packed):
| | Columnar format | Packed format |
|---|---|---|
| Bit | PersistentBitMatrix (Columnar variant) | PersistentBitMatrix (Packed variant) |
| Int | PersistentCompactIntMatrix (Columnar variant) | PersistentCompactIntMatrix (Packed variant) |
Both matrix types are enums (Columnar / Packed, plus Implicit for bit matrices) behind a transparent API. col_view(c) returns the appropriate view directly:
// PersistentBitMatrix
pub fn col_view(&self, c: usize) -> BitSliceView<'_>
// PersistentCompactIntMatrix
pub fn col_view(&self, c: usize) -> IntSliceView<'_>
No wrapper enums (BitColView, IntColView): the caller receives a Copy view struct immediately usable with any view method or bulk builder method.
pack_compact_int_matrix and pack_bit_matrix convert columnar → packed format.
Aggregation traits (matrix level)
ColumnWeights
trait ColumnWeights: Send + Sync {
fn col_weights(&self) -> Array1<u64>; // sum per column
fn partial_kmer_counts(&self) -> Array1<u64>; // default = col_weights()
}
partial_kmer_counts is overridden for count matrices to return count_nonzero per column (distinct kmers) rather than total count.
CountPartials
Abstract required methods: partial_bray, partial_euclidean, partial_threshold_jaccard, partial_relfreq_bray, partial_relfreq_euclidean, partial_hellinger.
Additivity rule: self-contained partials (partial_bray, partial_euclidean, partial_threshold_jaccard) can be element-wise summed across all (partition, layer) pairs. Normalised partials (partial_relfreq_*, partial_hellinger) require the global col_weights (accumulated across all layers and all partitions) as parameter.
partial_threshold_jaccard returns (inter, union) because union[i,j] depends on both columns simultaneously.
Provided finalisations:
| Finalisation | Formula |
|---|---|
| bray_dist_matrix() | 1 − 2·partial_bray[i,j] / (w[i] + w[j]) |
| euclidean_dist_matrix() | √partial_euclidean[i,j] |
| threshold_jaccard_dist_matrix(t) | 1 − inter[i,j] / union[i,j] |
| relfreq_bray_dist_matrix() | 1 − partial_relfreq_bray[i,j] |
| relfreq_euclidean_dist_matrix() | √partial_relfreq_euclidean[i,j] |
| hellinger_dist_matrix() | √partial_hellinger[i,j] / √2 |
| hellinger_euclidean_dist_matrix() | √partial_hellinger[i,j] |
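To illustrate the additivity rule and the Bray finalisation, here is a sketch using plain `Vec`s in place of `Array1`/`Array2`. It assumes that partial_bray[i,j] holds the sum over rows of min(col_i, col_j), which is the usual Bray-Curtis numerator but is not stated explicitly above. The zero-denominator convention (distance 0) and all function names are choices made for this example only.

```rust
/// ASSUMPTION: partial[i][j] = sum over rows of min(col_i[r], col_j[r]).
fn partial_bray(cols: &[Vec<u32>]) -> Vec<Vec<u64>> {
    let n = cols.len();
    (0..n)
        .map(|i| {
            (0..n)
                .map(|j| {
                    cols[i]
                        .iter()
                        .zip(&cols[j])
                        .map(|(&a, &b)| u64::from(a.min(b)))
                        .sum::<u64>()
                })
                .collect()
        })
        .collect()
}

/// Per-column totals (the role played by col_weights).
fn col_weights(cols: &[Vec<u32>]) -> Vec<u64> {
    cols.iter()
        .map(|c| c.iter().map(|&v| u64::from(v)).sum::<u64>())
        .collect()
}

/// Element-wise sum of two partial matrices (additivity across partitions).
fn add(a: &[Vec<u64>], b: &[Vec<u64>]) -> Vec<Vec<u64>> {
    a.iter()
        .zip(b)
        .map(|(ra, rb)| ra.iter().zip(rb).map(|(x, y)| x + y).collect())
        .collect()
}

/// d[i][j] = 1 - 2 * partial[i][j] / (w[i] + w[j]); 0 when both columns are empty.
fn bray_dist_matrix(partial: &[Vec<u64>], w: &[u64]) -> Vec<Vec<f64>> {
    let n = w.len();
    (0..n)
        .map(|i| {
            (0..n)
                .map(|j| {
                    let denom = w[i] + w[j];
                    if denom == 0 {
                        0.0
                    } else {
                        1.0 - 2.0 * partial[i][j] as f64 / denom as f64
                    }
                })
                .collect()
        })
        .collect()
}
```

Splitting the rows of two columns into two partitions, summing the partials and the weights, and finalising once gives the same distance as computing on the unsplit columns.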
BitPartials
Required: partial_jaccard() -> (Array2<u64>, Array2<u64>), partial_hamming() -> Array2<u64>. Both additive across layers and partitions.
Temp-file-backed types
All inter-function results use temp-file-backed types so the OS can page them out under memory pressure. This matters in practice: processing dozens of layers × hundreds of partitions in parallel would otherwise accumulate gigabytes of live anonymous memory.
Lifecycle
TempCompactIntVecBuilder::new(n) → writable mmap in TempDir
↓ (inc_present_fast / inc_predicate_fast / add / mask_with / …)
.freeze() → TempCompactIntVec (read-only mmap + TempDir)
↓ (optional)
.make_persistent(path) → PersistentCompactIntVec (permanent file)
Same pattern for TempBitVecBuilder → TempBitVec → PersistentBitVec.
Drop order: TempCompactIntVec { vec: PersistentCompactIntVec, _temp: TempDir } — Rust drops fields in declaration order. vec (mmap) released before _temp (directory deleted). No explicit drop() needed.
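The declaration-order guarantee can be checked with a small standalone experiment. `Noisy` and `TempLike` are hypothetical stand-ins for the mmap-backed vector and the `TempDir`; each records its name in a shared log when dropped.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Same field order as TempCompactIntVec: the vector first, the directory second.
struct TempLike {
    vec: Noisy,
    _temp: Noisy,
}

/// Builds a TempLike, drops it, and returns the order in which fields were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let t = TempLike {
        vec: Noisy { name: "vec", log: log.clone() },
        _temp: Noisy { name: "_temp", log: log.clone() },
    };
    drop(t);
    let order = log.borrow().clone();
    order
}
```

Swapping the two field declarations would reverse the order, which for the real type would mean deleting the directory while the mmap is still live.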
TempCompactIntVec / TempCompactIntVecBuilder
pub struct TempCompactIntVec {
vec: PersistentCompactIntVec,
_temp: TempDir, // dropped after vec
}
pub(crate) struct TempCompactIntVecBuilder {
builder: PersistentCompactIntVecBuilder,
temp: TempDir,
}
TempCompactIntVec: read access via get(slot), sum(), iter(), view() -> IntSliceView<'_>.
TempCompactIntVecBuilder: full delegation to inner PersistentCompactIntVecBuilder — all bulk computation methods (inc_present_fast, inc_predicate_fast, add, min, max, diff, mask_with) are exposed as pub(crate).
TempBitVec / TempBitVecBuilder
pub struct TempBitVec {
vec: PersistentBitVec,
_temp: TempDir,
}
pub(crate) struct TempBitVecBuilder {
builder: PersistentBitVecBuilder,
temp: TempDir,
}
TempBitVec: read access via get(slot), count_ones(), view() -> BitSliceView<'_>, iter().
TempBitVecBuilder: exposes set(slot, bool), or(BitSliceView), and:
pub(crate) fn or_where(&mut self, col: IntSliceView<'_>, pred: impl Fn(u32) -> bool)
or_where — two passes, no intermediate allocation:
Pass 1: primary bytes, O(n):
    for slot in 0..n:
        b = col.primary_bytes()[slot]
        if b < 255 AND pred(b as u32): self.set(slot, true)
Pass 2: overflow, O(k):
    for (slot, val) in col.overflow_entries():
        if pred(val): self.set(slot, true)
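A compact sketch of the same two passes over plain slices. The free function below and its parameter layout (a `&mut [u64]` bit array, primary bytes, and already-decoded overflow pairs) are simplifications for illustration, not the crate's signature.

```rust
/// OR a predicate result into a packed bit array, in two passes.
fn or_where(
    bits: &mut [u64],
    primary: &[u8],
    overflow: &[(usize, u32)],
    pred: impl Fn(u32) -> bool,
) {
    // Pass 1: primary bytes; sentinel slots are skipped here.
    for (slot, &b) in primary.iter().enumerate() {
        if b < 255 && pred(u32::from(b)) {
            bits[slot >> 6] |= 1u64 << (slot & 63);
        }
    }
    // Pass 2: overflow entries carry the real values of the sentinel slots.
    for &(slot, val) in overflow {
        if pred(val) {
            bits[slot >> 6] |= 1u64 << (slot & 63);
        }
    }
}
```

Each slot is tested exactly once: small values in pass 1, overflow values in pass 2, so the predicate never sees the sentinel byte itself.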
Filter / Select API
ColGroup
pub struct ColGroup { pub name: String, pub indices: Vec<usize> }
Defined once at the index level from column metadata. Valid in all matrices of all layers and partitions — column structure is identical across the entire hierarchy; only rows (kmer slots) are partitioned.
Composition axis
- Across partitions: kmer space is partitioned → partial results concatenated (disjoint kmer ranges).
- Across layers: same kmer space, different counts → partial results aggregated (add, OR, etc.).
MatrixGroupOps
Group operations expose only additive intermediates backed by temp files. Final predicates are applied at the index level after accumulation.
pub trait MatrixGroupOps {
fn partial_group_presence_count(&self, g: &ColGroup, threshold: u32)
-> io::Result<TempCompactIntVec>;
fn partial_group_sum(&self, g: &ColGroup)
-> io::Result<TempCompactIntVec>;
fn partial_group_any(&self, g: &ColGroup, threshold: u32)
-> io::Result<TempBitVec>;
}
Implemented for both PersistentCompactIntMatrix and PersistentBitMatrix. For bit matrices, partial_group_sum delegates to partial_group_presence_count(g, 1).
partial_group_presence_count — chunking for large groups:
When g.indices.len() < 255: per-slot counts stay within u8 range. Use inc_present_fast (bit matrix) or inc_predicate_fast(col_view(c), |v| v >= threshold) (int matrix) — raw u8 increment, no overflow map written.
When g.indices.len() ≥ 255: process in chunks of 254 columns (each chunk stays within u8 range), accumulate into a running builder via .add(chunk_frozen.view()).
fast path (< 255 cols):
    builder = TempCompactIntVecBuilder::new(n)
    for c in group:
        builder.inc_predicate_fast(matrix.col_view(c), |v| v >= threshold)
    builder.freeze()
slow path (≥ 255 cols):
    result = TempCompactIntVecBuilder::new(n)
    for chunk in group.chunks(254):
        chunk_b = TempCompactIntVecBuilder::new(n)
        for c in chunk:
            chunk_b.inc_predicate_fast(matrix.col_view(c), |v| v >= threshold)
        frozen = chunk_b.freeze()
        result.add(frozen.view())
    result.freeze()
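The chunking argument can be checked with a small in-memory model: per-chunk counters are raw `u8` values, and the running total is a `Vec<u32>` standing in for the overflow-aware `add`. Columns are plain `Vec<u32>` here; `presence_count` is a hypothetical function, not the trait method.

```rust
/// Count, per slot, how many columns satisfy `value >= threshold`,
/// using u8 counters that are flushed every 254 columns.
fn presence_count(cols: &[Vec<u32>], n: usize, threshold: u32) -> Vec<u32> {
    let mut result = vec![0u32; n];
    for chunk in cols.chunks(254) {
        // At most 254 increments per slot, so a u8 counter never reaches 255.
        let mut chunk_counts = vec![0u8; n];
        for col in chunk {
            for (c, &v) in chunk_counts.iter_mut().zip(col) {
                if v >= threshold {
                    *c += 1;
                }
            }
        }
        // Stand-in for result.add(frozen.view()).
        for (r, &c) in result.iter_mut().zip(&chunk_counts) {
            *r += u32::from(c);
        }
    }
    result
}
```

With 600 columns this runs three chunks (254 + 254 + 92), and the inner loop stays a branch-plus-byte-increment regardless of group size.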
partial_group_any uses or_where on TempBitVecBuilder:
result = TempBitVecBuilder::new(n)
for c in group:
    result.or_where(matrix.col_view(c), |v| v >= threshold)
result.freeze()
Non-additive predicates (group_all, group_at_least(k)) are composed at the index level:
// "present in >= 2 ingroup columns with count >= 3, absent from all outgroup"
let presence = layers.map(|l| l.partial_group_presence_count(&ingroup, 3)?).add_all()?;
let in_mask = presence.view().geq(2); // IntSliceView method
let out_sum = layers.map(|l| l.partial_group_sum(&outgroup)?).add_all()?;
let out_mask = out_sum.view().leq(0);
let mut mask_b = TempBitVecBuilder::new(n)?;
mask_b.copy_from(in_mask);
mask_b.and(out_mask);
mask_with
Direct method on PersistentCompactIntVecBuilder (and delegation via TempCompactIntVecBuilder). Zeros every slot where the corresponding mask bit is 0. Only zero bits are visited: all-ones words are skipped with a single comparison, so the cost is O(n/64) for the word scan plus O(n_zeros) for the slots actually cleared.
for (w_idx, word) in mask.words():
    if word == u64::MAX: continue      // skip all-ones words
    zeros = !word
    while zeros != 0:
        bit = trailing_zeros(zeros)
        s = w_idx * 64 + bit
        if s >= n: break               // padding bits past the last slot are zero in the mask
        if primary[s] != 0: set(s, 0)  // clears overflow entry too
        zeros &= zeros − 1
Terminal operation for Filter (retain only selected kmer slots in a count vector) and Select (positional selection without MPHF).
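A runnable sketch of the zero-bit walk on plain data. The free function and its parameters are illustrative only. The bounds check on `s` is an addition made here: since padding bits in the last mask word are zero, they show up as candidate slots and must be ignored.

```rust
use std::collections::HashMap;

/// Zero every slot whose mask bit is 0, visiting only zero bits.
fn mask_with(primary: &mut [u8], overflow: &mut HashMap<usize, u32>, mask: &[u64]) {
    let n = primary.len();
    for (w_idx, &word) in mask.iter().enumerate() {
        if word == u64::MAX {
            continue; // all 64 slots are kept
        }
        let mut zeros = !word;
        while zeros != 0 {
            let s = w_idx * 64 + zeros.trailing_zeros() as usize;
            if s >= n {
                break; // remaining bits are padding past the last slot
            }
            if primary[s] != 0 {
                primary[s] = 0;
                overflow.remove(&s); // a sentinel slot also loses its overflow entry
            }
            zeros &= zeros - 1; // clear the lowest set bit
        }
    }
}
```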