feat(bitvec): add partial Jaccard, fix padding, optimize constructor
Introduces `partial_jaccard_dist` to return raw intersection and union counts, improving Jaccard distance flexibility. Corrects `not()` to explicitly zero padding bits in the final word, ensuring accurate bit-counting for partially-filled words. Adds an optimized `build_from_counts` constructor.
This commit is contained in:
@@ -246,6 +246,12 @@ Each partition's new layer is built independently; the operation is fully parall
|
||||
|
||||
---
|
||||
|
||||
## Relationship to target architecture
|
||||
|
||||
The target architecture (see [Kmer index architecture](../architecture/index_architecture.md)) separates `MphfLayer` from data stores entirely and introduces a `PartitionedIndex` with parallel dispatch and an `Aggregator` pattern. The current implementation is a stepping stone: `obicompactvec` types are already fully decoupled from the MPHF; the remaining refactoring is within `obilayeredmap` itself.
|
||||
|
||||
---
|
||||
|
||||
## Open questions
|
||||
|
||||
- **Mode 4**: count matrix (n_kmers × n_genomes × bytes_per_count) is structurally identical to mode 3 but uses `PersistentCompactIntMatrix` with G columns. Build API not yet implemented. Scale concern: hundreds of GB for large collections — a sparse representation may be required at high genome counts.
|
||||
|
||||
Reference in New Issue
Block a user