docs: clarify MPHF indexing, storage layout, and distance traits
Formalize the two-phase MPHF indexing architecture and update Phase 6 to use `evidence.bin` for direct kmer extraction. Simplify the evidence and unitig storage layouts to flat packed formats enabling O(1) random access. Introduce aggregation traits (`ColumnWeights`, `CountPartials`, `BitPartials`) to support additive distance metric decomposition across partitions. Narrow the documented scope from metagenomic to individual genome datasets, and replace speculative open questions with concrete implementation specifications.
@@ -952,6 +952,45 @@
</ul>
</nav>
</li>
<li class="md-nav__item">
<a href="#aggregation-traits-obicompactvectraits" class="md-nav__link">
<span class="md-ellipsis">
Aggregation traits — obicompactvec::traits
</span>
</a>
<nav class="md-nav" aria-label="Aggregation traits — obicompactvec::traits">
<ul class="md-nav__list">
<li class="md-nav__item">
<a href="#columnweights" class="md-nav__link">
<span class="md-ellipsis">
ColumnWeights
</span>
</a>
</li>
<li class="md-nav__item">
<a href="#bitpartials" class="md-nav__link">
<span class="md-ellipsis">
BitPartials
</span>
</a>
</li>
</ul>
</nav>
</li>
</ul>
@@ -1293,6 +1332,45 @@
</ul>
</nav>
</li>
<li class="md-nav__item">
<a href="#aggregation-traits-obicompactvectraits" class="md-nav__link">
<span class="md-ellipsis">
Aggregation traits — obicompactvec::traits
</span>
</a>
<nav class="md-nav" aria-label="Aggregation traits — obicompactvec::traits">
<ul class="md-nav__list">
<li class="md-nav__item">
<a href="#columnweights" class="md-nav__link">
<span class="md-ellipsis">
ColumnWeights
</span>
</a>
</li>
<li class="md-nav__item">
<a href="#bitpartials" class="md-nav__link">
<span class="md-ellipsis">
BitPartials
</span>
</a>
</li>
</ul>
</nav>
</li>
</ul>
@@ -1520,6 +1598,27 @@ offset 16:
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="nf">read</span><span class="p">(</span><span class="o">&amp;</span><span class="bp">self</span><span class="p">,</span><span class="w"> </span><span class="n">slot</span><span class="p">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="w"> </span><span class="p">-&gt;</span><span class="w"> </span><span class="nb">Box</span><span class="o">&lt;</span><span class="p">[</span><span class="kt">bool</span><span class="p">]</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="bp">self</span><span class="p">.</span><span class="n">row</span><span class="p">(</span><span class="n">slot</span><span class="p">)</span><span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<hr />
<h2 id="aggregation-traits-obicompactvectraits">Aggregation traits — <code>obicompactvec::traits</code></h2>
<p><code>PersistentBitMatrix</code> implements two aggregation traits used by <code>LayeredStore&lt;S&gt;</code> for cross-layer and cross-partition distance computations.</p>
<h3 id="columnweights">ColumnWeights</h3>
<div class="highlight"><pre><span></span><code><span class="k">impl</span><span class="w"> </span><span class="n">ColumnWeights</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">PersistentBitMatrix</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="nf">col_weights</span><span class="p">(</span><span class="o">&amp;</span><span class="bp">self</span><span class="p">)</span><span class="w"> </span><span class="p">-&gt;</span><span class="w"> </span><span class="nc">Array1</span><span class="o">&lt;</span><span class="kt">u64</span><span class="o">&gt;</span><span class="w"> </span><span class="c1">// = self.count_ones()</span>
<span class="p">}</span>
</code></pre></div>
<p><code>col_weights()[c]</code> = number of set bits in column <code>c</code> across all slots.</p>
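<p>As an illustration of the definition above, the following minimal sketch counts set bits per column over a toy in-memory matrix. It is not the <code>PersistentBitMatrix</code> implementation: the <code>Vec&lt;Vec&lt;bool&gt;&gt;</code> representation (one row per slot) and the helper names <code>col_weights_rows</code> and <code>sum_weights</code> are assumptions made for this example, and plain <code>Vec&lt;u64&gt;</code> stands in for <code>Array1&lt;u64&gt;</code>. The second helper shows why such counts can be combined across layers by element-wise addition.</p>

```rust
/// Hypothetical stand-in for a bit matrix: one `Vec<bool>` row per slot.
/// Returns, for each column `c`, the number of rows whose bit `c` is set.
fn col_weights_rows(rows: &[Vec<bool>], n_cols: usize) -> Vec<u64> {
    let mut weights = vec![0u64; n_cols];
    for row in rows {
        // `take(n_cols)` ignores any extra entries in an over-long row.
        for (c, &bit) in row.iter().enumerate().take(n_cols) {
            if bit {
                weights[c] += 1;
            }
        }
    }
    weights
}

/// Element-wise sum of two weight vectors (illustrative helper).
/// Per-column bit counts from separate layers simply add up.
fn sum_weights(a: &[u64], b: &[u64]) -> Vec<u64> {
    a.iter().zip(b).map(|(x, y)| x + y).collect()
}
```

<p>Under these assumptions, computing the weights of each layer separately and summing them gives the same result as counting over the concatenated rows.</p>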
<h3 id="bitpartials">BitPartials</h3>
<div class="highlight"><pre><span></span><code><span class="k">impl</span><span class="w"> </span><span class="n">BitPartials</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">PersistentBitMatrix</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Self-contained partials (additive across layers)</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="nf">partial_jaccard</span><span class="p">(</span><span class="o">&amp;</span><span class="bp">self</span><span class="p">)</span><span class="w"> </span><span class="p">-&gt;</span><span class="w"> </span><span class="p">(</span><span class="n">Array2</span><span class="o">&lt;</span><span class="kt">u64</span><span class="o">&gt;</span><span class="p">,</span><span class="w"> </span><span class="n">Array2</span><span class="o">&lt;</span><span class="kt">u64</span><span class="o">&gt;</span><span class="p">)</span><span class="w"> </span><span class="c1">// (inter, union)</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="nf">partial_hamming</span><span class="p">(</span><span class="o">&amp;</span><span class="bp">self</span><span class="p">)</span><span class="w"> </span><span class="p">-&gt;</span><span class="w"> </span><span class="nc">Array2</span><span class="o">&lt;</span><span class="kt">u64</span><span class="o">&gt;</span><span class="w"> </span><span class="c1">// differing bits</span>

<span class="w"> </span><span class="c1">// Provided finalisations</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="nf">jaccard_dist_matrix</span><span class="p">(</span><span class="o">&amp;</span><span class="bp">self</span><span class="p">)</span><span class="w"> </span><span class="p">-&gt;</span><span class="w"> </span><span class="nc">Array2</span><span class="o">&lt;</span><span class="kt">f64</span><span class="o">&gt;</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="nf">hamming_dist_matrix</span><span class="p">(</span><span class="o">&amp;</span><span class="bp">self</span><span class="p">)</span><span class="w"> </span><span class="p">-&gt;</span><span class="w"> </span><span class="nc">Array2</span><span class="o">&lt;</span><span class="kt">u64</span><span class="o">&gt;</span>
<span class="p">}</span>
</code></pre></div>
<p><code>partial_jaccard</code> returns <code>(inter, union)</code> as a pair because <code>union</code> cannot be reconstructed from per-column <code>count_ones()</code> alone: it depends on both columns simultaneously. Both components decompose additively across <code>(partition, layer)</code> pairs, so the final <code>jaccard_dist_matrix()</code> is computed from their element-wise sums.</p>
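<p>The additive decomposition described above can be sketched as follows. This is a toy model under stated assumptions, not the crate's code: rows are plain <code>Vec&lt;bool&gt;</code> values with exactly <code>n_cols</code> entries each, <code>Vec&lt;Vec&lt;u64&gt;&gt;</code> stands in for <code>Array2&lt;u64&gt;</code>, and the names <code>partial_jaccard_rows</code>, <code>add_counts</code>, and <code>jaccard_dist_from_counts</code> are hypothetical. The convention that two empty columns have distance 0 is also an assumption of this sketch.</p>

```rust
type Counts = Vec<Vec<u64>>;

/// Pairwise (intersection, union) counts between the columns of one layer.
/// Assumes every row holds exactly `n_cols` entries.
fn partial_jaccard_rows(rows: &[Vec<bool>], n_cols: usize) -> (Counts, Counts) {
    let mut inter_counts = vec![vec![0u64; n_cols]; n_cols];
    let mut union_counts = vec![vec![0u64; n_cols]; n_cols];
    for row in rows {
        for i in 0..n_cols {
            for j in 0..n_cols {
                inter_counts[i][j] += (row[i] && row[j]) as u64;
                union_counts[i][j] += (row[i] || row[j]) as u64;
            }
        }
    }
    (inter_counts, union_counts)
}

/// Element-wise sum of two count matrices: how partials from separate
/// layers or partitions would be merged in this sketch.
fn add_counts(a: &Counts, b: &Counts) -> Counts {
    a.iter()
        .zip(b)
        .map(|(ra, rb)| ra.iter().zip(rb).map(|(x, y)| x + y).collect())
        .collect()
}

/// Finalisation: distance = 1 - inter / union, computed once on the sums.
/// Assumption: an empty union (both columns all-zero) yields distance 0.
fn jaccard_dist_from_counts(inter: &Counts, union_counts: &Counts) -> Vec<Vec<f64>> {
    inter
        .iter()
        .zip(union_counts)
        .map(|(ri, ru)| {
            ri.iter()
                .zip(ru)
                .map(|(&i, &u)| if u == 0 { 0.0 } else { 1.0 - i as f64 / u as f64 })
                .collect()
        })
        .collect()
}
```

<p>In this model, summing the partials of two layers gives the same counts as computing them over the concatenated rows, whereas averaging per-layer distances generally would not. That is why the ratio is taken only after the sums are formed.</p>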