docs: clarify MPHF indexing, storage layout, and distance traits

Formalize the two-phase MPHF indexing architecture and update Phase 6 to use `evidence.bin` for direct kmer extraction. Simplify the evidence and unitig storage layouts to flat packed formats enabling O(1) random access. Introduce aggregation traits (`ColumnWeights`, `CountPartials`, `BitPartials`) to support additive distance metric decomposition across partitions. Narrow the documented scope from metagenomic to individual genome datasets, and replace speculative open questions with concrete implementation specifications.
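Here is a minimal sketch of why additive decomposition across partitions matters. It assumes that partitions are disjoint, meaning each kmer belongs to exactly one partition, and it uses Jaccard distance as the example metric. `PartitionCounts` and `jaccard_distance` are hypothetical names chosen for illustration. They are not the actual API of the `CountPartials` trait. Each partition reports raw counts. Those counts are summed, and the distance is computed once, at the end.

```rust
use std::ops::Add;

/// Hypothetical per-partition partial counts for two kmer sets A and B.
/// Assumption: partitions are disjoint, so these counts can be summed.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct PartitionCounts {
    pub intersection: u64, // |A_i ∩ B_i|
    pub union: u64,        // |A_i ∪ B_i|
}

impl Add for PartitionCounts {
    type Output = Self;
    fn add(self, other: Self) -> Self {
        PartitionCounts {
            intersection: self.intersection + other.intersection,
            union: self.union + other.union,
        }
    }
}

/// Sum the per-partition partials, then derive the Jaccard distance once.
/// Returns 0.0 when both sets are empty (union of zero).
pub fn jaccard_distance(partials: &[PartitionCounts]) -> f64 {
    let total = partials
        .iter()
        .copied()
        .fold(PartitionCounts::default(), |acc, p| acc + p);
    if total.union == 0 {
        return 0.0;
    }
    1.0 - total.intersection as f64 / total.union as f64
}

fn main() {
    let partials = [
        PartitionCounts { intersection: 3, union: 4 }, // local distance 0.25
        PartitionCounts { intersection: 1, union: 6 }, // local distance ~0.83
    ];
    // Summed counts give 4 / 10, so the global distance is 0.6.
    println!("{:.1}", jaccard_distance(&partials));
}
```

Averaging the two per-partition distances would give about 0.54 instead of 0.6. This is why the partials, not the per-partition distances, are what get combined across partitions.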
2026-05-17 10:20:22 +08:00
parent cf693f17f2
commit f36b095ce2
17 changed files with 916 additions and 1031 deletions
@@ -978,7 +978,7 @@
 <p><code>obikmer</code> is a Rust tool for manipulation, counting, indexing, and set operations on DNA sequences represented as kmer sets.</p>
 <h2 id="constraints">Constraints</h2>
 <ul>
-<li>Target scale: metagenomic data, tens of Gbases, billions of kmers</li>
+<li>Target scale: individual genome datasets, tens of Gbases</li>
 <li>Maximum efficiency in computation, memory, and disk usage</li>
 <li>Input formats: FASTA, FASTQ, gzip, streaming stdin</li>
 </ul>