docs: clarify MPHF indexing, storage layout, and distance traits

Formalize the two-phase MPHF indexing architecture and update Phase 6 to use `evidence.bin` for direct kmer extraction. Simplify the evidence and unitig storage layouts to flat packed formats enabling O(1) random access. Introduce aggregation traits (`ColumnWeights`, `CountPartials`, `BitPartials`) to support additive distance metric decomposition across partitions. Narrow the documented scope from metagenomic to individual genome datasets, and replace speculative open questions with concrete implementation specifications.
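
To illustrate what "additive distance metric decomposition across partitions" can mean, here is a minimal Rust sketch. It is not the code from this commit: `PairPartial`, `merge`, `combine`, and `jaccard_distance` are hypothetical names, and the actual signatures of `ColumnWeights`, `CountPartials`, and `BitPartials` are not shown here. It assumes each partition covers a disjoint slice of the kmer space, so per-partition counts can be summed and the distance computed once at the end.

```rust
// Hypothetical illustration only: none of these names come from the commit.
// Partial counts for one pair of samples within a single partition.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct PairPartial {
    shared: u64, // kmers present in both samples in this partition
    total: u64,  // kmers present in at least one sample in this partition
}

impl PairPartial {
    // Assumption: partitions cover disjoint slices of the kmer space,
    // so merging two partials is plain field-wise addition.
    fn merge(self, other: PairPartial) -> PairPartial {
        PairPartial {
            shared: self.shared + other.shared,
            total: self.total + other.total,
        }
    }

    // The distance is derived once, after every partial has been merged.
    fn jaccard_distance(self) -> f64 {
        if self.total == 0 {
            return 0.0; // convention chosen in this sketch for two empty samples
        }
        1.0 - (self.shared as f64) / (self.total as f64)
    }
}

// Fold any number of per-partition partials into one aggregate.
fn combine(parts: &[PairPartial]) -> PairPartial {
    parts
        .iter()
        .copied()
        .fold(PairPartial::default(), PairPartial::merge)
}
```

Because merging is associative and commutative, partitions can be processed in any order or in parallel before the final distance is computed.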
2026-05-17 10:20:22 +08:00
parent cf693f17f2
commit f36b095ce2
17 changed files with 916 additions and 1031 deletions
@@ -4,7 +4,7 @@

 ## Constraints

- Target scale: metagenomic data, tens of Gbases, billions of kmers
+- Target scale: individual genome datasets, tens of Gbases
 - Maximum efficiency in computation, memory, and disk usage
 - Input formats: FASTA, FASTQ, gzip, streaming stdin