docs: clarify MPHF indexing, storage layout, and distance traits
Formalize the two-phase MPHF indexing architecture and update Phase 6 to use `evidence.bin` for direct kmer extraction. Simplify the evidence and unitig storage layouts to flat packed formats enabling O(1) random access. Introduce aggregation traits (`ColumnWeights`, `CountPartials`, `BitPartials`) to support additive distance metric decomposition across partitions. Narrow the documented scope from metagenomic to individual genome datasets, and replace speculative open questions with concrete implementation specifications.
+1
-1
@@ -4,7 +4,7 @@
 
 ## Constraints
 
-- Target scale: metagenomic data, tens of Gbases, billions of kmers
+- Target scale: individual genome datasets, tens of Gbases
 - Maximum efficiency in computation, memory, and disk usage
 - Input formats: FASTA, FASTQ, gzip, streaming stdin
 