# obikmer

`obikmer` is a Rust tool for manipulation, counting, indexing, and set operations on DNA sequences represented as kmer sets.

## Constraints

- Target scale: individual genome datasets, tens of Gbases
- Maximum efficiency in computation, memory, and disk usage
- Input formats: FASTA, FASTQ, gzip, streaming stdin

## Priority operations

- Kmer counting (frequencies)
- Fast search / query
- Set operations: union, intersection, difference
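
To make the three operations above concrete, here is a minimal sketch using only the Rust standard library. It is not how `obikmer` is implemented: the function names `count_kmers` and `kmer_set` are hypothetical, and in-memory hash maps of byte vectors are an assumption chosen for readability, not something that would scale to tens of Gbases.

```rust
use std::collections::{HashMap, HashSet};

/// Count every kmer (overlapping window of length `k`) in `seq`.
/// Windows containing anything other than A, C, G, T are skipped.
/// Hypothetical helper for illustration only.
fn count_kmers(seq: &[u8], k: usize) -> HashMap<Vec<u8>, u32> {
    let mut counts = HashMap::new();
    // `windows(0)` would panic, so treat k == 0 as "no kmers".
    if k == 0 {
        return counts;
    }
    for window in seq.windows(k) {
        if window.iter().all(|b| matches!(b, b'A' | b'C' | b'G' | b'T')) {
            *counts.entry(window.to_vec()).or_insert(0) += 1;
        }
    }
    counts
}

/// Reduce a sequence to its set of distinct kmers (frequencies dropped).
fn kmer_set(seq: &[u8], k: usize) -> HashSet<Vec<u8>> {
    count_kmers(seq, k).into_keys().collect()
}

fn main() {
    // Kmer counting (frequencies).
    let counts = count_kmers(b"ACGACG", 3);
    println!("distinct kmers: {}", counts.len());

    // Fast search / query: a hash lookup by kmer.
    println!("count of ACG: {:?}", counts.get(&b"ACG"[..]));

    // Set operations on two kmer sets.
    let a = kmer_set(b"ACGTAC", 3);
    let b = kmer_set(b"GTACGG", 3);
    println!("union size: {}", a.union(&b).count());
    println!("intersection size: {}", a.intersection(&b).count());
    println!("difference size (a - b): {}", a.difference(&b).count());
}
```

The sketch treats a kmer and its reverse complement as distinct and stores each kmer as an owned byte vector. A tool working under the constraints listed earlier would more plausibly use a packed encoding and disk-backed structures.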