obikmer/docmd/index.md
Eric Coissac f36b095ce2 docs: clarify MPHF indexing, storage layout, and distance traits
Formalize the two-phase MPHF indexing architecture and update Phase 6 to use `evidence.bin` for direct kmer extraction. Simplify the evidence and unitig storage layouts to flat packed formats enabling O(1) random access. Introduce aggregation traits (`ColumnWeights`, `CountPartials`, `BitPartials`) to support additive distance metric decomposition across partitions. Narrow the documented scope from metagenomic to individual genome datasets, and replace speculative open questions with concrete implementation specifications.
2026-05-17 15:59:10 +08:00


# obikmer
`obikmer` is a Rust tool for manipulating, counting, and indexing DNA sequences represented as kmer sets, and for performing set operations on them.
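
To make "sequences represented as kmer sets" concrete, here is a minimal sketch of one common way to build such a set: each kmer is packed into a `u64` at 2 bits per base. The function names (`encode_base`, `kmer_set`), the A=0/C=1/G=2/T=3 coding, and the choice to skip windows containing non-ACGT symbols are assumptions made for illustration. They are not taken from the `obikmer` code base, and the sketch does not handle reverse complements (canonical kmers).

```rust
use std::collections::BTreeSet;

/// Map a nucleotide to a 2-bit code (A=0, C=1, G=2, T=3).
/// Returns None for any other symbol (e.g. N). Hypothetical helper.
fn encode_base(b: u8) -> Option<u64> {
    match b.to_ascii_uppercase() {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Collect the distinct kmers of `seq`, each packed into a u64.
/// Windows that contain a non-ACGT symbol are skipped.
/// Requires 1 <= k <= 32 so that a kmer fits in 64 bits.
fn kmer_set(seq: &[u8], k: usize) -> BTreeSet<u64> {
    assert!((1..=32).contains(&k), "k must be in 1..=32");
    let mask: u64 = if k == 32 { u64::MAX } else { (1u64 << (2 * k)) - 1 };
    let mut set = BTreeSet::new();
    let mut word: u64 = 0;
    let mut valid = 0usize; // consecutive valid bases ending at the current position
    for &b in seq {
        match encode_base(b) {
            Some(code) => {
                // Slide the window: shift in the new base, drop the oldest one.
                word = ((word << 2) | code) & mask;
                valid += 1;
                if valid >= k {
                    set.insert(word);
                }
            }
            None => {
                // An ambiguous base invalidates every window that overlaps it.
                valid = 0;
                word = 0;
            }
        }
    }
    set
}
```

For example, `kmer_set(b"ACGTACGT", 4)` holds four distinct kmers (ACGT, CGTA, GTAC, TACG), because the second ACGT is a duplicate of the first.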
## Constraints
- Target scale: individual genome datasets, on the order of tens of gigabases
- Maximum efficiency in computation, memory, and disk usage
- Input formats: FASTA and FASTQ, optionally gzip-compressed, read from files or streamed from stdin
## Priority operations
- Kmer counting (frequencies)
- Fast search / query
- Set operations: union, intersection, difference
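
The operations listed above can be sketched with standard-library collections. This is only an illustration of their semantics, not how `obikmer` implements them: a tool working at the stated scale would rely on compact on-disk indexes rather than in-memory hash maps. The function names `count_kmers` and `distinct_kmers` are hypothetical.

```rust
use std::collections::{BTreeSet, HashMap};

/// Count how many times each kmer (as a byte slice) occurs in `seq`.
/// Returns an empty map when k is 0 or longer than the sequence.
fn count_kmers(seq: &[u8], k: usize) -> HashMap<&[u8], u32> {
    let mut counts = HashMap::new();
    if k == 0 {
        return counts; // slice::windows panics on a zero window size
    }
    for window in seq.windows(k) {
        *counts.entry(window).or_insert(0) += 1;
    }
    counts
}

/// The set of distinct kmers of `seq`, ignoring their frequencies.
fn distinct_kmers(seq: &[u8], k: usize) -> BTreeSet<&[u8]> {
    count_kmers(seq, k).into_keys().collect()
}
```

With `x = distinct_kmers(b"ACGTA", 3)` and `y = distinct_kmers(b"CGTAC", 3)`, a query is `x.contains(&b"CGT"[..])`, and the set operations map directly onto `x.union(&y)`, `x.intersection(&y)`, and `x.difference(&y)`.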