# obikmer

`obikmer` is a Rust tool for manipulation, counting, indexing, and set operations on DNA sequences represented as kmer sets.

## Subcommands

| Subcommand  | Purpose |
|-------------|---------|
| `superkmer` | Extract super-kmers from a sequence file and write to stdout |
| `index`     | Build a complete genome index (scatter → dereplicate → count → layered MPHF) |
| `merge`     | Merge multiple built indexes into one |
| `rebuild`   | Filter and compact an existing index into a new single-layer index |
| `query`     | Query an index with sequences and annotate matches |
| `dump`      | Dump all indexed kmers as CSV (kmer + per-genome counts or presence) |
| `annotate`  | Add or update genome metadata from a CSV file; or dump metadata as CSV |
| `distance`  | Compute pairwise distance matrix between genomes; optionally build NJ/UPGMA trees |
| `unitig`    | Dump unitigs from a built index to stdout (debug) |
| `estimate`  | Estimate approximate-index parameters (z, evidence bits, FP rates) before indexing |
| `reindex`   | Convert an index's evidence in-place: exact ↔ approx |

## Constraints

- Target scale: individual genome datasets, tens of Gbases
- Maximum efficiency in computation, memory, and disk usage
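- k odd, k ∈ [11, 31], chosen at runtime and fixed for the whole run; a kmer fits in a u64 (2 bits/base, at most 62 bits for k = 31)
- Canonical form: `min(kmer, revcomp(kmer))`, so a kmer and its reverse complement share one representative, halving the key space; because k is odd, no kmer equals its own reverse complement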
- Input formats: FASTA, FASTQ, gzip, streaming stdin
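The 2-bit encoding and canonical form above can be sketched as follows. This is a minimal illustration, not obikmer's code: the base mapping A=0, C=1, G=2, T=3 is an assumption (this page does not state the mapping obikmer uses), and the helper names `encode_base`, `encode_kmer`, `revcomp`, and `canonical` are hypothetical.

```rust
/// Hypothetical 2-bit code for one base; the mapping A=0, C=1, G=2, T=3 is assumed.
fn encode_base(b: u8) -> Option<u64> {
    match b {
        b'A' | b'a' => Some(0),
        b'C' | b'c' => Some(1),
        b'G' | b'g' => Some(2),
        b'T' | b't' => Some(3),
        _ => None, // N or any other symbol cannot be encoded
    }
}

/// Pack the first k bases of `seq` into a u64, first base in the highest used bits.
fn encode_kmer(seq: &[u8], k: usize) -> Option<u64> {
    assert!(k % 2 == 1 && (11..=31).contains(&k), "k must be odd and in [11, 31]");
    if seq.len() < k {
        return None;
    }
    let mut code = 0u64;
    for &b in &seq[..k] {
        code = (code << 2) | encode_base(b)?;
    }
    Some(code)
}

/// Reverse complement of a k-base code. With the assumed mapping,
/// the complement of base x is 3 - x, which equals x ^ 3.
fn revcomp(mut code: u64, k: usize) -> u64 {
    let mut rc = 0u64;
    for _ in 0..k {
        // Take the last base of `code`, complement it, and append it to `rc`.
        rc = (rc << 2) | (3 ^ (code & 3));
        code >>= 2;
    }
    rc
}

/// Canonical form: the smaller of the kmer and its reverse complement.
fn canonical(code: u64, k: usize) -> u64 {
    code.min(revcomp(code, k))
}

fn main() {
    let k = 11;
    let fwd = encode_kmer(b"ACGTTGCAAGG", k).unwrap();
    let rev = encode_kmer(b"CCTTGCAACGT", k).unwrap(); // reverse complement of the above
    assert_eq!(canonical(fwd, k), canonical(rev, k));
    println!("{:022b}", canonical(fwd, k));
}
```

With this mapping, complementing a base is a single XOR, so the reverse complement needs no lookup table.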

## Priority operations

- Kmer counting (frequencies)
- Fast search / query
- Set operations: union, intersection, difference
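The three priority operations can be illustrated on canonical kmers with standard-library collections. This is a sketch under stated assumptions: `HashMap` and `HashSet` are stand-ins that do not model obikmer's own index (super-kmers, layered MPHF), the A=0, C=1, G=2, T=3 encoding is assumed, and the function names `canonical_kmers`, `count_kmers`, and `kmer_set` are hypothetical. Forward and reverse-complement codes are updated incrementally as the window slides, so each base costs O(1) rather than O(k).

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical 2-bit code (A=0, C=1, G=2, T=3), assumed for illustration.
fn code(b: u8) -> Option<u64> {
    match b.to_ascii_uppercase() {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Canonical kmers of `seq` (k odd, 11..=31), using rolling forward and
/// reverse-complement codes. Windows containing a non-ACGT symbol are skipped.
fn canonical_kmers(seq: &[u8], k: usize) -> Vec<u64> {
    let mask = (1u64 << (2 * k)) - 1;
    let shift = 2 * (k - 1);
    let (mut fwd, mut rev, mut len) = (0u64, 0u64, 0usize);
    let mut out = Vec::new();
    for &b in seq {
        match code(b) {
            Some(c) => {
                fwd = ((fwd << 2) | c) & mask; // append base at the low end
                rev = (rev >> 2) | ((3 ^ c) << shift); // prepend its complement at the high end
                len += 1;
                if len >= k {
                    out.push(fwd.min(rev));
                }
            }
            None => len = 0, // restart the window after an ambiguous base
        }
    }
    out
}

/// Kmer counting: canonical kmer -> number of occurrences.
fn count_kmers(seq: &[u8], k: usize) -> HashMap<u64, u32> {
    let mut counts = HashMap::new();
    for km in canonical_kmers(seq, k) {
        *counts.entry(km).or_insert(0) += 1;
    }
    counts
}

/// Kmer set of a sequence, for presence queries and set operations.
fn kmer_set(seq: &[u8], k: usize) -> HashSet<u64> {
    canonical_kmers(seq, k).into_iter().collect()
}

fn main() {
    let k = 11;
    let a = b"ACGTTGCAAGGTTACCGATAGCATTAGC";
    let b = b"GGTTACCGATAGCATTAGCCCGATTACA";
    let counts = count_kmers(a, k);
    let (sa, sb) = (kmer_set(a, k), kmer_set(b, k));
    let union: HashSet<u64> = sa.union(&sb).copied().collect();
    let inter: HashSet<u64> = sa.intersection(&sb).copied().collect();
    let diff: HashSet<u64> = sa.difference(&sb).copied().collect();
    println!(
        "distinct={} union={} inter={} diff={}",
        counts.len(),
        union.len(),
        inter.len(),
        diff.len()
    );
}
```

A general-purpose hash map stores every key explicitly, which is one reason a tool working at tens of Gbases would favour a more compact structure such as the layered MPHF listed for `index`.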