obikmer/docmd/index.md
Eric Coissac 036d044291 refactor: update core types and add approximate evidence support
Refactor `Kmer`, `SuperKmer`, and chunk reader into optimized, generic representations with compile-time length parameters and bitwise operations. Update the pipeline and scheduler to support batch processing, 1→N flat transformations, and multi-source merging. Introduce an approximate evidence mode using b-bit fingerprints and `.idx` files, alongside existing exact mode. Update CLI documentation, minimizer selection, and query output schema accordingly.
2026-05-26 10:04:25 +02:00

obikmer

obikmer is a Rust tool for manipulation, counting, indexing, and set operations on DNA sequences represented as kmer sets.

Subcommands

| Subcommand | Purpose |
|------------|---------|
| superkmer | Extract super-kmers from a sequence file and write to stdout |
| index | Build a complete genome index (scatter → dereplicate → count → layered MPHF) |
| merge | Merge multiple built indexes into one |
| rebuild | Filter and compact an existing index into a new single-layer index |
| query | Query an index with sequences and annotate matches |
| dump | Dump all indexed kmers as CSV (kmer + per-genome counts or presence) |
| annotate | Add or update genome metadata from a CSV file; or dump metadata as CSV |
| distance | Compute pairwise distance matrix between genomes; optionally build NJ/UPGMA trees |
| unitig | Dump unitigs from a built index to stdout (debug) |
| estimate | Estimate approximate-index parameters (z, evidence bits, FP rates) before indexing |
| reindex | Convert an index's evidence in-place: exact ↔ approx |

Constraints

  • Target scale: individual genome datasets, tens of Gbases
  • Maximum efficiency in computation, memory, and disk usage
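  • k odd, k ∈ [11, 31], fixed at runtime; at 2 bits/base a kmer takes at most 62 bits and fits in a u64
  • Canonical form: min(kmer, revcomp(kmer)); treating both strands as one kmer halves the kmer space (with k odd, no kmer equals its own reverse complement)
  • Input formats: FASTA and FASTQ, optionally gzip-compressed, from files or streamed from stdin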
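
The encoding and canonical-form constraints above can be illustrated with a minimal sketch. It assumes the common base encoding A=0, C=1, G=2, T=3 (so the complement of a code `c` is `3 - c`) and packs the first base into the most significant used bits. The function names `encode_base`, `encode_kmer`, `revcomp`, and `canonical` are hypothetical and are not taken from the obikmer code base, whose `Kmer` type may use a different layout.

```rust
/// Encode one nucleotide on 2 bits.
/// Assumed encoding (not confirmed by obikmer): A=0, C=1, G=2, T=3.
fn encode_base(b: u8) -> Option<u64> {
    match b.to_ascii_uppercase() {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None, // ambiguous bases such as N are rejected
    }
}

/// Pack a sequence of length k into a u64, 2 bits per base.
/// Returns None if k is even, outside [11, 31], or a base is not A/C/G/T.
fn encode_kmer(seq: &[u8]) -> Option<u64> {
    let k = seq.len();
    if k < 11 || k > 31 || k % 2 == 0 {
        return None;
    }
    let mut value = 0u64;
    for &b in seq {
        value = (value << 2) | encode_base(b)?;
    }
    Some(value)
}

/// Reverse complement of a packed kmer of length k.
/// Bases are read from the last to the first and complemented as 3 - code.
fn revcomp(kmer: u64, k: usize) -> u64 {
    let mut rest = kmer;
    let mut rc = 0u64;
    for _ in 0..k {
        rc = (rc << 2) | (3 - (rest & 3));
        rest >>= 2;
    }
    rc
}

/// Canonical form: the smaller of the kmer and its reverse complement.
fn canonical(kmer: u64, k: usize) -> u64 {
    kmer.min(revcomp(kmer, k))
}
```

Under this encoding, a kmer and its reverse complement always map to the same canonical value, which is what lets both strands share a single entry.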

Priority operations

  • Kmer counting (frequencies)
  • Fast search / query
  • Set operations: union, intersection, difference
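
As a rough illustration of the counting and set operations listed above, the following sketch works on kmers already packed into `u64` values and uses only standard-library collections. It is not how obikmer implements these operations (the subcommand table indicates an index built on a layered MPHF); `count_kmers` and `set_ops` are hypothetical names chosen for this example.

```rust
use std::collections::{BTreeSet, HashMap};

/// Count how many times each packed kmer occurs.
fn count_kmers(kmers: &[u64]) -> HashMap<u64, u32> {
    let mut counts = HashMap::new();
    for &kmer in kmers {
        *counts.entry(kmer).or_insert(0) += 1;
    }
    counts
}

/// Union, intersection, and difference (a minus b) of two kmer sets.
/// BTreeSet iterates in sorted order, so each result is sorted.
fn set_ops(a: &BTreeSet<u64>, b: &BTreeSet<u64>) -> (Vec<u64>, Vec<u64>, Vec<u64>) {
    (
        a.union(b).copied().collect(),
        a.intersection(b).copied().collect(),
        a.difference(b).copied().collect(),
    )
}
```

A hash map and ordered sets keep the example short; at the target scale of tens of Gbases, a compact on-disk index is the more realistic choice.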