Eric Coissac dfa0b2bac2 feat: add utils subcommand for renaming genome labels
Introduces a `utils` CLI subcommand to enable in-place genome label renaming without full reindexing. Adds strict label validation to reject empty strings, filesystem separators, and control characters, ensuring safe CSV serialization. Updates index metadata, renames corresponding spectrum JSON files, and registers the command in the main dispatch logic. CLI reference documentation is also updated.
2026-05-26 15:35:22 +02:00

obikmer

obikmer is a Rust tool for manipulating, counting, and indexing DNA sequences represented as kmer sets, and for performing set operations on those sets.

Subcommands

| Subcommand | Purpose |
|---|---|
| `superkmer` | Extract super-kmers from a sequence file and write to stdout |
| `index` | Build a complete genome index (scatter → dereplicate → count → layered MPHF) |
| `merge` | Merge multiple built indexes into one |
| `rebuild` | Filter and compact an existing index into a new single-layer index |
| `query` | Query an index with sequences and annotate matches |
| `dump` | Dump all indexed kmers as CSV (kmer + per-genome counts or presence) |
| `annotate` | Add or update genome metadata from a CSV file; or dump metadata as CSV |
| `distance` | Compute pairwise distance matrix between genomes; optionally build NJ/UPGMA trees |
| `unitig` | Dump unitigs from a built index to stdout (debug) |
| `estimate` | Estimate approximate-index parameters (z, evidence bits, FP rates) before indexing |
| `reindex` | Convert an index's evidence in-place: exact ↔ approx |
| `utils` | Miscellaneous index utilities: `--new-label NEW=OLD` renames a genome label in-place |
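
The `NEW=OLD` argument of `utils --new-label` and the label validation described for it (no empty strings, filesystem separators, or control characters) could be sketched roughly as follows. This is an illustration only: `validate_label` and `parse_rename` are hypothetical names, the exact set of rejected characters is an assumption, and obikmer's actual implementation may differ.

```rust
/// Reject labels that would be unsafe as file-name stems or CSV fields.
/// Hypothetical helper: the precise rules used by obikmer are assumed here.
fn validate_label(label: &str) -> Result<(), String> {
    if label.is_empty() {
        return Err("label is empty".to_string());
    }
    // Assumption: both '/' and '\\' count as filesystem separators.
    if label.contains('/') || label.contains('\\') {
        return Err(format!("label {:?} contains a path separator", label));
    }
    if label.chars().any(|c| c.is_control()) {
        return Err(format!("label {:?} contains a control character", label));
    }
    Ok(())
}

/// Split a `NEW=OLD` argument into (new, old) and validate the new label.
/// Hypothetical helper mirroring the argument shape shown in the table above.
fn parse_rename(arg: &str) -> Result<(&str, &str), String> {
    let (new, old) = arg
        .split_once('=')
        .ok_or_else(|| format!("expected NEW=OLD, got {:?}", arg))?;
    validate_label(new)?;
    Ok((new, old))
}
```

With this sketch, `parse_rename("ecoli_K12=genome1")` yields the pair `("ecoli_K12", "genome1")`, while an argument without `=`, or with an empty or path-like new label, is rejected before anything is touched on disk.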

Constraints

  • Target scale: individual genome datasets, tens of Gbases
  • Maximum efficiency in computation, memory, and disk usage
  • k odd, k ∈ [11, 31], fixed at runtime; kmer fits in a u64 (2 bits/base)
  • Canonical form: min(kmer, revcomp(kmer)); merging the two strands halves the kmer space (with k odd, no kmer is its own reverse complement)
  • Input formats: FASTA, FASTQ, gzip-compressed files, and streaming stdin; `index` reads from stdin automatically when no input files are provided (`-` can also be passed explicitly among other paths)
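
The 2-bit packing and canonical-form constraints above can be illustrated with a minimal sketch. The base codes (A=0, C=1, G=2, T=3) and the function names are assumptions made for this example, not obikmer's actual encoding or API. For simplicity the sketch accepts any length from 1 to 31 instead of enforcing the odd k ∈ [11, 31] constraint.

```rust
/// Encode one nucleotide on 2 bits; returns None for any other symbol.
/// Assumed coding: A=0, C=1, G=2, T=3, so complement(code) = 3 - code.
fn encode_base(b: u8) -> Option<u64> {
    match b.to_ascii_uppercase() {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Pack a kmer of at most 31 bases into a u64 (at most 62 bits used),
/// with the first base in the most significant used position.
fn encode_kmer(seq: &[u8]) -> Option<u64> {
    if seq.is_empty() || seq.len() > 31 {
        return None;
    }
    let mut kmer = 0u64;
    for &b in seq {
        kmer = (kmer << 2) | encode_base(b)?;
    }
    Some(kmer)
}

/// Reverse complement of a packed kmer of length k:
/// read bases from the last to the first and complement each one.
fn revcomp(kmer: u64, k: usize) -> u64 {
    let mut fwd = kmer;
    let mut rc = 0u64;
    for _ in 0..k {
        rc = (rc << 2) | (3 - (fwd & 3));
        fwd >>= 2;
    }
    rc
}

/// Canonical form: the smaller of a kmer and its reverse complement.
fn canonical(kmer: u64, k: usize) -> u64 {
    kmer.min(revcomp(kmer, k))
}
```

Under this coding, `ACG` packs to 6 and its reverse complement `CGT` packs to 27, so both strands share the canonical value 6.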

Priority operations

  • Kmer counting (frequencies)
  • Fast search / query
  • Set operations: union, intersection, difference
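
The semantics of these operations can be shown with a small sketch over kmers already packed as `u64` values. It uses only standard-library collections and is not how obikmer stores or processes its indexes (the subcommand table mentions a layered MPHF); `count_kmers` and `set_ops` are hypothetical names for this illustration.

```rust
use std::collections::{BTreeSet, HashMap};

/// Count how many times each packed kmer occurs in a stream.
fn count_kmers(kmers: &[u64]) -> HashMap<u64, u32> {
    let mut counts = HashMap::new();
    for &kmer in kmers {
        *counts.entry(kmer).or_insert(0) += 1;
    }
    counts
}

/// Return (union, intersection, difference a \ b) of two kmer sets.
fn set_ops(
    a: &BTreeSet<u64>,
    b: &BTreeSet<u64>,
) -> (BTreeSet<u64>, BTreeSet<u64>, BTreeSet<u64>) {
    let union = a.union(b).copied().collect();
    let intersection = a.intersection(b).copied().collect();
    let difference = a.difference(b).copied().collect();
    (union, intersection, difference)
}
```

A membership query in this toy model is simply `a.contains(&kmer)`; a real index trades this generality for much lower memory per kmer.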