# obikmer
obikmer is a Rust tool for manipulation, counting, indexing, and set operations on DNA sequences represented as kmer sets.
## Subcommands
| Subcommand | Purpose |
|---|---|
| `superkmer` | Extract super-kmers from a sequence file and write them to stdout |
| `index` | Build a complete genome index (scatter → dereplicate → count → layered MPHF) |
| `merge` | Merge multiple built indexes into one |
| `rebuild` | Filter and compact an existing index into a new single-layer index |
| `query` | Query an index with sequences and annotate matches |
| `dump` | Dump all indexed kmers as CSV (kmer + per-genome counts or presence) |
| `annotate` | Add or update genome metadata from a CSV file, or dump metadata as CSV |
| `distance` | Compute a pairwise distance matrix between genomes; optionally build NJ/UPGMA trees |
| `unitig` | Dump unitigs from a built index to stdout (debug) |
| `estimate` | Estimate approximate-index parameters (z, evidence bits, FP rates) before indexing |
| `reindex` | Convert an index's evidence in place: exact ↔ approx |
| `utils` | Miscellaneous index utilities: `--new-label NEW=OLD` renames a genome label in place |
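The `--new-label NEW=OLD` argument of `utils` implies a small parse-and-validate step before any index metadata is touched. The following is a minimal sketch, not obikmer's code: `parse_new_label` and `validate_label` are hypothetical names, and the rules (NEW must be non-empty and free of `/`, `\` and control characters; OLD must be non-empty) are assumptions chosen to keep labels safe as file names and CSV fields.

```rust
/// Hypothetical validation rules (an assumption, not taken from obikmer):
/// a label must be non-empty and contain no path separator or control character.
fn validate_label(label: &str) -> Result<(), String> {
    if label.is_empty() {
        return Err("label must not be empty".to_string());
    }
    if let Some(c) = label
        .chars()
        .find(|c| *c == '/' || *c == '\\' || c.is_control())
    {
        return Err(format!("label {label:?} contains forbidden character {c:?}"));
    }
    Ok(())
}

/// Hypothetical parser for a `NEW=OLD` argument; returns (new, old).
/// `split_once` splits at the first '=', so NEW can never contain '='.
fn parse_new_label(arg: &str) -> Result<(String, String), String> {
    let (new, old) = arg
        .split_once('=')
        .ok_or_else(|| format!("expected NEW=OLD, got {arg:?}"))?;
    validate_label(new)?;
    if old.is_empty() {
        return Err("OLD label must not be empty".to_string());
    }
    Ok((new.to_string(), old.to_string()))
}

fn main() {
    for arg in ["E_coli_K12=genome_3", "genome_3", "a/b=genome_3"] {
        match parse_new_label(arg) {
            Ok((new, old)) => println!("rename {old:?} -> {new:?}"),
            Err(e) => println!("rejected {arg:?}: {e}"),
        }
    }
}
```

Splitting at the first `=` is a design choice: it keeps NEW free of `=` while still allowing an existing OLD label that happens to contain one.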
## Constraints
- Target scale: individual genome datasets, tens of Gbases
- Maximum efficiency in computation, memory, and disk usage
- k odd, k ∈ [11, 31], fixed at runtime; kmer fits in a u64 (2 bits/base)
- Canonical form: `min(kmer, revcomp(kmer))`; because k is odd, no kmer equals its own reverse complement, so this exactly halves the strand-symmetric kmer space
- Input formats: FASTA, FASTQ, gzip, streaming stdin; `index` reads from stdin automatically when no input files are provided (`-` can also be passed explicitly among other paths)
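The kmer constraints above (odd k in [11, 31], 2 bits per base packed into a `u64`, canonical form as the minimum of a kmer and its reverse complement) can be illustrated with a short sketch. The base code A=0, C=1, G=2, T=3 is an assumption (it makes the complement of a code `x` equal to `3 - x`); obikmer's actual encoding may differ, and `encode_kmer`, `revcomp` and `canonical` are hypothetical names.

```rust
/// Assumed 2-bit code: A=0, C=1, G=2, T=3, so complement(x) = 3 - x.
fn encode_base(b: u8) -> Option<u64> {
    match b {
        b'A' | b'a' => Some(0),
        b'C' | b'c' => Some(1),
        b'G' | b'g' => Some(2),
        b'T' | b't' => Some(3),
        _ => None, // e.g. 'N': no valid kmer can contain it
    }
}

/// Pack the first k bases of `seq` into a u64, first base in the high bits.
/// With k <= 31 the kmer uses at most 62 of the 64 bits.
fn encode_kmer(seq: &[u8], k: usize) -> Option<u64> {
    assert!(k % 2 == 1 && (11..=31).contains(&k), "k must be odd and in [11, 31]");
    if seq.len() < k {
        return None;
    }
    let mut x = 0u64;
    for &b in &seq[..k] {
        x = (x << 2) | encode_base(b)?;
    }
    Some(x)
}

/// Reverse complement of a packed kmer: read bases from the low end,
/// complement each one, and push it onto the high end of the result.
fn revcomp(mut x: u64, k: usize) -> u64 {
    let mut rc = 0u64;
    for _ in 0..k {
        rc = (rc << 2) | (3 - (x & 3));
        x >>= 2;
    }
    rc
}

/// Canonical form: the smaller of the kmer and its reverse complement.
fn canonical(x: u64, k: usize) -> u64 {
    x.min(revcomp(x, k))
}

fn main() {
    let k = 11;
    let fwd = encode_kmer(b"ACGTACGTACG", k).expect("valid kmer");
    let rc = revcomp(fwd, k);
    println!("fwd = {fwd}, revcomp = {rc}, canonical = {}", canonical(fwd, k));
}
```

Because the first base sits in the high bits, comparing two packed kmers of the same k as integers orders them lexicographically under A < C < G < T, so `min` selects the lexicographically smaller strand.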
## Priority operations
- Kmer counting (frequencies)
- Fast search / query
- Set operations: union, intersection, difference
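As a naive, in-memory illustration of these priority operations (not obikmer's on-disk, MPHF-based implementation), canonical kmer counting and set operations can be expressed with the standard library's `HashMap` and `HashSet`. The sketch assumes the same 2-bit code A=0, C=1, G=2, T=3, updates the forward and reverse-complement windows incrementally so each base costs O(1), and restarts the window at any non-ACGT character such as `N`; `count_kmers` and `kmer_set` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// Count canonical kmers (odd k in [11, 31]) with rolling forward/reverse
/// windows; any byte other than uppercase A/C/G/T resets the window.
fn count_kmers(seq: &[u8], k: usize) -> HashMap<u64, u32> {
    assert!(k % 2 == 1 && (11..=31).contains(&k));
    let mask = (1u64 << (2 * k)) - 1;
    let top = 2 * (k - 1); // bit offset of the first base of the window
    let mut counts = HashMap::new();
    let (mut fwd, mut rev, mut len) = (0u64, 0u64, 0usize);
    for &b in seq {
        let code: u64 = match b {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => {
                len = 0; // e.g. 'N': no kmer may span it
                continue;
            }
        };
        fwd = ((fwd << 2) | code) & mask; // append the base on the right
        rev = (rev >> 2) | ((3 - code) << top); // prepend its complement on the left
        len += 1;
        if len >= k {
            *counts.entry(fwd.min(rev)).or_insert(0) += 1;
        }
    }
    counts
}

/// Presence-only view: the set of distinct canonical kmers.
fn kmer_set(seq: &[u8], k: usize) -> HashSet<u64> {
    count_kmers(seq, k).into_keys().collect()
}

fn main() {
    let k = 11;
    let a = kmer_set(b"ACGTACGTACGTTTGACCA", k);
    let b = kmer_set(b"CCCCCCCCCCCCACGTACGTACG", k);
    println!("|A| = {}, |B| = {}", a.len(), b.len());
    println!("union        = {}", a.union(&b).count());
    println!("intersection = {}", a.intersection(&b).count());
    println!("A \\ B        = {}", a.difference(&b).count());
}
```

Since only canonical kmers are stored, a sequence and its reverse complement produce identical counts and sets, which is what makes set operations between genomes strand-independent.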