docs: expand kmer indexing, filtering, and merging documentation

Expands MkDocs navigation and documentation for evidence elimination, the merge command, and kmer filtering. Refactors kmer representation to a generic `KmerOf<L>` type with a bitwise reverse complement algorithm. Unifies MPHF construction, introduces approximate fingerprint-based indexing, and updates the pipeline, chunkreader, and storage layouts. Adds code coverage reports and clarifies architectural invariants for improved maintainability.
This commit is contained in:
Eric Coissac
2026-06-04 21:27:01 +02:00
parent 9306ec1c56
commit bb7adc1154
50 changed files with 34226 additions and 1576 deletions
+5 -4
View File
@@ -9,15 +9,16 @@
| `superkmer` | Extract super-kmers from a sequence file and write to stdout |
| `index` | Build a complete genome index (scatter → dereplicate → count → layered MPHF) |
| `merge` | Merge multiple built indexes into one |
| `rebuild` | Filter and compact an existing index into a new single-layer index |
| `rebuild` | Filter and compact an existing index into a new single-layer index; supports ingroup/outgroup predicates on genome metadata |
| `query` | Query an index with sequences and annotate matches |
| `dump` | Dump all indexed kmers as CSV (kmer + per-genome counts or presence) |
| `dump` | Dump all indexed k-mers as CSV (kmer + per-genome counts or presence); supports the same ingroup/outgroup filtering as `rebuild` |
| `annotate` | Add or update genome metadata from a CSV file; or dump metadata as CSV |
| `distance` | Compute pairwise distance matrix between genomes; optionally build NJ/UPGMA trees |
| `unitig` | Dump unitigs from a built index to stdout (debug) |
| `unitig` | Build a global de Bruijn graph across all partitions and enumerate its unitigs as FASTA; supports the same ingroup/outgroup filtering as `rebuild` |
| `estimate` | Estimate approximate-index parameters (z, evidence bits, FP rates) before indexing |
| `reindex` | Convert an index's evidence in-place: exact ↔ approx |
| `utils` | Miscellaneous index utilities: `--new-label NEW=OLD` renames a genome label in-place (NEW gets OLD's identity) |
| `utils` | Miscellaneous index utilities: `--new-label NEW=OLD` renames a genome label; `--upgrade-index` adds missing `layer_meta.json` to old indexes |
| `pack` | Pack per-column matrix files into single-file format to reduce query I/O |
## Constraints