docs: clarify query pipeline, Findere trick, and input formats

Fix a stray prefix in the README heading. Update the documentation to reflect that the query pipeline operates on s-mers (`s = k - z + 1`) and applies z-window filtering after partitioning. Clarify the Findere trick: the k-mer size reduction, the consecutive-match requirement, and the false positive rate calculation. Expand the input format documentation to cover supported file extensions, gzip compression, recursive directory handling, and the formats accepted by the `query` command.
2026-05-30 15:54:13 +02:00
parent 708b0abf9b
commit 8a0b898b4b
4 changed files with 150 additions and 36 deletions
@@ -25,7 +25,8 @@
 - Maximum efficiency in computation, memory, and disk usage
 - k odd, k ∈ [11, 31], fixed at runtime; kmer fits in a u64 (2 bits/base)
 - Canonical form: `min(kmer, revcomp(kmer))` reduces strand-symmetric space by half
- Input formats: FASTA, FASTQ, gzip, streaming stdin; `index` reads from stdin automatically when no input files are provided (`-` can also be passed explicitly among other paths)
+- Input formats for `index`/`superkmer`: FASTA (`.fa`, `.fasta`), FASTQ (`.fq`, `.fastq`), GenBank flat file (`.gb`, `.gbk`, `.gbff`), all optionally gzip-compressed; directories expanded recursively; streaming stdin via `-`
+- Input formats for `query`: FASTA, FASTQ, optionally gzip-compressed; streaming stdin via `-`

 ## Parameter constraints (enforced at CLI)