docs: clarify query pipeline, Findere trick, and input formats
Fix a stray prefix in the README heading. Update the documentation to reflect that the query pipeline operates on `s-mers` (`s = k - z + 1`) and applies z-window filtering after partitioning. Clarify the Findere trick, including the k-mer size reduction, the consecutive-match requirement, and the false positive rate calculations. Expand the input format documentation to cover supported file extensions, gzip compression, recursive directory handling, and the formats accepted by the `query` command.
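The z-window filtering mentioned above can be sketched as follows. With s = k - z + 1, every k-mer of a query covers exactly z consecutive s-mers, so once the s-mers have been looked up, a k-mer is reported present only when all z of its s-mers hit. This is a minimal sketch, assuming the lookup step yields one boolean per s-mer in sequence order; `z_window_filter` is a hypothetical name, not the project's API, and the partitioning step is not modeled.

```rust
/// Hypothetical z-window filter: `smer_hits[i]` is true when the i-th s-mer of
/// the query (in sequence order) was found in the index. With s = k - z + 1,
/// k-mer i covers s-mers i..i+z, so it is reported only if all z of them hit.
fn z_window_filter(smer_hits: &[bool], z: usize) -> Vec<bool> {
    assert!(z >= 1, "z must be at least 1");
    if smer_hits.len() < z {
        // Query shorter than k: no complete k-mer exists.
        return Vec::new();
    }
    let mut kmer_hits = Vec::with_capacity(smer_hits.len() - z + 1);
    // Length of the run of consecutive s-mer hits ending at the current position.
    let mut run = 0usize;
    for (j, &hit) in smer_hits.iter().enumerate() {
        run = if hit { run + 1 } else { 0 };
        // Once j >= z - 1, the window of s-mers j+1-z..=j is complete and
        // corresponds to the k-mer starting at position j + 1 - z.
        if j + 1 >= z {
            kmer_hits.push(run >= z);
        }
    }
    kmer_hits
}

fn main() {
    // Illustrative values only: k = 31, z = 3, hence s = 29.
    let k = 31;
    let z = 3;
    let s = k - z + 1;
    // One boolean per s-mer of a 36-base query (36 - s + 1 = 8 s-mers).
    let smer_hits = [true, true, true, false, true, true, true, true];
    let kmer_hits = z_window_filter(&smer_hits, z);
    // prints: s = 29, k-mer hits = [true, false, false, false, true, true]
    println!("s = {s}, k-mer hits = {kmer_hits:?}");
}
```

A single run counter keeps the filter linear in the number of s-mers. It also shows why the trick lowers false positives: one false positive s-mer is not enough on its own to report a k-mer, because the other z - 1 s-mers in its window must hit as well.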
+2 -1
@@ -25,7 +25,8 @@
 - Maximum efficiency in computation, memory, and disk usage
 - k odd, k ∈ [11, 31], fixed at runtime; kmer fits in a u64 (2 bits/base)
 - Canonical form: `min(kmer, revcomp(kmer))` reduces strand-symmetric space by half
-- Input formats: FASTA, FASTQ, gzip, streaming stdin; `index` reads from stdin automatically when no input files are provided (`-` can also be passed explicitly among other paths)
+- Input formats for `index`/`superkmer`: FASTA (`.fa`, `.fasta`), FASTQ (`.fq`, `.fastq`), GenBank flat file (`.gb`, `.gbk`, `.gbff`), all optionally gzip-compressed; directories expanded recursively; streaming stdin via `-`
+- Input formats for `query`: FASTA, FASTQ, optionally gzip-compressed; streaming stdin via `-`
 
 ## Parameter constraints (enforced at CLI)
 
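The context lines of this hunk state that a k-mer packs into a u64 at 2 bits per base and that the canonical form is `min(kmer, revcomp(kmer))`. Here is a minimal sketch of computing that in one rolling pass, assuming the code A=0, C=1, G=2, T=3 (the README does not fix a base-to-bits mapping) and assuming k-mers that overlap a non-ACGT base are skipped; `encode` and `canonical_kmers` are hypothetical names, not the project's API. Requiring odd k also means no k-mer can equal its own reverse complement (its middle base would have to be its own complement), so the canonical choice is never a tie.

```rust
/// Map a nucleotide to 2 bits. The code A=0, C=1, G=2, T=3 is an assumption;
/// with it, the complement of a base b is simply 3 - b.
fn encode(base: u8) -> Option<u64> {
    match base {
        b'A' | b'a' => Some(0),
        b'C' | b'c' => Some(1),
        b'G' | b'g' => Some(2),
        b'T' | b't' => Some(3),
        _ => None, // N or any other symbol breaks the current k-mer
    }
}

/// Canonical value min(kmer, revcomp(kmer)) of every k-mer in `seq`, in order,
/// skipping k-mers that overlap a non-ACGT base. Hypothetical helper.
fn canonical_kmers(seq: &[u8], k: usize) -> Vec<u64> {
    assert!(k % 2 == 1 && (11..=31).contains(&k), "k must be odd and in [11, 31]");
    let mask: u64 = (1u64 << (2 * k)) - 1; // low 2k bits hold one k-mer
    let top = 2 * (k - 1); // bit offset of the first base of a k-mer
    let mut fwd: u64 = 0; // forward strand, newest base in the low bits
    let mut rev: u64 = 0; // reverse complement, newest base in the high bits
    let mut valid = 0usize; // consecutive ACGT bases seen so far
    let mut out = Vec::new();
    for &b in seq {
        match encode(b) {
            Some(x) => {
                // Append the base on the forward strand and drop the oldest one.
                fwd = ((fwd << 2) | x) & mask;
                // Prepend its complement on the reverse strand; the oldest falls off the low end.
                rev = (rev >> 2) | ((3 - x) << top);
                valid += 1;
                if valid >= k {
                    out.push(fwd.min(rev));
                }
            }
            None => {
                // Restart: the next k-mer must lie entirely after this base.
                fwd = 0;
                rev = 0;
                valid = 0;
            }
        }
    }
    out
}

fn main() {
    let seq = b"ACGTTGCAAGGCTTAGCNACGTACGTACGT";
    for v in canonical_kmers(seq, 11) {
        println!("{v:#018x}");
    }
}
```

Because both strands are updated incrementally, each base costs a constant number of shifts and masks. For k = 31 a k-mer uses 62 of the 64 bits.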