docs: clarify query pipeline, Findere trick, and input formats
Fix a stray prefix in the README heading and update documentation to reflect the query pipeline's operation on `s-mers` (`s = k - z + 1`) with post-partition z-window filtering. Clarify the Findere trick, including k-mer size reduction, consecutive match requirements, and false positive rate calculations. Additionally, expand input format documentation to cover supported file extensions, gzip compression, recursive directory handling, and `query` command specifications.
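The post-partition z-window filtering mentioned above can be illustrated with a minimal Rust sketch. It is not taken from the repository: the function name `z_window_filter` and the `&[bool]` hit representation are assumptions made for illustration. It shows how per-position s-mer results might be collapsed into k_user-mer hits, where a k_user-mer counts as found only when all z of its consecutive s-mers were found.

```rust
/// Hypothetical helper (not the project's API): collapse per-position
/// s-mer hits into k_user-mer hits with a sliding window of size z.
/// `smer_hits[i]` is true when the s-mer starting at position i was found.
fn z_window_filter(smer_hits: &[bool], z: usize) -> Vec<bool> {
    // A window needs at least z s-mer positions; otherwise no k-mer fits.
    if z == 0 || smer_hits.len() < z {
        return Vec::new();
    }
    let mut out = Vec::with_capacity(smer_hits.len() - z + 1);
    // Number of hits inside the current window of z consecutive s-mers.
    let mut hits_in_window = smer_hits[..z].iter().filter(|&&h| h).count();
    out.push(hits_in_window == z);
    for i in z..smer_hits.len() {
        // Slide by one: position i enters the window, position i - z leaves.
        if smer_hits[i] {
            hits_in_window += 1;
        }
        if smer_hits[i - z] {
            hits_in_window -= 1;
        }
        out.push(hits_in_window == z);
    }
    out
}

fn main() {
    // k_user = 5 and z = 3 give s = 3, so a 7-base query has
    // 5 s-mer positions and 3 k_user-mer positions.
    let smer_hits = [true, true, true, false, true];
    let kmer_hits = z_window_filter(&smer_hits, 3);
    println!("{:?}", kmer_hits);
}
```

With n s-mer positions the filter yields n − z + 1 results, one per k_user-mer, which matches `s = k_user − z + 1`.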
@@ -35,15 +35,18 @@ stored at `s` belongs to the legitimate k-mer at that slot. The FP event is:
 P(FP per k-mer) = 1 / 2^b
 ```

-The Findere trick raises the effective window to z consecutive k-mers. A query
-succeeds only when all z fingerprint checks pass, reducing the per-window FP rate:
+The Findere trick reduces the indexed k-mer size. When the user specifies k_user
+and z, the index physically stores k-mers of size `s = k_user − z + 1`. At query
+time, the same s-mer size is used. After collecting per-position s-mer results
+over the full query sequence, a sliding window of size z aggregates z consecutive
+s-mer hits into one confirmed k_user-mer hit, reducing the per-window FP rate:

 ```
-P(FP per z-window) = 1 / 2^(b·z)
+P(FP per k_user-mer) = 1 / 2^(b·z)
 ```

-The effective indexed k-mer length is `k − z + 1`: a query for a (k+z−1)-mer
-decomposes into z overlapping k-mers, all of which must match.
+`IndexConfig::kmer_size` stores `s = k_user − z + 1`, not k_user. Both indexing
+and querying use this stored size via `set_k(idx.kmer_size())`.

 Parameters b and z are stored in `layer_meta.json` (`EvidenceKind::Approx { b, z }`).

@@ -167,12 +170,12 @@ any index. It accepts the same `--evidence-bits`, `-z`, and `--fp` flags and
 additionally accepts `-k` to display the effective indexed k-mer length:

 ```
-k (query): 31
-k (indexed): 31
-z: 1
+k (user): 31
+k (indexed, s=k-z+1): 27
+z: 5
 evidence bits (b): 8
-FP per k-mer: 3.906e-3 (1/2^8)
-FP per z-window: 3.906e-3 (1/2^8)
+FP per s-mer: 3.906e-3 (1/2^8)
+FP per k-mer window: 9.095e-13 (1/2^(8·5))
 ```

 Useful for choosing parameters before committing to an index build.
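The arithmetic behind such a parameter report can be reproduced with a short Rust sketch. The function names below are hypothetical and are not the project's API. The window formula assumes, as the README formula `1 / 2^(b·z)` does, that the z fingerprint checks fail independently.

```rust
/// Hypothetical helper: indexed s-mer length for a user-facing k and window z.
/// Returns None when z is 0 or larger than k_user, where s = k_user - z + 1
/// would not be a valid length.
fn indexed_kmer_size(k_user: u32, z: u32) -> Option<u32> {
    if z == 0 || z > k_user {
        None
    } else {
        Some(k_user - z + 1)
    }
}

/// False-positive probability of one s-mer check with b fingerprint bits: 1 / 2^b.
fn fp_per_smer(b: u32) -> f64 {
    0.5f64.powi(b as i32)
}

/// False-positive probability of a full window of z s-mer checks,
/// assuming independent checks: 1 / 2^(b * z).
fn fp_per_window(b: u32, z: u32) -> f64 {
    0.5f64.powi((b * z) as i32)
}

fn main() {
    // Same parameters as the example report: k = 31, z = 5, b = 8.
    let (k_user, z, b) = (31, 5, 8);
    println!("k (indexed): {:?}", indexed_kmer_size(k_user, z));
    println!("FP per s-mer: {:.3e}", fp_per_smer(b));
    println!("FP per k-mer window: {:.3e}", fp_per_window(b, z));
}
```

Returning `Option` for the indexed size is a design choice of this sketch: it makes the invalid case `z > k_user` explicit instead of letting the unsigned subtraction underflow.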