Given a set of query sequences, determine for each sequence how many of its k-mers are found in the index and, for each indexed genome, how many k-mers match. The query system is the foundation for read classification and sequence-to-genome mapping.
---
## Input
- Query sequences in FASTA or FASTQ format (gzip supported, streaming stdin supported).
- Sequences shorter than k bases are silently skipped.
- Non-ACGT characters are handled by the superkmer decomposition layer: they act as hard breaks, producing shorter superkmers (identical to the behaviour at indexing time).
---
## Algorithm
The query follows the same superkmer-based partitioning strategy used at indexing time.
Superkmers that appear more than once in the batch (same sequence or across sequences) are deduplicated: each unique `RoutableSuperKmer` is queried once per partition, and the result is broadcast to every `SKDesc` entry that references it.
Parallelism is **not yet active** in the current implementation: batches are processed sequentially on a single thread despite the `--threads` flag being parsed. The `QueryBatch` / `split_by_partition` design is structured to support per-partition parallelism in a future iteration.
For approximate and hybrid index modes, a raw k-mer hit is a positive from the fingerprint test with false-positive rate 1/2^b. The Findere scheme reduces the effective FP rate to 1/2^(b·z) by requiring **z consecutive k-mers** to all test positive before any of them is counted as a confirmed match.
For each genome g, a position i is confirmed if there exists at least one window of z consecutive positions `[j, j+z)` that contains i and where all z positions are present for g (i.e. `results[pos]` is `Some(row)` and `row[g] > 0`). The implementation uses a single O(n) sliding-window scan per genome.
Unconfirmed positions are zeroed in the returned row. If all genomes are zeroed for a position, it is returned as `None`.
**Short superkmers**: when a superkmer contains fewer than z k-mers, no complete z-window can be formed. Rather than discarding all results, `apply_findere` returns them unchanged (no filter applied). This avoids penalising legitimate short sequences near read ends.
The `-z` CLI option overrides the index metadata value. A higher z increases stringency (lower FP, some true positives may be discarded at sequence ends); a lower z increases sensitivity.
---
## Layer lookup: `MphfLayer::find`
`MphfLayer::open(dir, mode: &IndexMode)` receives the mode from `PartitionMeta` — no per-layer file is read. The caller (`QueryLayer`) never chooses the dispatch path: it is fixed at open time by `LayerEvidence`. See [obilayeredmap](../implementation/obilayeredmap.md) for the full `find` / `find_strict` API.
`presence` is true when `--force-presence` is set or when the index has no counts (`!with_counts`). The default `presence_threshold` is 1, so any nonzero count counts as a match.
When `--detail` is requested, a 3-D accumulator `cov[seq_idx][genome][kmer_pos]` is allocated before the partition loop, with dimensions derived from `batch.n_kmers`:
Coverage reflects confirmed k-mers only (post-Findere). The vectors are emitted in the JSON annotation under the key `"coverage"`.
---
## `kmer_missing` semantics
`kmer_missing` counts k-mers that returned `None` from the index before Findere filtering — i.e. k-mers truly absent from every layer. K-mers that were found in the index but rejected by the z-window filter are not counted as missing.
`kmer_count + kmer_missing` ≤ total k-mers in the sequence. The gap (if any) corresponds to k-mers found in the index but rejected by the Findere z-window filter.