Push lrwmyplxxzkn #19
@@ -140,6 +140,29 @@ Fractions are computed over the size of the classified group, not over total
genome count. An empty group (no genome classified as ingroup/outgroup) never
triggers a filter failure.

### Conservative rounding of fraction thresholds

When a fraction threshold `F` is applied to a group of size `N`, the effective
integer threshold is determined by the direction of the bound:

| Bound | Effective count | Rounding | Rationale |
|-------|-----------------|----------|-----------|
| `--min-frac F` | k-mer in ≥ ⌈F·N⌉ genomes | **ceil** | stricter: when F·N is not an integer, a k-mer present in exactly ⌊F·N⌋ genomes does not meet the fraction |
| `--max-frac F` | k-mer in ≤ ⌊F·N⌋ genomes | **floor** | stricter: when F·N is not an integer, a k-mer present in ⌈F·N⌉ genomes already exceeds the fraction |

The same rule applies symmetrically to `--min-outgroup-frac` (ceil) and
`--max-outgroup-frac` (floor). The outgroup direction is not inverted: the
conservative rounding depends only on whether the bound is a minimum or a
maximum, not on which group it applies to.

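As a minimal sketch of the rounding rule (not the tool's actual code), the effective integer thresholds could be computed as follows. The function names `effective_min_count` and `effective_max_count` are hypothetical, and passing the fraction as a string so that `fractions.Fraction` represents it exactly is an assumption made for illustration.

```python
from fractions import Fraction
from math import ceil, floor


def effective_min_count(frac: str, group_size: int) -> int:
    """Smallest genome count that satisfies a minimum-fraction bound: ceil(F * N)."""
    # Hypothetical helper: exact rational arithmetic avoids float rounding noise.
    return ceil(Fraction(frac) * group_size)


def effective_max_count(frac: str, group_size: int) -> int:
    """Largest genome count that satisfies a maximum-fraction bound: floor(F * N)."""
    return floor(Fraction(frac) * group_size)


if __name__ == "__main__":
    # Print the effective bounds for F = 0.5 and a few group sizes.
    for n_genomes in (3, 4, 5):
        lo = effective_min_count("0.5", n_genomes)
        hi = effective_max_count("0.5", n_genomes)
        print(f"N={n_genomes}: min-frac needs >= {lo}, max-frac allows <= {hi}")
```

The rounding depends only on whether the bound is a minimum or a maximum, so the same two helpers would serve the ingroup and the outgroup options alike.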
**Example.** With `--min-frac 0.5` and an ingroup of 3 genomes,
`⌈0.5 × 3⌉ = ⌈1.5⌉ = 2`, so at least 2 of 3 ingroup genomes must carry the k-mer.

**Implementation note.** The filter treats `n / denom < min_frac` as a failure
(integer count `n`, group size `denom` = `N`, float comparison) rather than
pre-computing `⌈F·N⌉`. For integer counts the two forms are mathematically
equivalent: `n / N < F` ↔ `n < F·N` ↔ `n ≤ ⌈F·N⌉ − 1` ↔ `n < ⌈F·N⌉`.
No explicit rounding is needed.

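The equivalence stated in the implementation note can be checked exhaustively for small groups. The sketch below is illustrative only: the function names are hypothetical, and it uses exact `Fraction` arithmetic to test the algebra, whereas the note describes a float comparison in the actual filter. The empty-group branch reflects the rule that an empty group never triggers a filter failure.

```python
from fractions import Fraction
from math import ceil


def fails_min_frac_direct(n: int, denom: int, min_frac: Fraction) -> bool:
    """Failure test by direct comparison: n / denom < min_frac."""
    if denom == 0:
        return False  # an empty group never triggers a filter failure
    return Fraction(n, denom) < min_frac


def fails_min_frac_rounded(n: int, denom: int, min_frac: Fraction) -> bool:
    """Failure test with a pre-computed ceil threshold: n < ceil(F * N)."""
    if denom == 0:
        return False
    return n < ceil(min_frac * denom)


def check_equivalence(max_group_size: int = 12, steps: int = 20) -> bool:
    """Compare both tests for every count, group size, and fraction k/steps."""
    for denom in range(1, max_group_size + 1):
        for k in range(steps + 1):
            frac = Fraction(k, steps)
            for n in range(denom + 1):
                direct = fails_min_frac_direct(n, denom, frac)
                rounded = fails_min_frac_rounded(n, denom, frac)
                if direct != rounded:
                    return False
    return True


if __name__ == "__main__":
    print("equivalent:", check_equivalence())
```

One reason to prefer the direct comparison is that it avoids multiplying `F` by `N` in floating point before rounding, a step where a tiny representation error could shift the result of `ceil`.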
## Examples

Keep k-mers specific to *Betula nana* — present in at least 2 *B. nana* genomes