Push lrwmyplxxzkn #19
@@ -140,6 +140,29 @@ Fractions are computed over the size of the classified group, not over total
genome count. An empty group (no genome classified as ingroup/outgroup) never
triggers a filter failure.

### Conservative rounding of fraction thresholds

When a fraction threshold `F` is applied to a group of size `N`, the effective
integer threshold is determined by the direction of the bound:

| Bound | Effective count | Rounding | Rationale |
|-------|-----------------|----------|-----------|
| `--min-frac F` | k-mer in ≥ ⌈F·N⌉ genomes | **ceil** | stricter: when F·N is not an integer, a k-mer present in only ⌊F·N⌋ genomes falls short of the fraction |
| `--max-frac F` | k-mer in ≤ ⌊F·N⌋ genomes | **floor** | stricter: when F·N is not an integer, a k-mer present in ⌈F·N⌉ genomes already exceeds the fraction |

The same rule applies symmetrically to `--min-outgroup-frac` (ceil) and
`--max-outgroup-frac` (floor). The outgroup direction is not inverted: the
conservative rounding depends only on whether the bound is a minimum or a
maximum, not on which group it applies to.

**Example:** `--min-frac 0.5` with an ingroup of 3 genomes:
`⌈0.5 × 3⌉ = ⌈1.5⌉ = 2` → at least 2 of 3 ingroup genomes must carry the k-mer.

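The rounding rule above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own code: the function names `min_count` and `max_count` are hypothetical, and exact fractions are used (an assumption of this sketch) so that the result does not depend on how a decimal such as `0.07` is stored as a binary float.

```python
from fractions import Fraction
from math import ceil, floor


def min_count(frac, group_size):
    """Smallest genome count that satisfies a minimum-fraction bound: ceil(F * N)."""
    # Fraction("0.5") parses the decimal string exactly, avoiding float error.
    return ceil(Fraction(frac) * group_size)


def max_count(frac, group_size):
    """Largest genome count that satisfies a maximum-fraction bound: floor(F * N)."""
    return floor(Fraction(frac) * group_size)


# Hypothetical usage: a threshold of 0.5 applied to a group of 3 genomes.
print(min_count("0.5", 3))  # 2: at least 2 of 3 genomes must carry the k-mer
print(max_count("0.5", 3))  # 1: at most 1 of 3 genomes may carry the k-mer
```

Because the rounding depends only on whether the bound is a minimum or a maximum, the same two helpers would serve for the ingroup and the outgroup options alike.
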
**Implementation note:** the filter evaluates `n / denom < min_frac` directly
(integer `n`, float comparison, with `denom` the group size `N`) rather than
pre-computing `⌈F·N⌉`. For integer counts and `N > 0` the two forms are
mathematically equivalent: `n / N < F` ↔ `n < F·N` ↔ `n < ⌈F·N⌉` ↔
`n ≤ ⌈F·N⌉ − 1`. In other words, a k-mer fails the minimum bound exactly when
it is present in fewer than `⌈F·N⌉` genomes, so no explicit rounding is needed.

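The equivalence stated in the implementation note can be checked exhaustively on small inputs. The sketch below is an illustration only: the helper names are hypothetical, and it uses exact rational arithmetic rather than the float comparison the filter is described as using, so it verifies the mathematical claim and not the floating-point behaviour.

```python
from fractions import Fraction
from math import ceil


def fails_min_direct(n, group_size, min_frac):
    """Direct form from the note: n / N < F, in exact rational arithmetic."""
    return Fraction(n, group_size) < min_frac


def fails_min_rounded(n, group_size, min_frac):
    """Pre-rounded form: n < ceil(F * N)."""
    return n < ceil(min_frac * group_size)


def count_disagreements(max_group_size=12, denominator=20):
    """Compare both forms for every N in 1..max_group_size, n in 0..N,
    and F in {0, 1/denominator, ..., 1}; return how often they differ."""
    disagreements = 0
    for group_size in range(1, max_group_size + 1):
        for numerator in range(denominator + 1):
            min_frac = Fraction(numerator, denominator)
            for n in range(group_size + 1):
                direct = fails_min_direct(n, group_size, min_frac)
                rounded = fails_min_rounded(n, group_size, min_frac)
                if direct != rounded:
                    disagreements += 1
    return disagreements


print(count_disagreements())  # 0
```
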
## Examples

Keep k-mers specific to *Betula nana* — present in at least 2 *B. nana* genomes