Push lrwmyplxxzkn #19

Merged
coissac merged 4 commits from push-lrwmyplxxzkn into main 2026-06-09 08:28:21 +00:00
Showing only changes of commit 7dd8db1409 - Show all commits
+23
View File
@@ -140,6 +140,29 @@ Fractions are computed over the size of the classified group, not over total
genome count. An empty group (no genome classified as ingroup/outgroup) never
triggers a filter failure.
### Conservative rounding of fraction thresholds
When a fraction threshold `F` is applied to a group of size `N`, the effective
integer threshold is determined by the direction of the bound:
| Bound | Effective count | Rounding | Rationale |
|-------|----------------|----------|-----------|
| `--min-frac F` | k-mer in ≥ ⌈F·N⌉ genomes | **ceil** | stricter — a kmer present in exactly ⌊F·N⌋ genomes does not meet the fraction |
| `--max-frac F` | k-mer in ≤ ⌊F·N⌋ genomes | **floor** | stricter — a kmer present in ⌈F·N⌉ genomes already exceeds the fraction |
The same rule applies symmetrically to `--min-outgroup-frac` (ceil) and
`--max-outgroup-frac` (floor). The outgroup direction is not inverted: the
conservative rounding depends only on whether the bound is a minimum or a
maximum, not on which group it applies to.
**Example**`--min-frac 0.5` with an ingroup of 3 genomes:
`⌈0.5 × 3⌉ = ⌈1.5⌉ = 2` → at least 2 of 3 ingroup genomes must carry the k-mer.
**Implementation note** — the filter evaluates `n / denom < min_frac` directly
(integer `n`, float comparison) rather than pre-computing `⌈F·N⌉`. This is
mathematically equivalent for integer counts: `n / N < F``n < F·N`
`n ≤ ⌈F·N⌉ 1``n < ⌈F·N⌉`. No explicit rounding is needed.
## Examples
Keep k-mers specific to *Betula nana* — present in at least 2 *B. nana* genomes