obitools4/autodoc/docmd/pkg/obikmer/minimizer_utils.md
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
2026-04-13 13:34:53 +02:00

# Minimizer Size Utilities in `obikmer`
This Go package provides helper functions to compute and validate the **minimizer size** `m` in k-mer-based genomic algorithms (e.g., minimizer schemes for sequence comparison or indexing).
## Core Functions
- **`DefaultMinimizerSize(k)`**
  Returns a *recommended* minimizer size: `ceil(k / 2.5)`, clamped to `[1, k - 1]`.
  → Ensures `m` is reasonably large for uniqueness while keeping window size (`k - m + 1`) manageable.
- **`MinMinimizerSize(nworkers)`**
  Computes the *minimum* `m` for which at least `nworkers` distinct minimizers of length `m` exist:
  the smallest `m` satisfying `4^m ≥ nworkers`, i.e., `ceil(log₄(nworkers))`.
  → Guarantees enough diversity for parallelization (e.g., hashing-based distribution across workers).
- **`ValidateMinimizerSize(m, k, nworkers)`**
  Enforces constraints on `m`:
  - Lower bound: `m ≥ MinMinimizerSize(nworkers)` (warns and adjusts if violated)
  - Hard bounds: `1 ≤ m < k`

  → Prevents invalid or inefficient parameter choices.
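
The three rules above can be sketched in a few lines of Go. This is a minimal illustration written from the descriptions in this document, not the actual `obikmer` source: the `int` signatures, the order in which the bounds are applied, the handling of `k < 2`, and the warning text are all assumptions.

```go
package main

import (
	"fmt"
	"log"
)

// DefaultMinimizerSize returns ceil(k / 2.5) clamped to [1, k-1].
// ceil(k / 2.5) equals ceil(2k / 5), computed here in integer
// arithmetic as (2k + 4) / 5. Assumes k >= 2; for smaller k the
// interval [1, k-1] is empty and this sketch simply returns 1.
func DefaultMinimizerSize(k int) int {
	m := (2*k + 4) / 5
	if m > k-1 {
		m = k - 1
	}
	if m < 1 {
		m = 1
	}
	return m
}

// MinMinimizerSize returns the smallest m such that 4^m >= nworkers,
// i.e. ceil(log4(nworkers)). It returns 0 when nworkers <= 1.
func MinMinimizerSize(nworkers int) int {
	m := 0
	for capacity := 1; capacity < nworkers; capacity *= 4 {
		m++
	}
	return m
}

// ValidateMinimizerSize returns m adjusted to satisfy both constraints.
// Assumed order: raise m to the worker-driven minimum first (with a
// warning), then enforce the hard bounds 1 <= m < k.
func ValidateMinimizerSize(m, k, nworkers int) int {
	if lower := MinMinimizerSize(nworkers); m < lower {
		log.Printf("minimizer size %d too small for %d workers, using %d", m, nworkers, lower)
		m = lower
	}
	if m >= k {
		log.Printf("minimizer size %d must be smaller than k=%d, using %d", m, k, k-1)
		m = k - 1
	}
	if m < 1 {
		m = 1
	}
	return m
}

func main() {
	k := 31
	m := DefaultMinimizerSize(k)
	fmt.Println(m, k-m+1)                                   // 13 19
	fmt.Println(MinMinimizerSize(16), MinMinimizerSize(17)) // 2 3
	fmt.Println(ValidateMinimizerSize(1, k, 64))            // 3
}
```

For `k = 31`, the default is `m = 13`, which gives a window of `31 - 13 + 1 = 19` m-mers per k-mer. Going from 16 to 17 workers raises the minimum from 2 to 3, because `4^2 = 16` distinct minimizers no longer cover every worker.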
## Semantic Purpose
These functions ensure that minimizer-based workflows are:
- **Theoretically sound** (enough distinct minimizer values to spread work across all workers),
- **Practically viable** (avoiding degenerate cases like `m = 0` or `m ≥ k`),
- **User-friendly** (providing sensible defaults + clear warnings on adjustment).