obitools4/autodoc/docmd/pkg_obistats.md
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

# `obistats` Package — Public API Overview
The `obistats` package delivers lightweight, numerically robust statistical and combinatorial utilities for Go. Designed for performance-critical applications (e.g., benchmarking, bioinformatics), it avoids external dependencies beyond core math libraries and focuses on **accuracy**, **type safety**, and **modularity**.
---
## 🧮 Numerical Utilities
- `maxint`, `minint`: Return the maximum/minimum of two integers.
- `sumint(xs []int) int`: Returns the sum of an integer slice.
- `bisect(f func(float64) float64, low, high float64) (root float64, success bool)`: Finds a root of `f` by bisection; requires `f(low)*f(high) < 0`.
- `bisectBool(f func(int) bool, low, high int) (x1, x2 int)`: Locates the point where a boolean predicate changes value; panics if `f(low) == f(high)`.
- `series(f func(int) float64) (sum float64, converged bool)`: Evaluates an infinite sum, stopping once convergence is detected.
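
The bisection entry above can be illustrated with a minimal sketch. `bisectRoot` and its `tol` parameter are hypothetical names chosen for this example, not the package's own code; the sketch assumes `f` is continuous and that the bracket has a sign change.

```go
package main

import (
	"fmt"
	"math"
)

// bisectRoot is a hypothetical stand-in for the package's bisect helper.
// Assumptions: f is continuous on [low, high], tol > 0, and f(low), f(high)
// have opposite signs; otherwise ok is false.
func bisectRoot(f func(float64) float64, low, high, tol float64) (root float64, ok bool) {
	flow, fhigh := f(low), f(high)
	if flow == 0 {
		return low, true
	}
	if fhigh == 0 {
		return high, true
	}
	if (flow < 0) == (fhigh < 0) {
		return 0, false // no sign change, so the bracket is invalid
	}
	for high-low > tol {
		mid := low + (high-low)/2
		if mid == low || mid == high {
			break // the interval can no longer be split in float64
		}
		fmid := f(mid)
		if fmid == 0 {
			return mid, true
		}
		if (fmid < 0) == (flow < 0) {
			low = mid // same sign as f(low): the root is in the upper half
		} else {
			high = mid // sign changed: the root is in the lower half
		}
	}
	return low + (high-low)/2, true
}

func main() {
	root, ok := bisectRoot(func(x float64) float64 { return x*x - 2 }, 0, 2, 1e-12)
	fmt.Println(root, ok, math.Abs(root-math.Sqrt2) < 1e-9)
}
```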
---
## 📊 Descriptive Statistics
- `Max[T constraints.Float|constraints.Integer](data []T) T`: Max value in slice (signed ints/floats).
- `Min[T ...]`: Min over all integer types.
- `Mode[int...]`: Most frequent value in signed int slice (map-based).
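
As a rough illustration of these helpers, the sketch below uses Go generics for max/min and a map of counts for the mode. The `Ordered` constraint, the lowercase function names, and the tie-breaking rule (smallest value wins) are assumptions made for this example; the package's behaviour on ties and on empty slices is not stated above.

```go
package main

import "fmt"

// Ordered is an assumed constraint for this sketch; the package is described
// as using constraints.Float | constraints.Integer.
type Ordered interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64 | ~float32 | ~float64
}

// maxOf returns the largest element. It panics on an empty slice.
func maxOf[T Ordered](data []T) T {
	m := data[0]
	for _, v := range data[1:] {
		if v > m {
			m = v
		}
	}
	return m
}

// minOf returns the smallest element. It panics on an empty slice.
func minOf[T Ordered](data []T) T {
	m := data[0]
	for _, v := range data[1:] {
		if v < m {
			m = v
		}
	}
	return m
}

// mode returns the most frequent value using a map of counts.
// Ties are broken in favour of the smallest value (an assumption).
// An empty slice yields 0.
func mode(data []int) int {
	counts := make(map[int]int)
	best, bestCount := 0, 0
	for _, v := range data {
		counts[v]++
		c := counts[v]
		if c > bestCount || (c == bestCount && v < best) {
			best, bestCount = v, c
		}
	}
	return best
}

func main() {
	fmt.Println(maxOf([]int{3, 9, 2}), minOf([]float64{1.5, -2, 4}), mode([]int{1, 2, 2, 3, 3}))
}
```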
---
## 📐 Central Tendency & Dispersion
- `Median[T Number](data []T) float64`: Non-mutating median (copy + sort).
- `Mean[T Number](data []T) float64`: Arithmetic mean.
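
The "copy + sort" approach to a non-mutating median might look like the sketch below. The `Number` constraint shown here and the NaN result for empty input are assumptions for the example; the package's own constraint and edge-case handling may differ.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Number is an assumed constraint; the package's own Number may differ.
type Number interface {
	~int | ~int32 | ~int64 | ~float32 | ~float64
}

// mean returns the arithmetic mean, or NaN for an empty slice (assumption).
func mean[T Number](data []T) float64 {
	if len(data) == 0 {
		return math.NaN()
	}
	sum := 0.0
	for _, v := range data {
		sum += float64(v)
	}
	return sum / float64(len(data))
}

// median sorts a copy of data, so the caller's slice is left untouched.
func median[T Number](data []T) float64 {
	if len(data) == 0 {
		return math.NaN()
	}
	tmp := make([]T, len(data))
	copy(tmp, data)
	sort.Slice(tmp, func(i, j int) bool { return tmp[i] < tmp[j] })
	n := len(tmp)
	if n%2 == 1 {
		return float64(tmp[n/2])
	}
	// Even length: average the two middle elements.
	return (float64(tmp[n/2-1]) + float64(tmp[n/2])) / 2
}

func main() {
	data := []int{5, 1, 4, 2}
	fmt.Println(median(data), mean(data), data) // data keeps its original order
}
```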
---
## 📈 Weighted & Unweighted Samples
- `Sample` struct: Encapsulates values, optional weights (`Weights []float64`), and `Sorted bool`.
- Methods:
  - `Mean()`, `GeoMean()` (weighted), `Sum()`, `Weight()`
  - `Variance()`, `StdDev()` (unweighted only) via Welford's algorithm
  - `Percentile(p float64)` (Hyndman-Fan R8), `IQR()`
  - Bounds: min/max (`O(1)` if sorted and unweighted)
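
Welford's algorithm, mentioned above for `Variance()` and `StdDev()`, updates the mean and the sum of squared deviations in a single pass. The following is a minimal sketch for unweighted data; the function name `welford` and the choice of the sample (n-1) denominator are assumptions for this example.

```go
package main

import (
	"fmt"
	"math"
)

// welford returns the mean and the sample variance (n-1 denominator) of xs
// in one pass. With fewer than two values the variance is reported as 0.
func welford(xs []float64) (mean, variance float64) {
	var m2 float64 // running sum of squared deviations from the mean
	for i, x := range xs {
		delta := x - mean
		mean += delta / float64(i+1)
		m2 += delta * (x - mean) // uses the updated mean
	}
	if len(xs) < 2 {
		return mean, 0
	}
	return mean, m2 / float64(len(xs)-1)
}

func main() {
	m, v := welford([]float64{2, 4, 4, 4, 5, 5, 7, 9})
	fmt.Println(m, v, math.Sqrt(v))
}
```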
---
## 📉 Probability Distributions
- **Beta-Binomial**:
  - `LogProb(x)`, `Prob(x)` (PMF)
  - `CDF()`/`LogCDF()`: Analytical via hypergeometric (`HypPFQ`) + log-beta.
  - Moments: mean, variance; mode with edge-case handling.
- **Normal**: `Mu`, `Sigma`; methods:
  - PDF/CDF/InvCDF (Acklam's algorithm), `Rand()`, `Bounds()`.
- **Student *t***:
  - PDF/CDF via log-gamma and regularized incomplete beta (`mathBetaInc`).
- **Kolmogorov-Smirnov for Beta**:
  - `BetaKolmogorovDist(data []float64, α, β float64)`: Max deviation between empirical CDF of cumulative sums and theoretical Beta CDF (uses `1/(i+1)` estimator).
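
For the Normal distribution, the PDF and CDF can be written directly with the standard library. The sketch below is one common formulation using `math.Erfc`; the type name `normalDist` is hypothetical, only the `Mu` and `Sigma` fields come from the description above, and `Sigma > 0` is assumed.

```go
package main

import (
	"fmt"
	"math"
)

// normalDist is a hypothetical type mirroring the Mu/Sigma fields above.
type normalDist struct {
	Mu, Sigma float64
}

// PDF evaluates the normal density at x. Sigma must be positive.
func (n normalDist) PDF(x float64) float64 {
	z := (x - n.Mu) / n.Sigma
	return math.Exp(-z*z/2) / (n.Sigma * math.Sqrt(2*math.Pi))
}

// CDF evaluates P(X <= x) through the complementary error function.
func (n normalDist) CDF(x float64) float64 {
	return 0.5 * math.Erfc(-(x-n.Mu)/(n.Sigma*math.Sqrt2))
}

func main() {
	std := normalDist{Mu: 0, Sigma: 1}
	fmt.Println(std.PDF(0), std.CDF(0), std.CDF(1.96))
}
```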
---
## 🧪 Statistical Tests
- **Two-sample tests**:
  - `TTest()`: Welch's *t*-test (unequal variances).
  - `UTest()`: Mann-Whitney *U* (non-parametric; handles ties, exact for small samples).
- **Other *t*-test variants**: `TwoSampleTTest`, `PairedTTest`, `OneSampleTTest`.
- All return a structured result (`P` p-value, sample sizes, alternative hypothesis).
---
## 📦 Nonparametric Distribution Helpers
- **Mann-Whitney U distribution (`UDist`)**:
  - Exact PMF/CDF via DP (no ties) or Cheung-Klotz algorithm (with ties).
  - `PMF(U)`, `CDF(U)`; supports tie multiplicities.
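
For the no-ties case, the exact distribution of *U* follows a classic counting recurrence: the number of arrangements of sizes `n1`, `n2` giving statistic `u` is `c(n1-1, n2, u-n2) + c(n1, n2-1, u)`. The sketch below memoizes that recurrence. It is an illustration of the dynamic-programming idea only, with hypothetical function names, and does not cover the tie-handling algorithm.

```go
package main

import "fmt"

// uCount returns the number of orderings of n1+n2 distinct values for which
// the Mann-Whitney statistic equals u (no ties), using memoized recursion.
func uCount(n1, n2, u int, memo map[[3]int]float64) float64 {
	if u < 0 {
		return 0
	}
	if n1 == 0 || n2 == 0 {
		if u == 0 {
			return 1
		}
		return 0
	}
	key := [3]int{n1, n2, u}
	if v, ok := memo[key]; ok {
		return v
	}
	v := uCount(n1-1, n2, u-n2, memo) + uCount(n1, n2-1, u, memo)
	memo[key] = v
	return v
}

// binom returns the binomial coefficient C(n, k) as a float64.
func binom(n, k int) float64 {
	r := 1.0
	for i := 1; i <= k; i++ {
		r = r * float64(n-k+i) / float64(i)
	}
	return r
}

// uPMF returns P(U = u) for sample sizes n1 and n2 without ties.
func uPMF(n1, n2, u int) float64 {
	return uCount(n1, n2, u, map[[3]int]float64{}) / binom(n1+n2, n1)
}

// uCDF returns P(U <= u) by summing the PMF.
func uCDF(n1, n2, u int) float64 {
	sum := 0.0
	for i := 0; i <= u; i++ {
		sum += uPMF(n1, n2, i)
	}
	return sum
}

func main() {
	for u := 0; u <= 4; u++ {
		fmt.Printf("P(U=%d) = %.4f\n", u, uPMF(2, 2, u))
	}
}
```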
---
## 🔢 Combinatorics & Log-Space Arithmetic
- `Lchoose(n, x int) float64`: log-binomial coefficient via `math.Lgamma`.
- `Choose(n, x int) float64`: exponentiated log-binomial (note: arg order in impl may be reversed).
- `LogAddExp(x, y float64)`: Stable `log(eˣ + eʸ)` using max+`Log1p`.
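
The log-space helpers can be sketched with `math.Lgamma` and `math.Log1p` as described above. The lowercase names and the handling of out-of-range arguments (returning `-Inf`, that is, log of zero) are assumptions for this example; the argument order follows the `(n, x)` signature shown in the list.

```go
package main

import (
	"fmt"
	"math"
)

// lchoose returns log(C(n, k)) using the log-gamma function.
// Out-of-range k yields -Inf, i.e. log(0) (an assumption of this sketch).
func lchoose(n, k int) float64 {
	if k < 0 || k > n {
		return math.Inf(-1)
	}
	a, _ := math.Lgamma(float64(n + 1))
	b, _ := math.Lgamma(float64(k + 1))
	c, _ := math.Lgamma(float64(n - k + 1))
	return a - b - c
}

// choose exponentiates lchoose; the result is approximate for large n.
func choose(n, k int) float64 {
	return math.Exp(lchoose(n, k))
}

// logAddExp returns log(exp(x) + exp(y)) without overflow by factoring out
// the larger argument.
func logAddExp(x, y float64) float64 {
	if math.IsInf(x, -1) {
		return y
	}
	if math.IsInf(y, -1) {
		return x
	}
	if x < y {
		x, y = y, x
	}
	return x + math.Log1p(math.Exp(y-x))
}

func main() {
	fmt.Println(choose(5, 2), logAddExp(0, 0), logAddExp(1000, 1000))
}
```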
---
## 🧩 Random Sampling
- `SampleIntWithoutReplacement(n, max int) []int`:
  - Uniform sampling *without replacement* in O(*n*) time/memory.
  - Uses reservoir-style mapping with swap trick for uniqueness.
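
One way to obtain O(*n*) sampling without replacement is a sparse Fisher-Yates shuffle, where a map records only the positions that have been swapped. The sketch below is an illustration of that idea, not the package's code. It assumes the sampled range is `[0, max)`, clamps `n` to `max`, and takes the random source as a hypothetical `intn` parameter so that the behaviour can be checked deterministically.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sampleWithoutReplacement draws n distinct integers from [0, max).
// intn(k) must return a uniform integer in [0, k), e.g. rand.Intn.
// Only swapped positions of the virtual array 0..max-1 are stored.
func sampleWithoutReplacement(n, max int, intn func(int) int) []int {
	if n > max {
		n = max // cannot draw more distinct values than exist
	}
	swapped := make(map[int]int, n)
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		j := i + intn(max-i) // pick a position in [i, max)
		vj, ok := swapped[j]
		if !ok {
			vj = j
		}
		vi, ok := swapped[i]
		if !ok {
			vi = i
		}
		out = append(out, vj) // value at j moves to the front and is emitted
		swapped[j] = vi       // value at i takes its place for later draws
	}
	return out
}

func main() {
	fmt.Println(sampleWithoutReplacement(5, 100, rand.Intn))
}
```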
---
## 📊 Benchmark Analysis & Formatting
- **`Collection`, `Table`, `Row`**: Structured aggregation and display of benchmark metrics.
- **Metrics processing**:
  - Outlier removal (Tukey's fences), min/mean/max.
- **Delta comparison**:
  - `FormatDiff()`: Symmetric ±% deviation; semantic direction (`+1`/`-1`).
  - `GeoMean()` row for overall summary.
- **Formatting**:
  - `Scaler`: SI-scaled unit-aware formatting (e.g., `"1.23 ms/op"`).
  - `timeScaler`, unit detection (`hasBaseUnit`).
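
Outlier removal with Tukey's fences keeps only values within 1.5 × IQR of the quartiles. The sketch below pairs it with a Hyndman-Fan type 8 percentile, matching the estimator named for `Percentile` earlier. The function names and the 1.5 multiplier are assumptions for this example; the package's exact thresholds are not stated above.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentileR8 returns the p-th quantile (0 <= p <= 1) of a non-empty,
// ascending-sorted slice using the Hyndman-Fan type 8 estimator.
func percentileR8(sorted []float64, p float64) float64 {
	n := float64(len(sorted))
	h := 1.0/3.0 + p*(n+1.0/3.0) // 1-based fractional rank
	kf, frac := math.Modf(h)
	k := int(kf)
	if k <= 0 {
		return sorted[0]
	}
	if k >= len(sorted) {
		return sorted[len(sorted)-1]
	}
	return sorted[k-1] + frac*(sorted[k]-sorted[k-1])
}

// removeOutliers returns the values of xs inside Tukey's fences
// [Q1 - 1.5*IQR, Q3 + 1.5*IQR], preserving order. xs is not modified.
func removeOutliers(xs []float64) []float64 {
	if len(xs) == 0 {
		return nil
	}
	sorted := append([]float64(nil), xs...)
	sort.Float64s(sorted)
	q1, q3 := percentileR8(sorted, 0.25), percentileR8(sorted, 0.75)
	iqr := q3 - q1
	lo, hi := q1-1.5*iqr, q3+1.5*iqr
	kept := make([]float64, 0, len(xs))
	for _, x := range xs {
		if x >= lo && x <= hi {
			kept = append(kept, x)
		}
	}
	return kept
}

func main() {
	fmt.Println(removeOutliers([]float64{100, 1, 2, 3, 4, 5, 6, 7, 8, 9}))
}
```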
---
## 🧭 Sorting & Ordering
- `Order` type: Custom row sort function.
  - Predefined: `ByName`, `ByDelta`.
- `Sort(t *Table, order Order)`: Stable sort via `sort.SliceStable`.
---
## 🧰 Utility Helpers
- `mathSign(x float64)`: Sign function (`NaN` → `NaN`).
- Precomputed factorials: `smallFact[0..20]`.
> ⚠️ *All functions assume valid inputs; invalid parameters (e.g., `N≤0`, α/β ≤ 0) may panic.*
> ✅ *Input slices are not mutated unless explicitly stated (e.g., `Sample.Sort()`).*