⬆️ version bump to v4.5

- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
This commit is contained in:
Eric Coissac
2026-04-07 08:36:50 +02:00
parent 670edc1958
commit 8c7017a99d
392 changed files with 18875 additions and 141 deletions
@@ -0,0 +1,20 @@
# `obistats` Package — Semantic Overview
The `obistats` package provides lightweight, general-purpose numerical utilities in Go. It includes:
- **Basic arithmetic helpers**:
- `maxint`, `minint`: return the maximum/minimum of two integers.
- `sumint(xs []int) int`: computes the sum over a slice of integers.
- **Root-finding via bisection**:
- `bisect(...)`: numerically finds a root of a real-valued function within `[low, high]`, using the classical bisection method. Returns `(root, success)`.
- Requires `f(low)` and `f(high)` to have opposite signs; panics otherwise.
- **Boolean bisection**:
- `bisectBool(...)`: locates the transition point where a boolean function flips (e.g., threshold detection). Returns adjacent points `(x1, x2)` straddling the change. Panics if `f(low) == f(high)`.
- **Series summation**:
- `series(...)`: computes the infinite sum ∑ₙ₌₀^∞ f(n) by iterating until convergence (i.e., `y == yp` within floating-point precision).
- *Note*: Fast but may suffer from rounding errors for slowly converging or oscillating series.
All functions are designed for performance and simplicity, with no external dependencies beyond `fmt` (for error messages). The package is a stripped-down copy of internal utilities, likely used in performance-critical or statistical computations.
@@ -0,0 +1,33 @@
# Statistical Functions in `obistats` Package
This Go package provides high-precision statistical functions for probability distributions, particularly the **regularized incomplete beta function**, used in hypothesis testing and confidence interval calculations.
## Core Functions
- **`mathBeta(a, b)`**
Computes the *complete beta function* $ B(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} $ using logarithms of the gamma function (`math.Lgamma`) for numerical stability.
- **`lgamma(x)`**
Wrapper around `math.Lgamma`, returning the natural logarithm of the absolute value of the gamma function.
- **`mathBetaInc(x, a, b)`**
Computes the *regularized incomplete beta function* $ I_x(a,b) $. This is essential for computing cumulative distribution functions (CDFs) of the beta, F-, and t-distributions.
- Uses *continued fraction evaluation* (via `betacf`) for accuracy.
- Applies symmetry transformation ($ x \to 1-x $) when beneficial (per Numerical Recipes).
- Returns `NaN` for invalid inputs (`x < 0 || x > 1`).
- **`betacf(x, a, b)`**
Implements the continued fraction expansion of $ I_x(a,b) $.
- Iteratively evaluates recurrence relations for even/odd terms.
- Uses `epsilon = 3e-14` and `maxIterations = 200` for convergence.
- Handles near-zero denominators via `raiseZero`.
## Use Cases
- Statistical hypothesis testing (e.g., Fisher's exact test).
- Beta, binomial proportion confidence intervals.
- F-test and Student's t-distribution CDF computations.
## Implementation Notes
Based on *Numerical Recipes in C*, §6.4, with robustness enhancements for floating-point edge cases.
@@ -0,0 +1,39 @@
# Beta-Binomial Distribution Implementation in `obistats`
This Go package provides a complete statistical implementation of the **Beta-Binomial distribution**, a compound discrete probability distribution where the success probability of a Binomial distribution follows a Beta distribution.
## Core Features
- **Struct Definition**:
`BetaBinomial` encapsulates the distribution parameters: number of trials (`N > 0`) and Beta shape parameters `Alpha` and `Beta`, both strictly positive. Optional random source (`Src`) supports reproducible sampling.
- **Probability Mass Function (PMF)**:
- `LogProb(x)` computes the natural logarithm of the PMF at integer `x ∈ [0, N]`.
- `Prob(x)` returns the PMF value via exponentiation.
- **Cumulative Distribution Function (CDF)**:
- `LogCDF(x)` evaluates the log-CDF using an analytical expression involving:
- Log-binomial coefficient (`Lchoose`)
- Log-beta function (`mathext.Lbeta`)
- Generalized hypergeometric function `HypPFQ` (via `scientificgo.org/special`).
- `CDF(x)` returns the standard CDF as `exp(LogCDF(x))`.
- **Statistical Moments**:
- Mean: $N \cdot \frac{\alpha}{\alpha + \beta}$
- Variance: $N \cdot \frac{\alpha \beta (\alpha + \beta + N)}{(\alpha+\beta)^2 (\alpha+\beta+1)}$
- Standard deviation: square root of variance.
- **Mode**:
Returns the most probable count. Special cases handled:
- `NaN` if both $\alpha, \beta \leq 1$
- $0$ if only $\alpha \leq 1$
- $N$ if only $\beta \leq 1$
- **Utility Methods**:
- `LogCDFTable(x)` builds a cumulative log-probability table up to `x`, useful for fast lookup or numerical stability.
- `NumParameters()` returns the number of distribution parameters (3: $N$, $\alpha$, $\beta$).
- **Input Validation**:
Panics on invalid parameters (non-positive `N`, $\alpha$, or $\beta$), ensuring correctness.
This module supports high-precision statistical computations using specialized mathematical libraries (`gonum.org/v1/gonum/mathext`, `scientificgo.org/special`).
@@ -0,0 +1,31 @@
# `obistats` Package Overview
The `obistats` package provides data structures and utilities for analyzing benchmark results in Go. It enables aggregation, statistical summarization, and comparison of performance metrics across multiple configurations.
## Core Types
- **`Collection`**: Holds benchmark results grouped by configuration, group label (e.g., parameter combinations), and metric unit. It tracks:
- Ordered lists of `Configs`, `Groups`, and `Units`.
- A map from group names to ordered lists of benchmark functions (`Benchmarks`).
- `Metrics`, keyed by `(Config, Group, Benchmark, Unit)`.
- Optional parameters for significance testing (`DeltaTest`, `Alpha`), geometric mean inclusion, and result ordering/splitting.
- **`Key`**: Uniquely identifies a metric for one benchmark run, combining configuration source (`Config`), group label (`Group`), benchmark name (sans `"Benchmark"` prefix), and unit.
- **`Metrics`**: Stores raw (`Values`) and cleaned (`RValues`, with outliers removed via IQR) measurements, plus derived statistics: `Min`, `Mean`, and `Max`.
## Key Functionality
- **Statistical summarization**:
- Outlier removal using Tukey's fences (Q1 − 1.5×IQR, Q3 + 1.5×IQR).
- Computation of min/mean/max over cleaned data.
- **Formatting helpers**:
- `FormatMean()`: Returns formatted mean (e.g., scaled or raw).
- `FormatDiff()`: Computes and formats symmetric deviation as ±% (based on min/max vs. mean).
- `Format()`: Combines both into `"mean ±diff"` style.
- **Dynamic collection building**:
- `addMetrics()` creates or retrieves metrics for a given key, while maintaining ordered lists of unique configs/groups/units and benchmarks-per-group.
> ⚠️ *Note*: The file includes commented-out methods (`AddFile`, `AddData`, etc.) referencing an external `benchfmt` package—these are placeholders and not part of the active API in this excerpt.
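The Tukey-fence cleaning step can be sketched in a few lines. This is an illustration of the rule as described, not the package's implementation; the simple linear-interpolation `percentile` helper is an assumption:

```go
package main

import (
	"fmt"
	"sort"
)

// percentile with linear interpolation over a sorted slice (illustrative).
func percentile(sorted []float64, p float64) float64 {
	h := p * float64(len(sorted)-1)
	i := int(h)
	if i >= len(sorted)-1 {
		return sorted[len(sorted)-1]
	}
	return sorted[i] + (h-float64(i))*(sorted[i+1]-sorted[i])
}

// removeOutliersSketch keeps values inside Tukey's fences
// [Q1 − 1.5·IQR, Q3 + 1.5·IQR].
func removeOutliersSketch(values []float64) []float64 {
	s := append([]float64(nil), values...)
	sort.Float64s(s)
	q1, q3 := percentile(s, 0.25), percentile(s, 0.75)
	iqr := q3 - q1
	lo, hi := q1-1.5*iqr, q3+1.5*iqr
	kept := s[:0]
	for _, v := range s {
		if v >= lo && v <= hi {
			kept = append(kept, v)
		}
	}
	return kept
}

func main() {
	// The extreme value 100 falls outside the fences and is dropped.
	fmt.Println(removeOutliersSketch([]float64{10, 11, 12, 11, 10, 11, 100}))
}
```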
@@ -0,0 +1,25 @@
# Semantic Description of `obistats` Delta Testing Functionality
This Go package (`obistats`) provides statistical tools for comparing performance metrics before and after code changes—typically used in benchmarking workflows.
- **`DeltaTest` type**: A function signature for comparing two `*Metrics` instances (old vs. new), returning a *p*-value (`float64`) and an optional error.
- **Purpose**: Determine whether two sets of samples likely originate from the same underlying distribution (i.e., detect significant performance regressions/improvements).
## Supported Tests
- **`NoDeltaTest()`**: A no-op test returning `(-1, nil)`, indicating *no statistical comparison* is performed.
- **`TTest()`**: Performs a two-sample Welch's *t*-test on `RValues`, assessing whether means differ significantly.
- **`UTest()`**: Applies the Mann–Whitney *U* test (non-parametric), comparing distributions without assuming normality.
## Common Errors
- `ErrSamplesEqual`: All samples in one or both groups are identical.
- `ErrSampleSize`: Insufficient data points for reliable testing (e.g., < 2).
- `ErrZeroVariance`: One sample set has zero variance (no spread), breaking test assumptions.
- `ErrMismatchedSamples`: Sample lengths differ (not used here but part of the broader API).
## Design Rationale
- Built on top of internal benchmarking infrastructure (see `github.com/golang-design/bench`).
- Designed for modularity: callers can plug in different statistical tests as needed.
- Returns *p*-values directly, enabling threshold-based decision logic (e.g., `if p < 0.05 → alert`).
@@ -0,0 +1,34 @@
# `obistats` Package: K-Means Clustering Implementation
The `obistats` package provides a concurrent, type-generic implementation of the **K-means clustering algorithm** for numerical datasets.
## Core Utilities
- `SquareDist` / `EuclideanDist`: Compute squared and Euclidean distances between vectors (generic over `float64` or `int`).
- `DefaultRG`: Returns a seeded random number generator (`*rand.Rand`) for reproducibility control.
## Data Structure
- `KmeansClustering`: Encapsulates dataset (`*obiutils.Matrix[float64]`), cluster assignments, centers, and metadata (sizes, distances to nearest center).
- Supports dynamic addition of clusters via `AddACenter()`.
## Initialization & Management
- `MakeKmeansClustering`: Initializes the structure with data, number of clusters *k*, and RNG.
- `SetCenterTo`, `AddACenter`: Assign or grow centers; uses **k-means++**-inspired weighted sampling for new centers.
- `ResetEmptyCenters`: Reinitializes empty clusters using distance-weighted sampling.
## Core Algorithm Steps
- `AssignToClass`: Parallel assignment of points to nearest centers (uses goroutines + mutex).
- `ComputeCenters`: Computes new cluster centroids *as the closest original data point* to the arithmetic mean (robust for non-Euclidean spaces).
- `Run`: Executes iterative refinement until convergence (`max_cycle` iterations or inertia drop ≤ threshold).
## Accessors & Diagnostics
- `K()`, `N()`, `Dimension()`: Return number of clusters, dataset size, and feature dimension.
- `Inertia()`: Sum of squared distances to assigned centers (convergence metric).
- `Centers`, `Classes`, `Sizes`: Expose internal clustering state.
## Design Highlights
- Fully concurrent (goroutine-based) for performance.
- Generic distance functions support both `int` and `float64`.
- Explicit handling of edge cases (empty clusters, convergence).
- Logging via `logrus` for debugging (`obilog.Warnf`).
> *Note: High-level wrapper functions (e.g., standalone `Kmeans`) are commented out but outline intended API usage.*
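The parallel assignment step can be sketched as below. This is a simplified stand-in for `AssignToClass` under illustrative assumptions (plain `[][]float64` data instead of `obiutils.Matrix`, one goroutine per point rather than a worker pool, no mutex needed since each goroutine writes a distinct index):

```go
package main

import (
	"fmt"
	"sync"
)

func squareDist(a, b []float64) float64 {
	s := 0.0
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// assignSketch concurrently assigns each point to its nearest center.
func assignSketch(data, centers [][]float64) []int {
	classes := make([]int, len(data))
	var wg sync.WaitGroup
	for i := range data {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			best, bestD := 0, squareDist(data[i], centers[0])
			for k := 1; k < len(centers); k++ {
				if d := squareDist(data[i], centers[k]); d < bestD {
					best, bestD = k, d
				}
			}
			classes[i] = best // each goroutine writes a distinct index
		}(i)
	}
	wg.Wait()
	return classes
}

func main() {
	data := [][]float64{{0, 0}, {0.2, 0.1}, {5, 5}, {5.1, 4.9}}
	centers := [][]float64{{0, 0}, {5, 5}}
	fmt.Println(assignSketch(data, centers))
}
```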
@@ -0,0 +1,26 @@
# `BetaKolmogorovDist` Function — Semantic Description
The `obistats.BetaKolmogorovDist` function computes a **goodness-of-fit statistic** between an empirical dataset and the *cumulative distribution* (CDF) of a **Beta probability distribution** with specified parameters `α` and `β`. It implements an adapted version of the **Kolmogorov–Smirnov (KS) test**, tailored for Beta-distributed theoretical models.
### Key Functionalities:
- **Input**:
- `data []float64`: Empirical sample (assumed sorted if `preordered = true`).
- `alpha`, `beta float64`: Shape parameters of the target Beta distribution.
- **Processing**:
- If not pre-sorted, data is copied and sorted ascendingly.
- For each ordered sample point `v_i`, it accumulates the sum `s = Σ_{j≤i} v_j`.
- Evaluates:
`|CDF_Beta(s; α, β) − empirical CDF_i|`, where the *empirical* cumulative probability at rank `i` is approximated as `1/(i+1)` — a common Bayesian/maximum-likelihood estimator (e.g., median-rank).
- Returns the **supremum** of these absolute deviations (i.e., max distance across all points).
### Interpretation:
- A **small value** indicates the empirical cumulative sums align closely with the theoretical Beta CDF.
- A **large value** suggests significant deviation — poor fit of a Beta(α,β) to the data.
- Unlike standard KS tests (which use `i/n`), this uses `1/(i+1)` — suitable for small samples or Bayesian contexts.
### Dependencies:
- Uses `gonum.org/v1/gonum/stat/distuv.Beta` for CDF computation.
- Uses `gonum.org/v1/gonum/floats.Max` for distance extremal computation.
- `sort.Float64s` ensures ordered traversal.
> **Note**: The use of *cumulative sums* (`s`) rather than raw values is unconventional — possibly intended for data representing proportions or waiting times where the *integral* of observations matters.
@@ -0,0 +1,37 @@
# `obistats` Package: Mann-Whitney U-test Implementation
The `obistats` package provides a **non-parametric statistical test** for comparing two independent samples: the **Mann–Whitney U-test**, also known as the Wilcoxon rank-sum test.
## Core Functionality
- **`MannWhitneyUTest(x1, x2 []float64, alt LocationHypothesis)`**
Performs the test between two samples `x1` and `x2`, under a user-specified alternative hypothesis (`LocationLess`, `LocationDiffers`, or `LocationGreater`).
- Returns a structured result:
- Sample sizes (`N1`, `N2`)
- U statistic (with tie handling: ties contribute 0.5)
- Alternative hypothesis used (`AltHypothesis`)
- Achieved *p*-value (`P`)
## Key Features
- **Non-parametric**: No assumption of normality — suitable for ordinal data or non-Gaussian distributions.
- **Exact vs Approximate**:
- Uses *exact U distribution* for small samples (≤50 without ties, ≤25 with ties).
- Falls back to *normal approximation* for larger samples (with tie and continuity corrections).
- **Tie Handling**:
- Ranks averaged for tied values.
- Tie correction applied in variance estimation.
- **Error Handling**: Returns `ErrSampleSize` (empty input) or `ErrSamplesEqual` (all values identical).
## Implementation Notes
- Uses labeled merge to interleave sorted samples while preserving origin labels.
- Computes U via rank sums: `U1 = R1 − n₁(n₁+1)/2`.
- Supports one-tailed and two-tailed tests.
- Includes helper functions: `labeledMerge`, `tieCorrection`.
## References
Mann & Whitney (1947); Klotz (1966).
Efficiency slightly lower than *t*-test on normal data, but more robust to outliers and distributional assumptions.
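The tie handling described above (tied values share the average of their ranks) can be sketched independently of the test itself. This helper is an illustration of the standard preprocessing, not the package's `labeledMerge` code:

```go
package main

import (
	"fmt"
	"sort"
)

// ranksSketch assigns 1-based ranks, averaging ranks over runs of ties.
func ranksSketch(xs []float64) []float64 {
	idx := make([]int, len(xs))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool { return xs[idx[a]] < xs[idx[b]] })
	ranks := make([]float64, len(xs))
	for i := 0; i < len(idx); {
		j := i
		for j < len(idx) && xs[idx[j]] == xs[idx[i]] {
			j++
		}
		avg := float64(i+j+1) / 2 // mean of ranks i+1 .. j
		for k := i; k < j; k++ {
			ranks[idx[k]] = avg
		}
		i = j
	}
	return ranks
}

func main() {
	// The two 1s occupy ranks 1 and 2, so each receives 1.5.
	fmt.Println(ranksSketch([]float64{3, 1, 4, 1, 5}))
}
```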
@@ -0,0 +1,27 @@
# `obistats` Package: Semantic Overview
The `obistats` package provides low-level statistical and combinatorial utilities in pure Go, focusing on numerical robustness and performance.
- **Sign Function (`mathSign`)**
Returns the sign of a `float64`: `-1`, `0`, or `+1`. Handles NaN by returning NaN.
- **Precomputed Factorials (`smallFact`)**
Precomputes factorials from `0!` to `20!` (fits in 64-bit signed integer), enabling fast exact binomial coefficient computation for small `n`.
- **Binomial Coefficient (`mathChoose`)**
Computes $\binom{n}{k}$ efficiently:
- For `n ≤ 20`: uses integer arithmetic (multiplication + division) for exact results.
- For larger `n`: leverages logarithms via `mathLchoose` and exponentiates (`exp(log(Choose))`) to avoid overflow.
- **Log-Binomial Coefficient (`mathLchoose`)**
Computes $\log \binom{n}{k}$ via the log-gamma function:
$$\log \binom{n}{k} = \ln\Gamma(n+1) - \ln\Gamma(k+1) - \ln\Gamma(n-k+1)$$
Ensures numerical stability for large `n`, avoiding overflow/underflow.
- **Internal Helper (`lchoose`)**
Core implementation of log-binomial using `math.Lgamma`, reused by both exact and large-scale paths.
**Design Notes**:
- Prioritizes correctness (e.g., NaN propagation, edge-case handling).
- Balances speed and precision: exact integer arithmetic for small inputs; log-space computation for scalability.
- Mirrors functionality from an internal benchmarking module, adapted here as a standalone utility.
@@ -0,0 +1,29 @@
# `obistats` Package — Core Statistical Functions
The `obistats` package provides generic, type-safe implementations of fundamental descriptive statistics for numeric types in Go.
## Key Functions
- **`Max[T]()`**
Returns the maximum value in a slice of numeric types (`int`, `int8`–`int64`, `float32`/`float64`).
*Implementation*: Iterates once, tracking the largest element.
- **`Min[T]()`**
Returns the minimum value in a slice of numeric types (including unsigned integers: `uint`, `uint8`–`uint64`).
*Implementation*: Single-pass scan, comparing each element to the current minimum.
- **`Mode[T]()`**
Computes the *most frequent* value (mode) for signed integer types only (`int`, `int8`–`int64`).
*Implementation*: Builds a frequency map, then selects the value with highest count.
## Design Notes
- **Generics**: All functions use Go type parameters (`[T ...]`) for compile-time safety and performance.
- **Type Scope**:
- `Max` supports signed integers + floats (no unsigned).
- `Min` includes all integer variants.
- `Mode` is restricted to signed integers (due to map key constraints and semantics).
- **Assumptions**: Input slices are non-empty; no explicit error handling for edge cases (e.g., empty input).
- **Use Case**: Lightweight, reusable utility functions suitable for statistical pipelines or exploratory data analysis.
> ⚠️ *Note*: No mean, median, variance, or standard deviation functions are provided in this excerpt.
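The generic pattern can be sketched with stdlib-only constraints. The `Number` constraint and function names here are illustrative assumptions; the real package's constraint sets may differ:

```go
package main

import "fmt"

// Number approximates the element types described above.
type Number interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64 | ~float32 | ~float64
}

// MaxSketch: single pass, assumes a non-empty slice (as the package does).
func MaxSketch[T Number](xs []T) T {
	m := xs[0]
	for _, x := range xs[1:] {
		if x > m {
			m = x
		}
	}
	return m
}

// ModeSketch: frequency map, then pick the most frequent value.
func ModeSketch[T ~int | ~int64](xs []T) T {
	counts := map[T]int{}
	for _, x := range xs {
		counts[x]++
	}
	mode, best := xs[0], 0
	for v, c := range counts {
		if c > best {
			mode, best = v, c
		}
	}
	return mode
}

func main() {
	fmt.Println(MaxSketch([]float64{1.5, 9.2, 3.3}), ModeSketch([]int{2, 7, 7, 3}))
}
```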
@@ -0,0 +1,30 @@
# `obistats` Package: Normal Distribution Utilities
The `obistats` package provides a lightweight, efficient implementation of the **normal (Gaussian) distribution**, including core statistical operations.
## Core Type
- `NormalDist`: Represents a normal distribution with parameters:
- `Mu` (mean)
- `Sigma` (standard deviation)
## Predefined Constants
- `StdNormal`: A standard normal distribution (`Mu = 0`, `Sigma = 1`).
- `invSqrt2Pi`: Precomputed constant for performance optimization.
## Key Methods
| Method | Description |
|--------|-------------|
| `PDF(x)` | Computes the **probability density function** at point `x`. |
| `pdfEach(xs [])` | Vectorized PDF evaluation over a slice of values (optimized for standard normal). |
| `CDF(x)` | Computes the **cumulative distribution function** at point `x` via error function (`erfc`). |
| `cdfEach(xs [])` | Vectorized CDF evaluation over a slice. |
| `InvCDF(p)` | Computes the **inverse CDF (quantile function)** using Acklam's algorithm with refinement. Handles edge cases (`p = 0`, `1`) and numerical stability. |
| `Rand(r *rand.Rand)` | Generates a random sample from the distribution (uses Go's built-in `NormFloat64`). |
| `Bounds()` | Returns a practical support interval: `[Mu − 3·Sigma, Mu + 3·Sigma]` (≈99.7% coverage). |
## Implementation Notes
- Optimized paths for standard normal (`Mu = 0`, `Sigma = 1`) reduce computation cost.
- Uses Go's standard math library (`math.Erfc`, `math.Log`, etc.).
- Designed for performance and numerical accuracy in statistical applications.
> *Note: Duplicates functionality from an internal module (`bench`), likely for reuse in public packages.*
@@ -0,0 +1,31 @@
# `obistats.SampleIntWithoutReplacement` — Semantic Description
The function **`SampleIntWithoutReplacement(n, max int) []int`** implements a *random sampling without replacement* algorithm over the integer range `[0, max)`.
## Core Purpose
Generates **`n` distinct integers**, uniformly at random and *without repetition*, from the interval `[0, max)`.
## Algorithmic Strategy
Uses an **incremental reservoir-like mapping** (`draw map[int]int`) to maintain uniqueness:
- Iteratively draws `y = rand.Intn(max)` (i.e., uniform in `[0, max)`).
- If `y` is already present (`ok = true`), it retrieves and reuses the stored value (a *swap trick*).
- Then, `draw[y]` is set to the current upper bound (`max - 1`) and `max` decremented — effectively *removing* one value from the future draw space.
- This preserves uniformity while avoiding collisions, in **O(n)** time and memory.
## Key Properties
- ✅ Guarantees uniqueness: no duplicates in the returned slice.
- ⚖️ Uniform distribution over all possible `n`-element subsets of `[0, max)`.
- 🧠 Space-efficient: uses a map (O(n)) instead of shuffling an array of size `max`.
- 🚀 Efficient for large `max` and moderate `n`, where full-shuffle methods would be wasteful.
## Return Value
A slice of length `n`, containing the sampled integers (order is *not* sorted or deterministic — reflects insertion order in `draw`).
## Typical Use Cases
- Random subset selection (e.g., cross-validation folds, bootstrapping indices).
- Partial shuffling without a full permutation.
- Monte Carlo simulations requiring unique random IDs or positions.
## Limitations / Notes
- Assumes `0 ≤ n ≤ max`; behavior is undefined otherwise.
- Relies on the global `math/rand` source (not seeded here); users should call `rand.Seed()` if reproducibility is needed.
@@ -0,0 +1,22 @@
# `obistats` Package: Statistical Utilities for Weighted and Unweighted Samples
The `obistats` package provides a suite of statistical functions for analyzing numeric samples, supporting both unweighted and weighted data. Its core abstraction is the `Sample` struct—encapsulating values (`Xs`), optional weights (`Weights`), and a `Sorted` flag for performance optimization.
### Key Functionalities:
- **Bounds**: Computes min/max efficiently—O(1) when sorted and unweighted; otherwise scans the data.
- **Aggregation**: `Sum()` computes weighted/unweighted sums via incremental accumulation; `Weight()` returns total weight (or count if unweighted).
- **Central Tendency**:
- `Mean()` uses incremental weighted mean for numerical stability.
- `GeoMean()` computes geometric means (requires positive values), also supporting weights.
- **Dispersion**:
- `Variance()` and `StdDev()` compute sample variance/standard deviation (unweighted only; weighted versions raise a panic—*TODO*).
- Based on Welford's online algorithm for numerical robustness.
- **Order Statistics**:
- `Percentile(p)` implements Hyndman & Fan's R8 interpolation method (default in many tools). Handles weights via linear scan; constant-time if sorted and unweighted.
- `IQR()` returns the interquartile range (`P75 − P25`).
- **Utility Methods**:
- `Sort()` sorts in-place (stably for weighted samples) and updates the `Sorted` flag.
- `Copy()` creates a deep copy for independent manipulation.
Designed with performance in mind, the package exploits sorting and incremental algorithms to minimize numerical error and improve runtime—especially valuable for large or repeated analyses. All functions gracefully handle edge cases (empty samples, zero weights) by returning `NaN` or appropriate bounds.
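Welford's one-pass scheme mentioned above can be sketched for the unweighted case (a textbook illustration of the algorithm, not the package's `Variance` code):

```go
package main

import "fmt"

// welfordSketch returns the mean and sample variance in a single pass,
// updating a running mean and sum of squared deviations (M2).
func welfordSketch(xs []float64) (mean, variance float64) {
	var m, m2 float64
	for i, x := range xs {
		delta := x - m
		m += delta / float64(i+1)
		m2 += delta * (x - m)
	}
	if len(xs) < 2 {
		return m, 0
	}
	return m, m2 / float64(len(xs)-1)
}

func main() {
	mean, v := welfordSketch([]float64{2, 4, 4, 4, 5, 5, 7, 9})
	fmt.Println(mean, v) // mean 5, sample variance 32/7
}
```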
@@ -0,0 +1,23 @@
# `obistats` Package: Semantic Description
The `obistats` package provides utility functions for **formatting and scaling benchmark measurements** in Go, especially tailored for performance benchmarks (e.g., `go test -bench`). Its core component is the **`Scaler` type**, a function that converts raw numeric values into human-readable, unit-aware strings.
- **`Scaler func(float64) string`**: A function type that formats a numeric measurement (e.g., time, memory usage, throughput) into an appropriately scaled and unit-annotated string.
- **`NewScaler(val float64, unit string) Scaler`**: Dynamically selects the best scaling strategy based on:
- The measurement value (`val`)
- Its unit (e.g., `"ns/op"`, `"MB/s"`, `"B/op"`)
It applies **SI prefixes** (`k`/`M`/`G`/`T`) with adaptive precision (0–2 decimal places) to ensure readability and consistency across table rows.
- **`timeScaler(ns float64)`**: Specialized scaler for time-based units (`ns/op`, `ns/GC`). It selects optimal unit (s, ms, µs, ns) and precision based on magnitude.
- **`hasBaseUnit(s, unit string) bool`**: Helper to detect if a full unit string (e.g., `"bytes/op"`, `"MB/s"`) includes or matches a base unit.
Key features:
- Supports common Go benchmark units: time (`ns/op`), memory (`B/op`, `bytes/op`), throughput (`MB/s`)
- Ensures consistent formatting across rows (e.g., all values in a row use same scale)
- Avoids unnecessary trailing zeros and uses SI conventions
- Designed for compatibility with internal benchmarking infrastructure (originally from `golang-design/bench`)
Intended use: formatting tables of benchmark results where readability and unit consistency are critical.
@@ -0,0 +1,24 @@
# `obistats` Package: Semantic Overview
This Go package provides utilities for sorting benchmark result tables, derived from an internal module. It focuses on semantic ordering of performance data.
## Core Concepts
- **`Order` type**: A function signature defining custom sort logic for table rows (`func(t *Table, i, j int) bool`).
- **Predefined orders**:
- `ByName`: Sorts rows alphabetically by benchmark name.
- `ByDelta`: Orders rows based on magnitude of percentage change (`PctDelta`), adjusted by directionality via `Change`.
- **Helper functions**:
- `Reverse(order Order)`: Returns a new order that inverts the comparison result.
- **Core utility**:
- `Sort(t *Table, order Order)`: Performs an in-place stable sort of table rows using the provided ordering function.
## Design Intent
- Enables flexible, domain-aware sorting (e.g., by performance delta or name).
- Supports both ascending and descending sorts via `Reverse`.
- Uses stable sorting (`sort.SliceStable`) to preserve relative order of equal elements.
## Use Case
Ideal for benchmark comparison tools where users need intuitive, configurable table layouts—especially when analyzing performance regressions or improvements.
@@ -0,0 +1,25 @@
# `obistats` Package — Semantic Overview
The `*obistats*` Go package provides lightweight, type-generic statistical utilities for numerical data.
## Core Functions
- **`Median[T Number](data []T) float64`**
Computes the median of a slice. Internally copies and sorts input data to avoid mutation, handling both even- and odd-length slices correctly. Returns `0` for empty input.
- **`Mean[T Number](data []T) float64`**
Calculates the arithmetic mean by summing all elements (converted to `float64`) and dividing by count.
## Type Constraints
- Uses Go generics (`constraints.Float | constraints.Integer`), enabling use with `int`, `float32`, `float64`, etc.
## Design Notes
- Non-mutating (`Median` works on a copy).
- Simple, efficient implementations—no external dependencies beyond `golang.org/x/exp/constraints` and `slices`.
- Focused on central tendency measures only—no variance, std dev, or distribution stats.
## Use Case
Ideal for small-to-medium numerical datasets where minimal dependencies and clarity are prioritized over advanced statistics.
@@ -0,0 +1,31 @@
# `obistats` Package: Benchmark Statistics and Comparison
The `obistats` package provides semantic tools to analyze, compare, and display benchmark results—typically from Go's `testing.B` benchmarks. It enables structured reporting of performance changes across configurations (e.g., before/after code modifications).
### Core Concepts
- **`Collection`**: Aggregates benchmark metrics across groups, benchmarks, and configurations.
- **`Table` & `Row`**: Represent formatted tabular output for human-readable comparison (e.g., in CLI tools like `benchstat`).
- **Metrics per row**: Include mean, variance, sample size (`n`), and statistical test results.
### Key Functionalities
- **Statistical summarization**: Computes means, variances, and other stats via `computeStats()`.
- **Delta comparison** (2-config mode):
- Performs statistical tests (`UTest` by default) to assess significance.
- Calculates percent change: `((new/old) − 1) × 100%`.
- Marks improvements (`+1`) or regressions (`−1`), respecting metric semantics (e.g., lower time/op is better; higher MB/s is better).
- **Handling edge cases**:
- Skips rows with missing data (e.g., one config absent).
- Notes issues: zero variance, insufficient samples, or identical values.
- **Geometric mean aggregation**:
- Adds a `[Geo mean]` row summarizing overall performance across benchmarks.
- Excludes zero-mean entries to avoid distortion (e.g., allocations of `0`).
- **Metric normalization**:
- Maps raw units (`ns/op`, `B/op`) to semantic names (e.g., `"time/op"`, `"alloc/op"`).
- Supports prefixed units (`foo-ns/op` → `foo-time/op`).
### Output Customization
- Supports sorting via user-defined order (`c.Order`).
- Configurable significance level `α` (default: 0.05) for p-value filtering.
- Optional geomean inclusion (`c.AddGeoMean`).
Designed for integration into benchmark analysis pipelines (e.g., CLI tools), `obistats` focuses on **semantic clarity**, **statistical rigor**, and **actionable insights**.
@@ -0,0 +1,30 @@
# `obistats.TDist`: Student's *t*-Distribution Implementation
This Go package provides a lightweight implementation of the **Student's *t*-distribution**, commonly used in statistical inference (e.g., hypothesis testing, confidence intervals) when sample sizes are small or population variance is unknown.
## Core Components
- **`TDist` struct**:
Represents a *t*-distribution parameterized by degrees of freedom `V`.
- **`PDF(x)` method**:
Computes the *probability density function* at point `x`, using:
$$
f(x) = \frac{\Gamma\left(\frac{V+1}{2}\right)}{\sqrt{V\pi} \, \Gamma\left(\frac{V}{2}\right)}
\left(1 + \frac{x^2}{V} \right)^{-\frac{V+1}{2}}
$$
Leverages `lgamma` for numerical stability in Gamma function evaluation.
- **`CDF(x)` method**:
Computes the *cumulative distribution function*:
- Returns `0.5` at symmetry point (`x == 0`);
- Uses the **regularized incomplete beta function** `mathBetaInc` for `x > 0`;
- Exploits symmetry: `CDF(-x) = 1 − CDF(x)` for `x < 0`.
- **`Bounds()` method**:
Returns a practical truncation interval `[-4, 4]`, sufficient for most visualizations or numerical integration over the central mass of the distribution.
## Dependencies & Notes
- Relies on standard library `math` and custom/internal helpers (`lgamma`, `mathBetaInc`) — likely from a shared internal module.
- Designed for performance and numerical robustness, suitable in statistical tooling or benchmark analysis (as suggested by the `obistats` package name and reference to a bench-related repo).
@@ -0,0 +1,37 @@
# Statistical Hypothesis Testing Module (`obistats`)
This Go package provides implementations of common **t-tests** for comparing sample means under different assumptions. It supports one- and two-sample tests, paired or unpaired designs.
## Core Types
- **`TTestResult`**: Encapsulates the outcome of a t-test, including:
- Sample sizes (`N1`, `N2`)
- Test statistic value (`T`)
- Degrees of freedom (`DoF`)
- Alternative hypothesis type (`AltHypothesis`: `LocationDiffers`, `LocationLess`, or `LocationGreater`)
- Computed *p*-value (`P`)
- **`TTestSample` interface**: Requires methods `Weight()`, `Mean()`, and `Variance()` — enabling reuse with summary statistics.
## Supported Tests
1. **`TwoSampleTTest(x1, x2)`**
Standard Student's *t*-test for two independent samples assuming **equal variances** and normality.
2. **`TwoSampleWelchTTest(x1, x2)`**
Welch's *t*-test for two independent samples **without equal-variance assumption**, using Satterthwaite approximation for degrees of freedom.
3. **`PairedTTest(x1, x2)`**
Paired *t*-test for dependent samples (e.g., before/after), testing mean of differences against `μ0`.
4. **`OneSampleTTest(x)`**
One-sample *t*-test comparing sample mean to a known population mean `μ0`.
## Error Handling
- Returns errors for invalid inputs: zero sample size (`ErrSampleSize`), zero variance (`ErrZeroVariance`), or mismatched paired sample lengths (`ErrMismatchedSamples`).
## Implementation Notes
- *p*-values are computed using the cumulative distribution function (CDF) of the Student's *t*-distribution.
- Designed for statistical rigor and modularity, reusing internal utilities (e.g., `Mean`, `StdDev`) from a shared module.
@@ -0,0 +1,39 @@
# Mann-Whitney U Distribution Implementation in `obistats`
The `obistats` package provides efficient computation of the **Mann-Whitney U distribution**, used in nonparametric hypothesis testing to compare two independent samples.
## Core Types
- **`UDist`**: Represents the discrete probability distribution of the U statistic for sample sizes `N1`, `N2`. It optionally handles **ties** via a tie-count vector `T`.
## Key Features
- **Exact distribution computation**, both with and without ties.
  - *No ties*: Uses dynamic programming (Mann–Whitney recurrence) in `O(N1·N2·U)` time.
  - *With ties*: Implements the linked-list-based algorithm from Cheung & Klotz (1997) via memoization (`makeUmemo`).
- **PMF & CDF evaluation**:
  - `PMF(U)` returns the probability mass at U.
  - `CDF(U)` computes cumulative probabilities using symmetry to minimize computation.
- **Support for tied ranks**:
  - `T` encodes tie multiplicities per rank; if nil, no ties are assumed.
- **Optimized recurrence**:
  - Exploits symmetry (`p_{n,m} = p_{m,n}`) and incremental DP to reduce memory/time.
- **Boundary handling**:
  - `Bounds()` returns support `[0, N1·N2]`.
  - `Step() = 0.5`, reflecting U's discrete unit in tied cases.
## Algorithm Notes
- `p(U)` uses a 2D DP table (rows = *n*, columns = U), computing only necessary states.
- `makeUmemo` builds a 3D memoization table (`k`, `n1`, `2U`) for tied distributions.
- Performance bottlenecks noted in comments (e.g., map overhead) suggest future optimization paths.
## Use Case
Enables exact *p*-value calculation for the **Mann-Whitney U test**, especially valuable when:
- Sample sizes are small-to-moderate (exact methods needed).
- Data contain ties.
@@ -0,0 +1,21 @@
# Semantic Description of `obistats` Package
The `obistats` package provides numerically stable statistical utilities for combinatorics and log-space arithmetic, primarily intended for use in bioinformatics or probabilistic modeling.
- **`Lchoose(n, x int) float64`**:
Computes the natural logarithm of the binomial coefficient "n choose x" using the log-gamma function (`math.Lgamma`). This avoids overflow/underflow inherent in direct computation of large factorials.
- **`Choose(n, x int) float64`**:
Returns the (floating-point approximation of the) binomial coefficient by exponentiating `Lchoose`. *Note*: The argument order in the implementation (`math.Exp(Lchoose(x,n))`) appears reversed—likely a typo; should be `Lchoose(n,x)`.
- **`LogAddExp(x, y float64) float64`**:
Computes `log(exp(x) + exp(y))` in a numerically stable way. Uses the identity:
`log(eˣ + eʸ) = max(x, y) + log(1 + exp(-|x - y|))`, implemented via `math.Log1p` for precision near zero.
Handles NaNs/infinities with logging and fallback.
All functions rely on `math` for core operations, and use Logrus (`log.Errorf`) to warn about invalid inputs (e.g., non-finite values).
Use cases include:
- Exact p-value computation in overrepresentation tests (e.g., hypergeometric),
- Log-probability accumulation in hidden Markov models or Bayesian networks,
- Stable mixture model likelihood evaluations.