obitools4/autodoc/docmd/pkg/obistats/udist.md
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

# Mann-Whitney U Distribution Implementation in `obistats`
The `obistats` package provides efficient computation of the **Mann-Whitney U distribution**, used in nonparametric hypothesis testing to compare two independent samples.
## Core Types
- **`UDist`**: Represents the discrete probability distribution of the U statistic for sample sizes `N1`, `N2`. It optionally handles **ties** via a tie-count vector `T`.
## Key Features
- **Exact distribution computation**, both with and without ties.
  - *No ties*: Uses dynamic programming (the Mann-Whitney recurrence) in `O(N1·N2·U)` time.
  - *With ties*: Implements the linked-list-based algorithm from Cheung & Klotz (1997) via memoization (`makeUmemo`).
- **PMF & CDF evaluation**:
  - `PMF(U)` returns the probability mass at U.
  - `CDF(U)` computes cumulative probabilities, using the symmetry of the distribution to minimize computation.
- **Support for tied ranks**:
  - `T` encodes tie multiplicities per rank; if nil, no ties are assumed.
- **Optimized recurrence**:
  - Exploits symmetry (`p_{n,m} = p_{m,n}`) and incremental DP to reduce memory/time.
- **Boundary handling**:
  - `Bounds()` returns support `[0, N1·N2]`.
  - `Step()` returns `0.5`, because U can take half-integer values when ties are present.
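The no-ties computation and the symmetry shortcut for the CDF can be sketched as follows. This is a minimal, unoptimized illustration of the counting form of the Mann-Whitney recurrence, not the package's actual code; the function names `countU`, `pmfNoTies` and `cdfNoTies` are hypothetical.

```go
package main

import "fmt"

// countU returns c where c[u] is the number of orderings of n1 x's and n2 y's
// (no ties) in which exactly u (y, x) pairs have the y before the x.
// Counting form of the recurrence:
//   c_{n,m}(u) = c_{n-1,m}(u-m) + c_{n,m-1}(u)
func countU(n1, n2 int) []float64 {
	maxU := n1 * n2
	table := make([][][]float64, n1+1)
	for n := 0; n <= n1; n++ {
		table[n] = make([][]float64, n2+1)
		for m := 0; m <= n2; m++ {
			c := make([]float64, maxU+1)
			if n == 0 || m == 0 {
				c[0] = 1 // a single ordering, with U = 0
			} else {
				for u := 0; u <= n*m; u++ {
					c[u] = table[n][m-1][u] // last element is a y
					if u >= m {
						c[u] += table[n-1][m][u-m] // last element is an x
					}
				}
			}
			table[n][m] = c
		}
	}
	return table[n1][n2]
}

// pmfNoTies normalizes the counts into probabilities.
func pmfNoTies(n1, n2 int) []float64 {
	counts := countU(n1, n2)
	total := 0.0
	for _, c := range counts {
		total += c
	}
	p := make([]float64, len(counts))
	for u, c := range counts {
		p[u] = c / total
	}
	return p
}

// cdfNoTies returns P(U <= u). For u in the upper half of the support it uses
// the symmetry P(U = u) = P(U = maxU - u) so that only the shorter tail is summed.
func cdfNoTies(p []float64, u int) float64 {
	maxU := len(p) - 1
	switch {
	case u < 0:
		return 0
	case u >= maxU:
		return 1
	case u > maxU/2:
		return 1 - cdfNoTies(p, maxU-u-1)
	}
	sum := 0.0
	for i := 0; i <= u; i++ {
		sum += p[i]
	}
	return sum
}

func main() {
	fmt.Println(countU(2, 2)) // [1 1 2 1 1]
	p := pmfNoTies(2, 2)
	fmt.Println(cdfNoTies(p, 3))
}
```

The sketch stores the full table for readability; keeping only the rows still needed by the recurrence would reduce memory.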
## Algorithm Notes
- `p(U)` uses a 2D DP table (rows = *n*, columns = U), computing only necessary states.
- `makeUmemo` builds a 3D memoization table (`k`, `n1`, `2U`) for tied distributions.
- Comments in the source flag performance bottlenecks (e.g., map overhead) as candidates for future optimization.
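To see why the tied case is indexed by `2U` rather than `U`, the sketch below computes the tied distribution by brute-force enumeration. This is deliberately not the Cheung & Klotz algorithm and does not mirror `makeUmemo`; it is a slow reference that is only practical for tiny inputs. It assumes each tied (x, y) pair contributes 1/2 to U, and the names `binom` and `tiedDist2U` are hypothetical.

```go
package main

import "fmt"

// binom returns the binomial coefficient C(n, k) as a float64.
func binom(n, k int) float64 {
	if k < 0 || k > n {
		return 0
	}
	r := 1.0
	for i := 1; i <= k; i++ {
		r = r * float64(n-k+i) / float64(i)
	}
	return r
}

// tiedDist2U enumerates every way to place n1 first-sample observations into
// tie groups of sizes t (in rank order) and returns P(2U = key).
// Working with 2U keeps all keys integral even though U moves in steps of 0.5.
func tiedDist2U(n1 int, t []int) map[int]float64 {
	total := 0
	for _, tk := range t {
		total += tk
	}
	norm := binom(total, n1) // number of equally likely labelings
	dist := make(map[int]float64)

	var rec func(k, left, below, twoU int, ways float64)
	rec = func(k, left, below, twoU int, ways float64) {
		if k == len(t) {
			if left == 0 {
				dist[twoU] += ways / norm
			}
			return
		}
		for a := 0; a <= t[k] && a <= left; a++ {
			b := t[k] - a // second-sample observations in this group
			// Each of the a x's beats `below` y's (2 each) and ties with b y's (1 each).
			rec(k+1, left-a, below+b, twoU+a*(2*below+b), ways*binom(t[k], a))
		}
	}
	rec(0, n1, 0, 0, 1)
	return dist
}

func main() {
	// Pooled values 1, 2, 2, 3 with N1 = 2: tie groups of sizes 1, 2, 1.
	dist := tiedDist2U(2, []int{1, 2, 1})
	for twoU := 0; twoU <= 8; twoU++ {
		if p, ok := dist[twoU]; ok {
			fmt.Printf("U = %.1f  p = %.4f\n", float64(twoU)/2, p)
		}
	}
}
```

For this input the support is U = 0.5, 2 and 3.5, which shows half-integer values appearing as soon as ties are present.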
## Use Case
Enables exact *p*-value calculation for the **Mann-Whitney U test**, especially valuable when:
- Sample sizes are small to moderate, where exact methods are needed.
- Data contain ties.
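A minimal sketch of an exact two-sided *p*-value for tie-free samples is given below. It is self-contained and does not call `obistats`; the names `uStatistic`, `exactPMF` and `twoSidedP` are hypothetical, and the convention of doubling the smaller tail (capped at 1) is one common choice, assumed here for illustration.

```go
package main

import "fmt"

// uStatistic counts the (x, y) pairs with y < x. It assumes no ties.
func uStatistic(x, y []float64) int {
	u := 0
	for _, xi := range x {
		for _, yj := range y {
			if yj < xi {
				u++
			}
		}
	}
	return u
}

// exactPMF returns P(U = u) for u = 0..n1*n2 using the probability form of
// the Mann-Whitney recurrence, keeping only two rows of the table:
//   p_{n,m}(u) = n/(n+m) * p_{n-1,m}(u-m) + m/(n+m) * p_{n,m-1}(u)
func exactPMF(n1, n2 int) []float64 {
	size := n1*n2 + 1
	prev := make([][]float64, n2+1) // row n-1, starting with n = 0
	for m := range prev {
		prev[m] = make([]float64, size)
		prev[m][0] = 1 // with no x's, U is always 0
	}
	for n := 1; n <= n1; n++ {
		cur := make([][]float64, n2+1)
		cur[0] = make([]float64, size)
		cur[0][0] = 1 // with no y's, U is always 0
		for m := 1; m <= n2; m++ {
			cur[m] = make([]float64, size)
			wn := float64(n) / float64(n+m)
			wm := float64(m) / float64(n+m)
			for u := 0; u <= n*m; u++ {
				p := wm * cur[m-1][u]
				if u >= m {
					p += wn * prev[m][u-m]
				}
				cur[m][u] = p
			}
		}
		prev = cur
	}
	return prev[n2]
}

// twoSidedP doubles the smaller tail probability and caps the result at 1.
func twoSidedP(x, y []float64) float64 {
	u := uStatistic(x, y)
	lower, upper := 0.0, 0.0
	for v, p := range exactPMF(len(x), len(y)) {
		if v <= u {
			lower += p
		}
		if v >= u {
			upper += p
		}
	}
	tail := lower
	if upper < tail {
		tail = upper
	}
	if 2*tail > 1 {
		return 1
	}
	return 2 * tail
}

func main() {
	x := []float64{1.1, 2.3, 3.8}
	y := []float64{4.2, 5.0, 6.7, 7.9}
	// Every x is below every y, so U = 0 and P(U = 0) = 1/35.
	fmt.Printf("p = %.4f\n", twoSidedP(x, y)) // p = 0.0571
}
```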