2026-04-07 08:36:50 +02:00
# Mann-Whitney U Distribution Implementation in `obistats`
The `obistats` package provides efficient computation of the **Mann-Whitney U distribution**, used in nonparametric hypothesis testing to compare two independent samples.
## Core Types
- **`UDist`**: Represents the discrete probability distribution of the U statistic for sample sizes `N1`, `N2`. It optionally handles **ties** via a tie-count vector `T`.
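
The shape of this type can be sketched as follows. This is a minimal illustration, not the package source: the field names `N1`, `N2` and `T` come from the description above, while `hasTies` and `tieCounts` are hypothetical helpers introduced here to show how a tie-count vector relates to pooled data.

```go
package main

import (
	"fmt"
	"sort"
)

// UDist mirrors the fields described above (sketch only).
type UDist struct {
	N1, N2 int   // sample sizes
	T      []int // tie multiplicity per distinct rank; nil means no ties
}

// hasTies reports whether any rank is shared by more than one observation.
// Hypothetical helper name, assumed for illustration.
func (d UDist) hasTies() bool {
	for _, t := range d.T {
		if t > 1 {
			return true
		}
	}
	return false
}

// tieCounts builds a tie-count vector from pooled observations: one entry
// per distinct value, in increasing order. Hypothetical helper.
func tieCounts(pooled []float64) []int {
	s := append([]float64{}, pooled...)
	sort.Float64s(s)
	var counts []int
	for i, v := range s {
		if i > 0 && v == s[i-1] {
			counts[len(counts)-1]++
		} else {
			counts = append(counts, 1)
		}
	}
	return counts
}

func main() {
	pooled := []float64{1, 2, 2, 2, 3, 4} // two samples of three, pooled
	d := UDist{N1: 3, N2: 3, T: tieCounts(pooled)}
	fmt.Println(d.T, d.hasTies())
}
```

By construction the entries of such a vector sum to `N1 + N2`.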
## Key Features
- **Exact distribution computation**, both with and without ties.
  - *No ties*: Uses dynamic programming (the Mann-Whitney recurrence) in `O(N1·N2·U)` time.
  - *With ties*: Implements the linked-list-based algorithm from Cheung & Klotz (1997) via memoization (`makeUmemo`).
- **PMF & CDF evaluation**:
  - `PMF(U)` returns the probability mass at `U`.
  - `CDF(U)` computes cumulative probabilities, using symmetry to minimize computation.
- **Support for tied ranks**:
  - `T` encodes tie multiplicities per rank; if `nil`, no ties are assumed.
- **Optimized recurrence**:
  - Exploits symmetry (`p_{n,m} = p_{m,n}`) and incremental DP to reduce memory and time.
- **Boundary handling**:
  - `Bounds()` returns the support `[0, N1·N2]`.
  - `Step()` returns `0.5`, because U can take half-integer values when ties are present.
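
One way symmetry can reduce the work in a tie-free CDF is sketched below. The function `cdfBySymmetry` and its `prefix` callback are hypothetical names, not the package's API. The sketch assumes the tie-free distribution is symmetric about `N1·N2/2`, so that `P(U ≤ u) = 1 − P(U ≤ N1·N2 − u − 1)` and only the shorter tail needs to be summed.

```go
package main

import "fmt"

// cdfBySymmetry returns P(U <= u) for a tie-free U distribution on
// 0..n1*n2. prefix(k) must return the PMF values for U = 0..k.
// Hypothetical helper, assumed for illustration.
func cdfBySymmetry(n1, n2, u int, prefix func(k int) []float64) float64 {
	top := n1 * n2
	if u < 0 {
		return 0
	}
	if u >= top {
		return 1
	}
	// If u lies in the upper half, sum the mirrored lower tail instead.
	flip := u >= (top+1)/2
	if flip {
		u = top - u - 1
	}
	sum := 0.0
	for _, p := range prefix(u) {
		sum += p
	}
	if flip {
		return 1 - sum
	}
	return sum
}

func main() {
	// Exact PMF for N1 = N2 = 2: the six orderings give U = 0,1,2,2,3,4.
	pmf := []float64{1.0 / 6, 1.0 / 6, 2.0 / 6, 1.0 / 6, 1.0 / 6}
	prefix := func(k int) []float64 { return pmf[:k+1] }
	for u := 0; u <= 4; u++ {
		fmt.Printf("P(U <= %d) = %.4f\n", u, cdfBySymmetry(2, 2, u, prefix))
	}
}
```

Because `prefix` is only ever asked for values up to about half the support, a DP that costs more as `U` grows does less work for large arguments.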
## Algorithm Notes
- `p(U)` uses a 2D DP table (rows = *n*, columns = U), computing only necessary states.
- `makeUmemo` builds a 3D memoization table (`k`, `n1`, `2U`) for tied distributions.
- Comments in the source flag performance bottlenecks (e.g., map overhead) as candidates for future optimization.
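
The tie-free recurrence can be sketched as a small table-based DP. This is an illustrative version under stated assumptions, not the package code: it uses the Mann-Whitney recurrence `p_{n,m}(U) = (n·p_{n-1,m}(U-m) + m·p_{n,m-1}(U)) / (n+m)`, keeps one table indexed by `n` and `U` that is updated in place as `m` grows, and does not exploit the `p_{n,m} = p_{m,n}` symmetry. The name `pmfPrefix` is hypothetical.

```go
package main

import "fmt"

// pmfPrefix returns p_{n1,n2}(u) for u = 0..maxU, assuming no ties.
// Hypothetical name, sketch only.
func pmfPrefix(n1, n2, maxU int) []float64 {
	// tab[n][u] holds p_{n,m}(u) for the current m. Start at m = 0,
	// where all probability mass sits at u = 0.
	tab := make([][]float64, n1+1)
	for n := range tab {
		tab[n] = make([]float64, maxU+1)
		tab[n][0] = 1
	}
	for m := 1; m <= n2; m++ {
		// Row 0 never changes. Rows are updated in increasing n, so
		// tab[n-1] already holds p_{n-1,m} while tab[n] still holds
		// p_{n,m-1} at the moment it is read.
		for n := 1; n <= n1; n++ {
			for u := 0; u <= maxU; u++ {
				prev := 0.0 // p_{n-1,m}(u-m), zero when u-m < 0
				if u >= m {
					prev = tab[n-1][u-m]
				}
				tab[n][u] = (float64(n)*prev + float64(m)*tab[n][u]) / float64(n+m)
			}
		}
	}
	return tab[n1]
}

func main() {
	fmt.Println(pmfPrefix(2, 2, 4))
}
```

The three nested loops give the `O(N1·N2·U)` running time, and passing a smaller `maxU` shrinks every row of the table.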
## Use Case
Enables exact *p*-value calculation for the **Mann-Whitney U test**, especially valuable when:
- Sample sizes are small to moderate, where the normal approximation is unreliable and exact methods are needed.
- Data contain ties.
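
For very small samples, an exact one-sided *p*-value with ties can be cross-checked by brute force. The sketch below is not the memoized algorithm described above and does not use the `obistats` API. It enumerates every way of assigning the pooled observations to the first sample and works with `2U` so that tied pairs, which count one half, stay integral. The names `twiceU` and `exactLowerP` are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// twiceU returns 2*U for the split selected by mask: bit i set means
// pooled[i] belongs to sample 1. Each pair (x in sample 1, y in sample 2)
// adds 2 if x > y and 1 if x == y.
func twiceU(pooled []float64, mask uint) int {
	s := 0
	for i, x := range pooled {
		if mask&(uint(1)<<uint(i)) == 0 {
			continue
		}
		for j, y := range pooled {
			if mask&(uint(1)<<uint(j)) != 0 {
				continue
			}
			switch {
			case x > y:
				s += 2
			case x == y:
				s++
			}
		}
	}
	return s
}

// exactLowerP returns P(U <= observed U) under the null hypothesis that
// every split of the pooled data into sizes len(x) and len(y) is equally
// likely. Exponential in the pooled size; intended for tiny inputs only.
func exactLowerP(x, y []float64) float64 {
	pooled := append(append([]float64{}, x...), y...)
	n1 := len(x)
	observed := twiceU(pooled, (uint(1)<<uint(n1))-1) // first n1 entries
	limit := uint(1) << uint(len(pooled))
	hits, total := 0, 0
	for mask := uint(0); mask < limit; mask++ {
		if bits.OnesCount(mask) != n1 {
			continue
		}
		total++
		if twiceU(pooled, mask) <= observed {
			hits++
		}
	}
	return float64(hits) / float64(total)
}

func main() {
	x := []float64{1, 2, 2}
	y := []float64{2, 3, 4}
	fmt.Println(exactLowerP(x, y))
}
```

In this example 3 of the 20 possible splits give a U no larger than the observed one. An enumeration like this is only practical for a handful of observations, which is why a memoized exact algorithm matters at moderate sample sizes.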