Files
obitools4/autodoc/docmd/pkg/obistats/udist.md
T
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

1.7 KiB
Raw Blame History

Mann-Whitney U Distribution Implementation in obistats

The obistats package provides efficient computation of the Mann-Whitney U distribution, used in nonparametric hypothesis testing to compare two independent samples.

Core Types

  • UDist: Represents the discrete probability distribution of the U statistic for sample sizes N1, N2. It optionally handles ties via a tie-count vector T.

Key Features

  • Exact distribution computation, both with and without ties.

    • No ties: Uses dynamic programming (MannWhitney recurrence) in O(N1·N2·U) time.
    • With ties: Implements the linked-list-based algorithm from Cheung & Klotz (1997) via memoization (makeUmemo).
  • PMF & CDF evaluation:

    • PMF(U) returns the probability mass at U.
    • CDF(U) computes cumulative probabilities using symmetry to minimize computation.
  • Support for tied ranks:

    • T encodes tie multiplicities per rank; if nil, no ties are assumed.
  • Optimized recurrence:

    • Exploits symmetry (p_{n,m} = p_{m,n}) and incremental DP to reduce memory/time.
  • Boundary handling:

    • Bounds() returns support [0, N1·N2].
    • Step() = 0.5, reflecting Us discrete unit in tied cases.

Algorithm Notes

  • p(U) uses a 2D DP table (rows = n, columns = U), computing only necessary states.
  • makeUmemo builds a 3D memoization table (k, n1, 2U) for tied distributions.
  • Performance bottlenecks noted in comments (e.g., map overhead) suggest future optimization paths.

Use Case

Enables exact p-value calculation for the Mann-Whitney U test, especially valuable when:

  • Sample sizes are small-to-moderate (exact methods needed).
  • Data contain ties.