mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
Mann-Whitney U Distribution Implementation in obistats
The obistats package provides efficient computation of the Mann-Whitney U distribution, used in nonparametric hypothesis testing to compare two independent samples.
Core Types
UDist: Represents the discrete probability distribution of the U statistic for sample sizes N1 and N2. It optionally handles ties via a tie-count vector T.
Key Features
- ✅ Exact distribution computation, both with and without ties.
  - No ties: Uses dynamic programming (Mann–Whitney recurrence) in O(N1·N2·U) time.
  - With ties: Implements the linked-list-based algorithm from Cheung & Klotz (1997) via memoization (makeUmemo).
- ✅ PMF & CDF evaluation:
  - PMF(U) returns the probability mass at U.
  - CDF(U) computes cumulative probabilities, using symmetry to minimize computation.
- ✅ Support for tied ranks:
  - T encodes tie multiplicities per rank; if nil, no ties are assumed.
- ✅ Optimized recurrence:
  - Exploits symmetry (p_{n,m} = p_{m,n}) and incremental DP to reduce memory/time.
- ✅ Boundary handling:
  - Bounds() returns the support [0, N1·N2].
  - Step() = 0.5, reflecting U's discrete unit in tied cases (each tied pair contributes 1/2 to U).
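To make the half-unit step concrete, the sketch below enumerates a tied distribution by brute force and indexes the result by 2U so that half-integer values of U land on integer slots. It is only an illustration: it is not the Cheung & Klotz algorithm and not the package's code. The names binom and tiedPMF are hypothetical, and it assumes T[k] is the number of pooled observations sharing the k-th smallest distinct value.

```go
package main

import "fmt"

// binom returns the binomial coefficient C(n, k) as a float64.
func binom(n, k int) float64 {
	if k < 0 || k > n {
		return 0
	}
	r := 1.0
	for i := 1; i <= k; i++ {
		r = r * float64(n-k+i) / float64(i)
	}
	return r
}

// tiedPMF (hypothetical helper) returns p where p[twoU] = P(2U = twoU) for a
// first sample of size n1 and tie vector t. It tries every way of assigning
// a of the t[k] observations in tie group k to the first sample.
func tiedPMF(n1 int, t []int) []float64 {
	n := 0
	for _, tk := range t {
		n += tk
	}
	n2 := n - n1
	p := make([]float64, 2*n1*n2+1)

	var rec func(k, xLeft, yBelow, twoU int, w float64)
	rec = func(k, xLeft, yBelow, twoU int, w float64) {
		if k == len(t) {
			if xLeft == 0 {
				p[twoU] += w
			}
			return
		}
		for a := 0; a <= t[k] && a <= xLeft; a++ {
			b := t[k] - a // group members that go to the second sample
			// Each x in this group beats yBelow y's (2 half-units each)
			// and ties with b y's (1 half-unit each).
			rec(k+1, xLeft-a, yBelow+b, twoU+a*(2*yBelow+b), w*binom(t[k], a))
		}
	}
	rec(0, n1, 0, 0, 1)

	total := binom(n, n1) // number of equally likely assignments
	for i := range p {
		p[i] /= total
	}
	return p
}

func main() {
	// Pooled data shaped like 1, 2, 2, 3: the middle value is tied.
	p := tiedPMF(2, []int{1, 2, 1})
	for twoU, prob := range p {
		if prob > 0 {
			fmt.Printf("U = %.1f  p = %.4f\n", float64(twoU)/2, prob)
		}
	}
}
```

For N1 = N2 = 2 with T = [1, 2, 1], the only attainable values are U = 0.5, 2 and 3.5, each with probability 1/3, which is why the support has to be walked in steps of 0.5 rather than 1.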
Algorithm Notes
- p(U) uses a 2D DP table (rows = n, columns = U), computing only necessary states.
- makeUmemo builds a 3D memoization table (k, n1, 2U) for tied distributions.
- Performance bottlenecks noted in comments (e.g., map overhead) suggest future optimization paths.
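The no-ties recurrence can be sketched as follows. This is a minimal, unoptimized version that keeps a full table over (n, m, U) rather than the trimmed 2D layout described above. The function names uCounts, pmf and cdf are assumptions made for this example, not the package's API. It counts orderings with N(n, m, u) = N(n-1, m, u-m) + N(n, m-1, u) and normalizes by the total number of orderings.

```go
package main

import "fmt"

// uCounts returns c where c[u] is the number of orderings of n1 x's and
// n2 y's (no ties) whose U statistic equals u, for u in [0, n1*n2].
func uCounts(n1, n2 int) []float64 {
	// table[n][m][u] = number of orderings for sample sizes (n, m) with U = u.
	table := make([][][]float64, n1+1)
	for n := 0; n <= n1; n++ {
		table[n] = make([][]float64, n2+1)
		for m := 0; m <= n2; m++ {
			c := make([]float64, n*m+1)
			if n == 0 || m == 0 {
				c[0] = 1 // an empty sample forces U = 0
			} else {
				for u := range c {
					// Largest observation is an x: it beats all m y's.
					if u >= m {
						c[u] += table[n-1][m][u-m]
					}
					// Largest observation is a y: it adds nothing to U.
					if u <= n*(m-1) {
						c[u] += table[n][m-1][u]
					}
				}
			}
			table[n][m] = c
		}
	}
	return table[n1][n2]
}

// pmf normalizes the counts; their sum equals C(n1+n2, n1).
func pmf(n1, n2 int) []float64 {
	c := uCounts(n1, n2)
	total := 0.0
	for _, v := range c {
		total += v
	}
	for u := range c {
		c[u] /= total
	}
	return c
}

// cdf returns P(U <= u), summing whichever tail is shorter.
func cdf(p []float64, u int) float64 {
	maxU := len(p) - 1
	if u < 0 {
		return 0
	}
	if u >= maxU {
		return 1
	}
	if u > maxU/2 {
		// Symmetry about maxU/2: P(U <= u) = 1 - P(U <= maxU-u-1).
		return 1 - cdf(p, maxU-u-1)
	}
	s := 0.0
	for v := 0; v <= u; v++ {
		s += p[v]
	}
	return s
}

func main() {
	p := pmf(2, 2)
	fmt.Println("pmf:", p)
	fmt.Println("P(U <= 3):", cdf(p, 3))
}
```

For N1 = N2 = 2 the six orderings give counts 1, 1, 2, 1, 1 for U = 0..4, so P(U <= 3) needs only the single term P(U = 0) once symmetry is applied.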
Use Case
Enables exact p-value calculation for the Mann-Whitney U test, especially valuable when:
- Sample sizes are small-to-moderate (exact methods needed).
- Data contain ties.