obistats Package: K-Means Clustering Implementation

The obistats package provides a concurrent, type-generic implementation of the K-means clustering algorithm for numerical datasets.

Core Utilities

  • SquareDist / EuclideanDist: Compute squared and Euclidean distances between vectors (generic over float64 or int).
  • DefaultRG: Returns a seeded random number generator (*rand.Rand), giving the caller control over reproducibility.
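
As a rough illustration of these utilities, a generic distance pair and a seeded generator might look like the sketch below. The Number constraint and the names squareDist, euclideanDist, and newSeededRG are assumptions made for this example; the package's actual signatures may differ.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// Number is a hypothetical constraint covering the two element types
// the documentation mentions.
type Number interface {
	~int | ~float64
}

// squareDist returns the squared Euclidean distance between a and b.
// It panics if the vectors differ in length.
func squareDist[T Number](a, b []T) T {
	if len(a) != len(b) {
		panic("squareDist: vectors must have the same length")
	}
	var sum T
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum
}

// euclideanDist returns the Euclidean distance as a float64,
// whatever the element type of the input vectors.
func euclideanDist[T Number](a, b []T) float64 {
	return math.Sqrt(float64(squareDist(a, b)))
}

// newSeededRG (hypothetical name) returns a generator with a fixed seed,
// so that two runs with the same seed draw the same sequence.
func newSeededRG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

func main() {
	fmt.Println(squareDist([]int{0, 0}, []int{3, 4}))            // 25
	fmt.Println(euclideanDist([]float64{0, 0}, []float64{3, 4})) // 5
	r1, r2 := newSeededRG(42), newSeededRG(42)
	fmt.Println(r1.Intn(100) == r2.Intn(100)) // true
}
```

Comparing squared distances is enough to find a nearest center, which is presumably why both variants are offered: the square root is only needed when the true distance is reported.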

Data Structure

  • KmeansClustering: Encapsulates dataset (*obiutils.Matrix[float64]), cluster assignments, centers, and metadata (sizes, distances to nearest center).
  • Supports dynamic addition of clusters via AddACenter().

Initialization & Management

  • MakeKmeansClustering: Initializes the structure with data, number of clusters k, and RNG.
  • SetCenterTo, AddACenter: Assign an existing center or add a new one; new centers are chosen by k-means++-inspired weighted sampling.
  • ResetEmptyCenters: Reinitializes empty clusters using distance-weighted sampling.
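
The weighted sampling mentioned above can be sketched as follows. This is a minimal illustration of the k-means++ idea (draw the next center with probability proportional to its squared distance to the nearest existing center), not the package's code: the helper names newRG, pickWeighted, and nextCenter are hypothetical, and the choice of squared distance as the weight is an assumption.

```go
package main

import (
	"fmt"
	"math/rand"
)

// newRG (hypothetical helper) returns a generator with a fixed seed.
func newRG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// sqDist returns the squared Euclidean distance between equal-length vectors.
func sqDist(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum
}

// pickWeighted draws index i with probability weights[i] / sum(weights).
// It returns -1 when no weight is positive.
func pickWeighted(weights []float64, rng *rand.Rand) int {
	total := 0.0
	for _, w := range weights {
		total += w
	}
	target := rng.Float64() * total
	last := -1
	for i, w := range weights {
		if w <= 0 {
			continue // zero-weight rows can never be drawn
		}
		last = i
		target -= w
		if target < 0 {
			return i
		}
	}
	return last // fallback for floating-point rounding; -1 if all weights are zero
}

// nextCenter returns the row index of a new center. Rows far from every
// existing center (given as row indices) are more likely to be chosen.
func nextCenter(data [][]float64, centers []int, rng *rand.Rand) int {
	if len(centers) == 0 {
		return rng.Intn(len(data)) // first center: uniform draw
	}
	weights := make([]float64, len(data))
	for i, p := range data {
		nearest := sqDist(p, data[centers[0]])
		for _, c := range centers[1:] {
			if d := sqDist(p, data[c]); d < nearest {
				nearest = d
			}
		}
		weights[i] = nearest
	}
	return pickWeighted(weights, rng)
}

func main() {
	rng := newRG(1)
	data := [][]float64{{0, 0}, {0, 1}, {9, 9}, {10, 10}}
	centers := []int{nextCenter(data, nil, rng)}
	for len(centers) < 2 {
		centers = append(centers, nextCenter(data, centers, rng))
	}
	fmt.Println(centers) // two distinct row indices
}
```

Because an existing center is at distance zero from itself, its weight is zero and it cannot be drawn again. The same draw could plausibly serve to re-seed an empty cluster.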

Core Algorithm Steps

  • AssignToClass: Assigns each point to its nearest center in parallel (goroutines synchronized with a mutex).
  • ComputeCenters: Sets each new cluster center to the original data point closest to the cluster's arithmetic mean, so every center is an actual observation (robust for non-Euclidean spaces).
  • Run: Iterates until max_cycle iterations are reached or the drop in inertia is ≤ threshold.
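
The three steps can be sketched together as below. This is a simplified illustration under stated assumptions, not the package's implementation: it works on plain [][]float64 slices rather than obiutils.Matrix, the function names assign, updateCenters, and run are hypothetical, the worker count is fixed at 4, and the nearest-to-mean search is restricted to the members of each cluster.

```go
package main

import (
	"fmt"
	"math"
	"sync"
)

// sqDist returns the squared Euclidean distance between equal-length vectors.
func sqDist(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum
}

// assign labels each point with its nearest center and returns labels and inertia.
func assign(data, centers [][]float64, workers int) ([]int, float64) {
	classes := make([]int, len(data))
	var mu sync.Mutex
	var wg sync.WaitGroup
	inertia := 0.0
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			local := 0.0
			for i := w; i < len(data); i += workers {
				best, bestD := 0, math.Inf(1)
				for c, center := range centers {
					if d := sqDist(data[i], center); d < bestD {
						best, bestD = c, d
					}
				}
				classes[i] = best // each goroutine writes its own indices only
				local += bestD
			}
			mu.Lock() // the shared total is the only contended state
			inertia += local
			mu.Unlock()
		}(w)
	}
	wg.Wait()
	return classes, inertia
}

// updateCenters moves each center to the cluster member nearest the cluster mean.
func updateCenters(data, centers [][]float64, classes []int) {
	for c := range centers {
		mean := make([]float64, len(data[0]))
		count := 0
		for i, p := range data {
			if classes[i] == c {
				count++
				for j, v := range p {
					mean[j] += v
				}
			}
		}
		if count == 0 {
			continue // an empty cluster keeps its previous center
		}
		for j := range mean {
			mean[j] /= float64(count)
		}
		bestD := math.Inf(1)
		for i, p := range data {
			if d := sqDist(p, mean); classes[i] == c && d < bestD {
				bestD, centers[c] = d, p
			}
		}
	}
}

// run iterates until maxCycle passes or the inertia drops by at most threshold.
func run(data, centers [][]float64, maxCycle int, threshold float64) ([]int, float64) {
	classes, inertia := assign(data, centers, 4)
	for cycle := 0; cycle < maxCycle; cycle++ {
		updateCenters(data, centers, classes)
		prev := inertia
		classes, inertia = assign(data, centers, 4)
		if prev-inertia <= threshold {
			break
		}
	}
	return classes, inertia
}

func main() {
	data := [][]float64{{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10}}
	centers := [][]float64{{0, 0}, {1, 0}} // deliberately poor starting centers
	classes, inertia := run(data, centers, 10, 0)
	fmt.Println(classes, inertia) // [0 0 0 1 1 1] 4
	fmt.Println(centers)          // [[0 0] [10 10]]
}
```

In this sketch each goroutine accumulates a local sum and takes the mutex once, which keeps lock contention low. Whether the package follows the same pattern is not stated in this summary.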

Accessors & Diagnostics

  • K(), N(), Dimension(): Return number of clusters, dataset size, and feature dimension.
  • Inertia(): Sum of squared distances to assigned centers (convergence metric).
  • Centers, Classes, Sizes: Expose internal clustering state.
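
To make the accessors concrete, here is a small sketch on a pared-down stand-in type. The clustering struct, its fields, and the method bodies are assumptions for illustration only; the real KmeansClustering stores its data in an obiutils.Matrix and may expose Centers, Classes, and Sizes differently.

```go
package main

import "fmt"

// clustering is a hypothetical, pared-down stand-in for KmeansClustering.
type clustering struct {
	data    [][]float64 // one row per observation
	centers [][]float64 // one row per cluster center
	classes []int       // classes[i] is the center index assigned to data[i]
}

// K returns the number of clusters.
func (c *clustering) K() int { return len(c.centers) }

// N returns the number of observations.
func (c *clustering) N() int { return len(c.data) }

// Dimension returns the number of features per observation (0 for empty data).
func (c *clustering) Dimension() int {
	if len(c.data) == 0 {
		return 0
	}
	return len(c.data[0])
}

// Sizes counts how many observations are assigned to each cluster.
func (c *clustering) Sizes() []int {
	sizes := make([]int, c.K())
	for _, class := range c.classes {
		sizes[class]++
	}
	return sizes
}

// Inertia sums the squared distance from each observation to its assigned center.
func (c *clustering) Inertia() float64 {
	total := 0.0
	for i, p := range c.data {
		center := c.centers[c.classes[i]]
		for j := range p {
			d := p[j] - center[j]
			total += d * d
		}
	}
	return total
}

func main() {
	c := &clustering{
		data:    [][]float64{{1, 1}, {1, 3}, {5, 5}},
		centers: [][]float64{{1, 2}, {5, 5}},
		classes: []int{0, 0, 1},
	}
	fmt.Println(c.K(), c.N(), c.Dimension()) // 2 3 2
	fmt.Println(c.Sizes())                   // [2 1]
	fmt.Println(c.Inertia())                 // 2
}
```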

Design Highlights

  • Fully concurrent (goroutine-based) for performance.
  • Generic distance functions support both int and float64.
  • Explicit handling of edge cases (empty clusters, convergence).
  • Logging via logrus for debugging (obilog.Warnf).

Note: High-level wrapper functions (e.g., a standalone Kmeans) are commented out in the source, but they outline the intended API usage.