Files
obitools4/autodoc/docmd/pkg_obiutils.md
T
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

4.5 KiB

obiutils — Semantic Feature Overview

The obiutils package is a collection of low- and mid-level utilities for numerical computation, string manipulation, file I/O, concurrency control, data conversion, and format detection—specifically designed for bioinformatics pipelines in the OBITools 4 ecosystem. All public APIs are type-safe, well-documented, and optimized for performance or correctness depending on use case.

Core Functional Categories

🔢 Numerical Utilities

  • Abs[T constraints.Signed](x T) T: Generic absolute value for signed integers and floats (via golang.org/x/exp/constraints).
  • Min/Max(...): Unified functions accepting scalars, slices, or maps—uses reflection for heterogeneous inputs; returns errors on empty/unsupported types.
  • MinMaxSlice[T constraints.Ordered]([]T) (min, max T): Efficient min/max for ordered slices; panics on empty input.
  • MinMultiset[T]: Lazy-delete min-priority multiset with O(log n) insertion, amortized O(1) minimum access.

📦 Data Structures

  • Set[E comparable]: Generic set using map[E]struct{} for O(1) membership; supports union, intersection, add/contains/members.
  • Vector[T], Matrix[T][][]T: Row-major 2D structures with methods:
    • .Column(i), .Rows(indices...), .Dim() (safe for nil/jagged matrices).
    • Make2DArray[T], Make2DNumericArray[T](rows, cols int, zeroed bool) for allocation.

🧠 Type Conversion & Validation

  • InterfaceToString(i interface{}) string,
    CastableToInt(...),
    InterfaceToBool(...) / Int / Float64: Safe conversions with typed errors (NotAnInteger, etc.).
  • **MapToMapInterface(...), InterfaceToIntMap(...) / StringMap: Converts generic maps to concrete types via reflection.
  • InterfaceToStringSlice(...): Normalizes []interface{} or string slices to []string.

📄 File & Stream I/O

  • ReadLines(path string) ([]string, error): Buffered line-by-line file reading.
  • Wfile abstraction (OpenWritingFile, CompressStream) with transparent gzip (via pgzip), buffering, and append support.
  • Ropen/Wopen(...): Unified opener for files/stdin/HTTP/pipes, auto-detecting gzip/xz/zstd/bzip2 via magic bytes.
  • DownloadFile(url, path string): Simple HTTP download with progress bar (no retries/timeouts).
  • TarFileReader(r io.Reader, path string): Extracts a single file from TAR by exact name match.

🔤 String & ASCII Processing

  • InPlaceToLower([]byte) []byte: Zero-copy uppercase→lowercase conversion for ASCII using bitwise OR (| 32).
  • UnsafeStringFromBytes([]byte) string, UnsafeBytes(string) []byte: Zero-copy conversions (⚠️ unsafe; no bounds checks).
  • AsciiSet[256]bool: Predefined sets (Space, Digit, Alpha) + operations (union, intersect) and helpers:
    • .FirstWord(...), TrimLeft(s string) (via method), RightSplitInTwo(...).

📏 Memory & Path Utilities

  • ParseMemSize(s string) (int, error): Parses "128K", "5MB" → bytes.
  • FormatMemSize(n int) string: Formats byte counts as "1.5K", "2M" (powers of 1024).
  • RemoveAllExt(path string), Basename(path string): Strip all extensions from paths (e.g., "file.tar.gz""file").

📡 Format Detection & MIME Handling

  • HasBOM([]byte) bool, BOMType: Detects UTF-8/16/32 byte order marks.
  • DropLastLine([]byte) []byte: Trims final newline-delimited line (for truncated files).
  • RegisterOBIMimeType(...): Extends MIME detection for bioformats (FASTA/FASTQ, CSV, ecoPCR2, GenBank) via regex/magic headers.

🔄 Concurrency & Synchronization

  • AtomicCounter(start int): Thread-safe counter with Inc(), Dec(), Value() (mutex-protected).
  • RegisterAPipe/UnregisterPipe(), WaitForLastPipe(): Lightweight pipeline sync via sync.WaitGroup (logs active goroutines).

📊 Ranking & Ordering

  • **IntOrder(data []int) []int, ReverseIntOrder(...): Returns index permutation for ascending/descending sort (original slice unchanged).
  • **Order[T sort.Interface](data T) []int: Generic stable index-based sorting.

🧪 Testing & Reliability

  • All functions include unit tests (table-driven, reflect.DeepEqual, subtests).
  • Error handling is explicit and typed; logging via Logrus for debugging.
  • No external dependencies beyond golang.org/x/exp/constraints (for generics) and optional libraries (progressbar, pgzip).
  • Designed for portability across Unix/Windows (uses standard library paths).