Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

# `obiutils` — Semantic Feature Overview
The **`obiutils`** package is a collection of low- and mid-level utilities for numerical computation, string manipulation, file I/O, concurrency control, data conversion, and format detection, built for bioinformatics pipelines in the OBITools 4 ecosystem. Public APIs are **type-safe** and **well-documented**, and are optimized for performance or for correctness, depending on the use case.
## Core Functional Categories
### 🔢 Numerical Utilities
- **`Abs[T constraints.Signed](x T) T`**: Generic absolute value for signed integer types (`constraints.Signed` from `golang.org/x/exp/constraints` covers the signed integer kinds only, not floating-point types).
- **`Min/Max(...)`**: Unified functions accepting scalars, slices, or maps—uses reflection for heterogeneous inputs; returns errors on empty/unsupported types.
- **`MinMaxSlice[T constraints.Ordered]([]T) (min, max T)`**: Efficient min/max for ordered slices; panics on empty input.
- **`MinMultiset[T]`**: Lazy-delete min-priority multiset with O(log n) insertion, amortized O(1) minimum access.
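The lazy-delete idea behind `MinMultiset` can be sketched on top of `container/heap`. This is an illustrative, int-only version with hypothetical names (`minMultiset`, `Add`, `Remove`, `Min`), not the package's implementation: `Remove` only records a pending deletion, and `Min` discards stale entries when they surface at the top of the heap, so each O(log n) pop is paid for by an earlier removal.

```go
package main

import (
	"container/heap"
	"fmt"
)

// intHeap is a plain min-heap of ints backing the multiset.
type intHeap []int

func (h intHeap) Len() int           { return len(h) }
func (h intHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h intHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *intHeap) Push(x any)        { *h = append(*h, x.(int)) }
func (h *intHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// minMultiset (hypothetical) defers deletions: pending counts how many
// copies of each value have been removed but are still in the heap.
type minMultiset struct {
	h       intHeap
	pending map[int]int
	size    int
}

func newMinMultiset() *minMultiset {
	return &minMultiset{pending: make(map[int]int)}
}

// Add inserts v in O(log n).
func (m *minMultiset) Add(v int) {
	heap.Push(&m.h, v)
	m.size++
}

// Remove marks one copy of v as deleted in O(1); v must be present.
func (m *minMultiset) Remove(v int) {
	m.pending[v]++
	m.size--
}

// Min pops stale tops until a live value is found.
func (m *minMultiset) Min() (int, bool) {
	for m.h.Len() > 0 {
		top := m.h[0]
		if n := m.pending[top]; n > 0 {
			if n == 1 {
				delete(m.pending, top)
			} else {
				m.pending[top] = n - 1
			}
			heap.Pop(&m.h)
			continue
		}
		return top, true
	}
	return 0, false
}

// Len returns the number of live elements.
func (m *minMultiset) Len() int { return m.size }

func main() {
	m := newMinMultiset()
	for _, v := range []int{5, 3, 3, 8} {
		m.Add(v)
	}
	m.Remove(3)
	v, _ := m.Min()
	fmt.Println(v, m.Len()) // 3 3
	m.Remove(3)
	v, _ = m.Min()
	fmt.Println(v, m.Len()) // 5 2
}
```

Deferring the deletion avoids searching the heap for an arbitrary element, which would otherwise cost O(n).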
### 📦 Data Structures
- **`Set[E comparable]`**: Generic set using `map[E]struct{}` for O(1) membership; supports union, intersection, add/contains/members.
- **`Vector[T]`, `Matrix[T]`** (`Matrix[T]` is `[][]T`): Row-major 2D structures with:
  - Methods: `.Column(i)`, `.Rows(indices...)`, `.Dim()` (safe for nil/jagged matrices).
  - Allocation helpers: `Make2DArray[T]`, `Make2DNumericArray[T](rows, cols int, zeroed bool)`.
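A minimal sketch of a `map[E]struct{}`-backed set. The type and method names here (`set`, `Add`, `Contains`, `Union`, `Intersection`, `Members`) are assumptions and may differ from the real `Set[E]` API. The empty-struct values occupy no storage, so the map acts purely as a membership index.

```go
package main

import (
	"fmt"
	"sort"
)

// set is a hypothetical stand-in for Set[E].
type set[E comparable] map[E]struct{}

func newSet[E comparable](items ...E) set[E] {
	s := make(set[E], len(items))
	for _, it := range items {
		s.Add(it)
	}
	return s
}

// Add inserts e; adding an existing element is a no-op.
func (s set[E]) Add(e E) { s[e] = struct{}{} }

// Contains reports membership in O(1) on average.
func (s set[E]) Contains(e E) bool {
	_, ok := s[e]
	return ok
}

// Union returns a new set holding the elements of s and o.
func (s set[E]) Union(o set[E]) set[E] {
	out := make(set[E], len(s)+len(o))
	for e := range s {
		out.Add(e)
	}
	for e := range o {
		out.Add(e)
	}
	return out
}

// Intersection returns a new set, iterating over the smaller operand.
func (s set[E]) Intersection(o set[E]) set[E] {
	small, large := s, o
	if len(large) < len(small) {
		small, large = large, small
	}
	out := make(set[E])
	for e := range small {
		if large.Contains(e) {
			out.Add(e)
		}
	}
	return out
}

// Members returns the elements in unspecified (map) order.
func (s set[E]) Members() []E {
	out := make([]E, 0, len(s))
	for e := range s {
		out = append(out, e)
	}
	return out
}

func main() {
	a := newSet("ACGT", "TTGA", "GGCC")
	b := newSet("TTGA", "CCAA")
	union := a.Union(b).Members()
	sort.Strings(union)
	fmt.Println(a.Intersection(b).Members(), union) // [TTGA] [ACGT CCAA GGCC TTGA]
}
```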
### 🧠 Type Conversion & Validation
- **`InterfaceToString(i interface{}) string`**, `CastableToInt(...)`, `InterfaceToBool(...)` / `Int` / `Float64`: Safe conversions with typed errors (`NotAnInteger`, etc.).
- **`MapToMapInterface(...)`, `InterfaceToIntMap(...)` / `StringMap`**: Convert generic maps to concrete types via reflection.
- **`InterfaceToStringSlice(...)`**: Normalizes `[]interface{}` or string slices to `[]string`.
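The typed-error style of these converters can be illustrated with a type switch. `interfaceToInt` and `NotAnIntegerError` below are hypothetical stand-ins for `InterfaceToInt` and `NotAnInteger`. The set of accepted input types is an assumption made for the example.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// NotAnIntegerError is a hypothetical typed error modeled on NotAnInteger.
type NotAnIntegerError struct {
	Value interface{}
}

func (e *NotAnIntegerError) Error() string {
	return fmt.Sprintf("value %v (%T) is not an integer", e.Value, e.Value)
}

// interfaceToInt converts a dynamically typed value to int, returning a
// typed error instead of panicking on unsupported input.
func interfaceToInt(i interface{}) (int, error) {
	switch v := i.(type) {
	case int:
		return v, nil
	case float64:
		// Accept whole numbers (e.g. decoded JSON) within float64's exact
		// integer range; NaN and infinities fail these checks. Assumes a
		// 64-bit int.
		if v == math.Trunc(v) && math.Abs(v) <= 1<<53 {
			return int(v), nil
		}
	case string:
		if n, err := strconv.Atoi(v); err == nil {
			return n, nil
		}
	}
	return 0, &NotAnIntegerError{Value: i}
}

func main() {
	for _, in := range []interface{}{42, 7.0, "19", 2.5, "ACGT"} {
		n, err := interfaceToInt(in)
		fmt.Println(n, err)
	}
}
```

Callers can branch on the error's concrete type (for example with `errors.As`) rather than parsing its message.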
### 📄 File & Stream I/O
- **`ReadLines(path string) ([]string, error)`**: Buffered line-by-line file reading.
- **`Wfile` abstraction** (`OpenWritingFile`, `CompressStream`) with transparent gzip (via `pgzip`), buffering, and append support.
- **`Ropen/Wopen(...)`**: Unified opener for files/stdin/HTTP/pipes, auto-detecting gzip/xz/zstd/bzip2 input via magic bytes.
- **`DownloadFile(url, path string)`**: Simple HTTP download with progress bar (no retries/timeouts).
- **`TarFileReader(r io.Reader, path string)`**: Extracts a single file from TAR by exact name match.
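Magic-byte detection amounts to comparing the first few bytes of a stream against known signatures. The sketch below uses hypothetical names (`detectCompression`, `openMaybeCompressed`). It peeks through a `bufio.Reader`, so nothing is consumed, and only wires up the decompressors in Go's standard library (gzip, bzip2). How `Ropen` decodes xz and zstd is not shown here; those formats require third-party packages.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/bzip2"
	"compress/gzip"
	"fmt"
	"io"
)

// detectCompression names the format announced by header ("" if none).
func detectCompression(header []byte) string {
	switch {
	case bytes.HasPrefix(header, []byte{0x1f, 0x8b}):
		return "gzip"
	case bytes.HasPrefix(header, []byte("BZh")):
		return "bzip2"
	case bytes.HasPrefix(header, []byte{0xfd, '7', 'z', 'X', 'Z', 0x00}):
		return "xz"
	case bytes.HasPrefix(header, []byte{0x28, 0xb5, 0x2f, 0xfd}):
		return "zstd"
	}
	return ""
}

// openMaybeCompressed peeks at r without consuming it and wraps it in a
// standard-library decompressor when one exists.
func openMaybeCompressed(r io.Reader) (io.Reader, string, error) {
	br := bufio.NewReader(r)
	header, err := br.Peek(6) // short inputs return fewer bytes plus io.EOF
	if err != nil && err != io.EOF {
		return nil, "", err
	}
	kind := detectCompression(header)
	switch kind {
	case "gzip":
		zr, err := gzip.NewReader(br)
		if err != nil {
			return nil, kind, err
		}
		return zr, kind, nil
	case "bzip2":
		return bzip2.NewReader(br), kind, nil
	case "":
		return br, kind, nil // plain input, passed through unchanged
	}
	return nil, kind, fmt.Errorf("%s input needs a third-party decoder", kind)
}

func main() {
	// Compress a tiny FASTA record in memory, then reopen it transparently.
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(">seq1\nACGT\n")); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	r, kind, err := openMaybeCompressed(&buf)
	if err != nil {
		panic(err)
	}
	data, err := io.ReadAll(r)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %q\n", kind, data) // gzip ">seq1\nACGT\n"
}
```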
### 🔤 String & ASCII Processing
- **`InPlaceToLower([]byte) []byte`**: In-place, allocation-free uppercase→lowercase conversion of ASCII letters using bitwise OR (`| 32`).
- **`UnsafeStringFromBytes([]byte) string`, `UnsafeBytes(string) []byte`**: Zero-copy conversions (⚠️ unsafe: the result shares memory with its input, so the underlying bytes must not be modified afterward).
- **`AsciiSet`** (`[256]bool`): Predefined sets (`Space`, `Digit`, `Alpha`), set operations (union, intersect), and helpers:
  - `.FirstWord(...)`, `.TrimLeft(s string)`, `RightSplitInTwo(...)`.
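The sketch below illustrates three of these techniques under hypothetical names. It shows in-place lowercasing with `| 32`, guarded so only `'A'` to `'Z'` change (OR-ing 32 into `'@'` would yield a backtick). It also shows a `[256]bool` byte set with a left-trim helper, and a zero-copy `[]byte` to `string` view built with `unsafe.String` (Go 1.20+).

```go
package main

import (
	"fmt"
	"unsafe"
)

// inPlaceToLower lowercases ASCII letters in b, reusing b's storage.
// OR-ing 32 (0x20) maps 'A'..'Z' onto 'a'..'z'; the range check leaves
// every other byte untouched.
func inPlaceToLower(b []byte) []byte {
	for i, c := range b {
		if c >= 'A' && c <= 'Z' {
			b[i] = c | 32
		}
	}
	return b
}

// asciiSet holds one membership flag per byte value.
type asciiSet [256]bool

// makeASCIISet builds a set from the bytes of chars.
func makeASCIISet(chars string) asciiSet {
	var s asciiSet
	for i := 0; i < len(chars); i++ {
		s[chars[i]] = true
	}
	return s
}

// union returns a new set; s is a copy because arrays are passed by value.
func (s asciiSet) union(o asciiSet) asciiSet {
	for i := range s {
		s[i] = s[i] || o[i]
	}
	return s
}

// trimLeft drops the leading bytes of str that belong to s.
func (s asciiSet) trimLeft(str string) string {
	i := 0
	for i < len(str) && s[str[i]] {
		i++
	}
	return str[i:]
}

// unsafeString views b as a string without copying. The result aliases b,
// so b must not be modified while the string is in use.
func unsafeString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	seq := inPlaceToLower([]byte("acgtNNACGT"))
	fmt.Println(unsafeString(seq)) // acgtnnacgt

	blank := makeASCIISet(" \t")
	digit := makeASCIISet("0123456789")
	fmt.Println(blank.union(digit).trimLeft(" \t42 ACGT")) // ACGT
}
```

A lookup table indexed by the byte value turns each membership test into a single array access, with no branching over character classes.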
### 📏 Memory & Path Utilities
- **`ParseMemSize(s string) (int, error)`**: Parses `"128K"`, `"5MB"` → bytes.
- **`FormatMemSize(n int) string`**: Formats byte counts as `"1.5K"`, `"2M"` (powers of 1024).
- **`RemoveAllExt(path string)`, `Basename(path string)`**: Strip *all* extensions from paths (e.g., `"file.tar.gz"` → `"file"`).
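One possible reading of `ParseMemSize` and `RemoveAllExt`, written as hypothetical `parseMemSize` and `removeAllExt`. The accepted suffixes (`K`, `M`, `G`, case-insensitive, with an optional trailing `B`) are assumptions based on the examples above. Multipliers are powers of 1024 to match `FormatMemSize`. Only the last path element is stripped, so dots in directory names survive.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
	"strings"
)

// parseMemSize converts strings such as "128K" or "5MB" to a byte count.
func parseMemSize(s string) (int, error) {
	t := strings.ToUpper(strings.TrimSpace(s))
	t = strings.TrimSuffix(t, "B") // accept "5MB" as well as "5M"
	mult := 1
	if n := len(t); n > 0 {
		switch t[n-1] {
		case 'K':
			mult = 1 << 10
		case 'M':
			mult = 1 << 20
		case 'G':
			mult = 1 << 30
		}
		if mult > 1 {
			t = t[:n-1]
		}
	}
	v, err := strconv.Atoi(t)
	if err != nil {
		return 0, fmt.Errorf("invalid memory size %q: %w", s, err)
	}
	return v * mult, nil
}

// removeAllExt cuts the file name at its first dot, keeping the directory.
// Names starting with a dot (".bashrc") are left unchanged.
func removeAllExt(p string) string {
	dir, file := filepath.Split(p)
	if i := strings.IndexByte(file, '.'); i > 0 {
		file = file[:i]
	}
	return dir + file
}

func main() {
	for _, s := range []string{"128K", "5MB", "1g", "4096"} {
		n, err := parseMemSize(s)
		fmt.Println(s, n, err)
	}
	fmt.Println(removeAllExt("/data/run.1/reads.fastq.gz")) // /data/run.1/reads
}
```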
### 📡 Format Detection & MIME Handling
- **`HasBOM([]byte) bool, BOMType`**: Detects UTF-8/16/32 byte order marks.
- **`DropLastLine([]byte) []byte`**: Trims final newline-delimited line (for truncated files).
- **`RegisterOBIMimeType(...)`**: Extends MIME detection for bioformats (FASTA/FASTQ, CSV, ecoPCR2, GenBank) via regex/magic headers.
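BOM detection is a prefix check against a short table of signatures. In this hypothetical `detectBOM`, order matters: the UTF-32LE mark (`FF FE 00 00`) begins with the UTF-16LE mark (`FF FE`), so the longer signature must be tested first.

```go
package main

import (
	"bytes"
	"fmt"
)

// detectBOM reports which Unicode byte order mark, if any, starts data.
func detectBOM(data []byte) (string, bool) {
	// Longer marks come first so UTF-32LE is not mistaken for UTF-16LE.
	boms := []struct {
		name string
		mark []byte
	}{
		{"UTF-32BE", []byte{0x00, 0x00, 0xFE, 0xFF}},
		{"UTF-32LE", []byte{0xFF, 0xFE, 0x00, 0x00}},
		{"UTF-8", []byte{0xEF, 0xBB, 0xBF}},
		{"UTF-16BE", []byte{0xFE, 0xFF}},
		{"UTF-16LE", []byte{0xFF, 0xFE}},
	}
	for _, b := range boms {
		if bytes.HasPrefix(data, b.mark) {
			return b.name, true
		}
	}
	return "", false
}

func main() {
	name, ok := detectBOM([]byte("\xEF\xBB\xBF>seq1\nACGT\n"))
	fmt.Println(name, ok) // UTF-8 true
}
```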
### 🔄 Concurrency & Synchronization
- **`AtomicCounter(start int)`**: Thread-safe counter with `Inc()`, `Dec()`, `Value()` (mutex-protected).
- **`RegisterAPipe/UnregisterPipe()`, `WaitForLastPipe()`**: Lightweight pipeline sync via `sync.WaitGroup` (logs active goroutines).
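A sketch of the two synchronization patterns, using hypothetical names. The first is a mutex-guarded counter whose `Inc`/`Dec` return the new value (an assumption). The second is a package-level `sync.WaitGroup` used as a pipe registry. `registerPipe` is called before each goroutine starts, so `waitForLastPipe` cannot return early.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a mutex-protected counter in the spirit of AtomicCounter.
type counter struct {
	mu    sync.Mutex
	value int
}

func newCounter(start int) *counter { return &counter{value: start} }

// Inc increments the counter and returns the updated value.
func (c *counter) Inc() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.value++
	return c.value
}

// Dec decrements the counter and returns the updated value.
func (c *counter) Dec() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.value--
	return c.value
}

// Value reads the current value under the lock.
func (c *counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.value
}

// pipes tracks live pipeline stages.
var pipes sync.WaitGroup

func registerPipe()    { pipes.Add(1) }
func unregisterPipe()  { pipes.Done() }
func waitForLastPipe() { pipes.Wait() }

func main() {
	c := newCounter(0)
	for i := 0; i < 100; i++ {
		registerPipe() // must happen before the goroutine starts
		go func() {
			defer unregisterPipe()
			c.Inc()
		}()
	}
	waitForLastPipe()
	fmt.Println(c.Value()) // 100
}
```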
### 📊 Ranking & Ordering
- **`IntOrder(data []int) []int`, `ReverseIntOrder(...)`**: Return the index permutation for an ascending/descending sort (the original slice is unchanged).
- **`Order[T sort.Interface](data T) []int`**: Generic stable index-based sorting.
### 🧪 Testing & Reliability
- All functions include **unit tests** (table-driven, `reflect.DeepEqual`, subtests).
- Error handling is explicit and typed; logging via Logrus for debugging.
- External dependencies include `golang.org/x/exp/constraints` (for generics), Logrus (logging), and libraries such as `progressbar` and `pgzip`.
- Designed for portability across Unix and Windows (relies on the standard library's path handling).