⬆️ version bump to v4.5

- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
Eric Coissac
2026-04-07 08:36:50 +02:00
parent 670edc1958
commit 8c7017a99d
392 changed files with 18875 additions and 141 deletions
@@ -0,0 +1,41 @@
# `obiutils.Abs` — Generic Absolute Value Function
This package provides a **type-generic utility function** for computing the absolute value of signed numeric types in Go.
## Function Signature
```go
func Abs[T constraints.Signed](x T) T
```
- **Generic constraint**: `T` must satisfy `constraints.Signed`, i.e., any signed integer type (`int`, `int8`–`int64`) or a type derived from one. Floating-point types are not part of `constraints.Signed`, so they are not accepted.
- **Input**: A value of type `T`.
- **Output**: The absolute (non-negative) counterpart, same type as input.
## Semantics
- Returns `x` if `x ≥ 0`.
- Otherwise, returns `-x`, effectively flipping the sign.
- Handles all signed integer types uniformly, with no need for type-specific overloads.
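The semantics above can be sketched in a few lines. This is a minimal illustration, not the package source: the local `Signed` interface is an assumption standing in for `constraints.Signed` so that the example needs no external module.

```go
package main

import "fmt"

// Signed is a local stand-in for constraints.Signed (signed integers only),
// declared here so the sketch has no external dependency.
type Signed interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64
}

// Abs returns x when x >= 0 and -x otherwise.
func Abs[T Signed](x T) T {
	if x < 0 {
		return -x
	}
	return x
}

func main() {
	fmt.Println(Abs(-5))         // 5
	fmt.Println(Abs(int64(7)))   // 7
	fmt.Println(Abs(int8(-128))) // -128: negating the minimum value wraps around
}
```

The last call shows the one case where the result is still negative: the smallest value of a signed integer type has no positive counterpart.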
## Example Usage
```go
absInt := obiutils.Abs(-5)          // → 5 (type: int)
absInt64 := obiutils.Abs(int64(-7)) // → 7 (type: int64)
// Floats do not satisfy constraints.Signed; use math.Abs for float64 values.
```
## Design Rationale
- Leverages Go generics for **reusability** and type safety.
- Avoids duplication across `AbsInt`, `AbsInt64`, etc.
- Follows Go's standard library conventions (e.g., similar to `math.Abs`, but generic over signed integers rather than limited to `float64`).
## Limitations
- Does **not** support unsigned types (by design: `constraints.Signed` excludes them).
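- The most negative value of each type (e.g., `math.MinInt64`) has no positive counterpart: negating it overflows and yields the same negative value, matching native negation semantics.
- Floating-point inputs are outside the constraint; `math.Abs` covers `float64`, including `NaN` and `-0.0`.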
## Dependencies
- Requires `golang.org/x/exp/constraints` for the generic type constraint.
@@ -0,0 +1,23 @@
## Semantic Description of `obiutils.Abs` Functionality
The provided Go test suite (`TestAbs`) validates the semantic behavior of a utility function `Abs` from the package [`obiutils`](https://git.metabarcoding.org/obitools/obitools4), part of the OBITools 4 ecosystem — a toolkit for DNA metabarcoding data analysis.
- **Function Purpose**:
`obiutils.Abs` computes the *absolute value* of an integer, returning its non-negative magnitude regardless of sign.
- **Test Coverage**:
The test verifies correctness across two categories:
- *Non-negative inputs* (`0`, `1`, `5`, `10`) → outputs unchanged.
- *Negative inputs* (`-1`, `-5`, `-10`) → outputs their positive counterparts.
- **Semantics**:
The function adheres to the mathematical definition: `Abs(x) = x` if `x ≥ 0`, else `-x`.
It ensures robustness for edge cases (e.g., zero) and typical integer ranges used in bioinformatic pipelines.
- **Integration Context**:
As part of `obitools4`, such low-level utilities likely support numerical operations in sequence alignment scoring, quality filtering, or coordinate transformations — where signed differences must be normalized.
- **Test Quality**:
Uses table-driven testing (Go idiom), promoting maintainability and clarity. No external dependencies are required — confirming the function is pure, deterministic, and self-contained.
In summary: `Abs` provides a foundational arithmetic primitive with guaranteed correctness for integer inputs, enabling reliable downstream computation in OBITools data processing workflows.
@@ -0,0 +1,23 @@
# `obiutils` Package: Semantic Overview
This Go package (`obiutils`) provides generic utilities for numerical and matrix operations, leveraging generics (Go 1.18+). It defines foundational types and helper functions for working with multidimensional data structures.
- **Type Interfaces**
- `Integer`: Constraint covering signed integer types (`int`, `int8`–`int64`).
- `Float`: Constraint for floating-point types (`float32`, `float64`).
- `Numeric`: Union of both, enabling generic numeric functions.
- **Data Structures**
- `Vector[T]`: A slice-based vector (`[]T`).
- `Matrix[T]`: A row-major representation of a 2D matrix (`[][]T`), backed by contiguous memory for performance.
- **Core Functions**
- `Make2DArray[T]`: Allocates a zero-initialized, contiguous-row-major matrix of arbitrary type `T`.
- `Make2DNumericArray[T]`: Same as above, but restricted to numeric types; optionally pre-fills with zeros if `zeroed=true`.
- **Matrix Methods**
- `.Column(i int)`: Extracts column `i` as a slice (not row-wise access).
- `.Rows(i ...int)`: Returns a new matrix containing only the specified row indices.
- `.Dim() (int, int)`: Returns `(rows, cols)` safely handling `nil` or empty matrices.
The design prioritizes memory efficiency (via contiguous backing arrays), type safety through generics, and ergonomic access patterns for linear algebra-like workflows.
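A contiguous backing array behind a `[][]T` can be sketched as follows. The function bodies are assumptions written to match the description above, not the package's actual code; only `Make2DArray`, `Column`, and `Dim` are shown.

```go
package main

import "fmt"

// Matrix mirrors the [][]T layout described above.
type Matrix[T any] [][]T

// Make2DArray allocates one contiguous backing slice and cuts it into rows.
func Make2DArray[T any](rows, cols int) Matrix[T] {
	backing := make([]T, rows*cols)
	m := make(Matrix[T], rows)
	for i := range m {
		// The capacity is capped so that appending to a row cannot spill into the next one.
		m[i] = backing[i*cols : (i+1)*cols : (i+1)*cols]
	}
	return m
}

// Column copies column i into a new slice, one element per row.
func (m Matrix[T]) Column(i int) []T {
	col := make([]T, len(m))
	for r, row := range m {
		col[r] = row[i]
	}
	return col
}

// Dim returns (rows, cols); a nil or empty matrix yields (0, 0).
func (m Matrix[T]) Dim() (int, int) {
	if len(m) == 0 {
		return 0, 0
	}
	return len(m), len(m[0])
}

func main() {
	m := Make2DArray[int](2, 3)
	m[0][1], m[1][1] = 7, 9
	fmt.Println(m.Column(1)) // [7 9]
	fmt.Println(m.Dim())     // 2 3
}
```

A single allocation for all rows keeps the data cache-friendly and reduces the number of objects the garbage collector has to track.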
@@ -0,0 +1,15 @@
# Semantic Description of `obiutils` Matrix Functionality
The `package obiutils` provides a generic, type-safe matrix abstraction in Go with core utility methods for construction and querying.
- **`Make2DArray[T]()`**: A generic constructor that initializes a 2D slice (matrix) of type `Matrix[T]`, with specified numbers of rows and columns. All elements are zero-initialized (e.g., `0` for integers, empty string for strings, default struct values).
- **`.Column(colIndex int)`**: Extracts and returns a single column (as `[]T`) from the matrix at the given 0-based index, preserving element order across rows.
- **`.Rows(indices ...int)`**: Returns a new matrix composed of only the specified row indices (0-based), supporting single-row, multi-row, or empty selections.
- **`.Dim() (rows, cols int)`**: Returns the dimensions of the matrix as `(number_of_rows, number_of_columns)`. Handles edge cases: `nil`, empty (`{}`), and jagged or zero-column matrices safely (e.g., `{ { } }` yields `(1, 0)`).
All functionality is implemented as methods on the `Matrix[T]` type (implicitly defined via slices of slices), leveraging Go generics for compile-time safety and runtime efficiency.
The package includes comprehensive unit tests validating correctness across types (`int`, `string`, custom structs) and boundary conditions.
@@ -0,0 +1,27 @@
# `InPlaceToLower` Function — Semantic Description
The `obiutils.InPlaceToLower` function provides a high-performance, memory-efficient utility for converting ASCII uppercase letters to lowercase **in place**, without allocating new data structures.
## Core Functionality
- Takes a `[]byte` slice (`data`) as input.
- Iterates over each byte, identifying uppercase ASCII characters (i.e., `'A'`–`'Z'`, values `65`–`90`).
- Converts each uppercase byte to its lowercase counterpart using a bitwise OR with `32`, leveraging the ASCII encoding property:
`lowercase = uppercase | 0b0010_0000` (since `'a' - 'A' = 32`).
- Returns the **same** `[]byte` slice, now mutated in-place.
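The steps above amount to a short loop. The sketch below is an illustration consistent with the description, not a copy of the package source.

```go
package main

import "fmt"

// InPlaceToLower lowercases ASCII 'A'..'Z' in data and returns the same slice.
func InPlaceToLower(data []byte) []byte {
	for i, c := range data {
		if c >= 'A' && c <= 'Z' {
			data[i] = c | 0x20 // set bit 5: 'A' (0x41) becomes 'a' (0x61)
		}
	}
	return data
}

func main() {
	buf := []byte("ACGT-acgt N")
	InPlaceToLower(buf)
	fmt.Println(string(buf)) // acgt-acgt n
}
```

The range check matters: applying `| 0x20` unconditionally would also alter bytes such as `'@'` or `'['`.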
## Key Characteristics
- **Zero-copy**: No new memory is allocated, which is ideal for performance-critical or low-level contexts (e.g., streaming, embedded systems).
- **ASCII-safe**: Only modifies bytes in the `'A'`–`'Z'` range; other bytes (e.g., digits, symbols, non-ASCII) remain unchanged.
- **Idiomatic Go**: Uses idioms like `range` with index/value and bitwise optimization.
- ⚠️ **Destructive**: Input data is permanently modified—callers must clone if preservation is needed.
## Use Cases
- Preprocessing raw HTTP headers or payloads.
- Optimizing case-insensitive comparisons in high-throughput systems.
- Embedded tools where GC pressure or heap allocation must be minimized.
## Example
```go
buf := []byte("HTTP/1.1 200 OK")
InPlaceToLower(buf) // buf is now []byte("http/1.1 200 ok")
```
@@ -0,0 +1,20 @@
# `obiutils` Package Functional Overview
The `obiutils` package provides two core utility functions for low-level and numerical operations in Go:
- **`InPlaceToLower([]byte) []byte`**
Converts all ASCII uppercase letters in a byte slice to lowercase *in-place*, returning the modified slice.
- Non-alphabetic bytes remain unchanged.
- Memory-efficient: modifies input directly (no allocation of new slice).
- **`Make2DNumericArray[T any](rows, cols int, zeroed bool) Matrix[T]`**
Generates a generic 2D numeric array (`Matrix`) of type `T`, supporting any comparable/numeric Go type.
- Parameters: number of rows, number of columns, and `zeroed`, which requests that every element be explicitly set to zero. Note that Go already zero-initializes newly allocated slices (e.g., `0` for `int`).
- Uses Go generics (`[T any]`) for type safety and flexibility.
Both functions are thoroughly unit-tested in `*_test.go`, covering edge cases:
- Empty/nil inputs (`InPlaceToLower`)
- Various dimensions and zero-initialization modes (`Make2DNumericArray`)
Tests use `reflect.DeepEqual` for structural comparison and subtests via `t.Run`.
The package assumes a custom type alias: `type Matrix[T any] [][]T`.
@@ -0,0 +1,15 @@
# `obiutils` Package — Semantic Feature Summary
This Go package provides a set of utility functions for **type conversion**, **casting validation**, and **map/slice transformation** in a flexible, error-tolerant manner.
- `InterfaceToString(i interface{})`: Converts any value to a string, preferring the `Stringer` interface if implemented.
- `CastableToInt(i interface{})`: Checks whether a value is *numerically castable* to an `int` (supports all numeric types).
- `InterfaceToBool(i interface{})`: Safely converts various input types (`bool`, numeric, string like `"true"`, `"1"`, etc.) to `bool`; returns a custom error for unsupported types.
- `InterfaceToInt(i interface{})`: Converts numeric or string representations to an `int`, with precise error handling.
- `InterfaceToFloat64(i interface{})`: Converts numeric or string types to `float64`, using standard parsing.
- `MapToMapInterface(m interface{})`: Converts specialized map types (e.g., read-only or concurrency-safe maps) to `map[string]interface{}` via reflection.
- `InterfaceToIntMap(i interface{})`: Converts compatible map types (`map[string]int`, `hasMap` interfaces, or generic maps) to a concrete `map[string]int`.
- `InterfaceToStringMap(i interface{})`: Converts map values to strings, yielding a clean `map[string]string`.
- `InterfaceToStringSlice(i interface{})`: Converts slices of interfaces or strings into a pure `[]string`.
All functions include **explicit error handling** via custom types (e.g., `NotAnInteger`, `NotAMapInt`) and use logging via Logrus for debugging. The package prioritizes **type safety**, **robustness**, and interoperability across Go types.
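The conversion pattern described above is typically a type switch that returns a typed error for unsupported inputs. The sketch below illustrates it for `InterfaceToInt`; the set of handled types and the fields of `NotAnInteger` are assumptions, not the package's definitions.

```go
package main

import (
	"fmt"
	"strconv"
)

// NotAnInteger is a hypothetical error type modeled on the one named above.
type NotAnInteger struct{ message string }

func (e *NotAnInteger) Error() string { return e.message }

// InterfaceToInt converts a few common dynamic types to int.
func InterfaceToInt(i interface{}) (int, error) {
	switch v := i.(type) {
	case int:
		return v, nil
	case int64:
		return int(v), nil
	case float64:
		return int(v), nil // truncates toward zero
	case string:
		n, err := strconv.Atoi(v)
		if err != nil {
			return 0, &NotAnInteger{fmt.Sprintf("%q is not an integer", v)}
		}
		return n, nil
	default:
		return 0, &NotAnInteger{fmt.Sprintf("value of type %T is not an integer", i)}
	}
}

func main() {
	fmt.Println(InterfaceToInt("42")) // 42 <nil>
	fmt.Println(InterfaceToInt(3.9))  // 3 <nil>
	_, err := InterfaceToInt([]int{1})
	fmt.Println(err) // value of type []int is not an integer
}
```

Returning a dedicated error type lets callers distinguish "wrong type" from other failures with a type assertion or `errors.As`.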
@@ -0,0 +1,25 @@
# `obiutils.Counter`: Thread-Safe Atomic Counter
A minimal, thread-safe counter implementation in Go.
## Features
- **Atomic increment/decrement**: `Inc()` and `Dec()` modify the internal counter atomically using a mutex.
- **Current value retrieval**: `Value()` safely returns the current count without modifying it.
- **Initial value support**: Constructor accepts an optional initial integer (defaults to `0`).
- **Closure-based API**: Encapsulates state and synchronization behind clean, functional methods.
- **No external dependencies**: Uses only the standard library (`sync`).
## Usage Example
```go
counter := obiutils.NewCounter(10) // start at 10
fmt.Println(counter.Inc()) // → 11
fmt.Println(counter.Dec()) // → 10
fmt.Println(counter.Value()) // → 10 (unchanged)
```
## Thread Safety
All operations are protected by a `sync.Mutex`, ensuring correctness in concurrent environments.
## Design Notes
- Immutable interface: methods return updated values, not pointers.
- No reset method provided—intentionally minimal and focused on core counting semantics.
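One way to obtain a closure-based, mutex-protected counter is sketched below. The struct layout and the variadic constructor are assumptions chosen to match the usage example above; the real implementation may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter exposes closures that share one mutex-protected integer.
type Counter struct {
	Inc   func() int
	Dec   func() int
	Value func() int
}

// NewCounter starts at initial[0] when given, otherwise at 0.
func NewCounter(initial ...int) *Counter {
	var mu sync.Mutex
	n := 0
	if len(initial) > 0 {
		n = initial[0]
	}
	// locked builds a closure that adds delta under the lock and returns the new value.
	locked := func(delta int) func() int {
		return func() int {
			mu.Lock()
			defer mu.Unlock()
			n += delta
			return n
		}
	}
	return &Counter{Inc: locked(1), Dec: locked(-1), Value: locked(0)}
}

func main() {
	c := NewCounter(10)
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc()
		}()
	}
	wg.Wait()
	fmt.Println(c.Value()) // 110
}
```

Because the integer and the mutex live only inside the closures, callers cannot bypass the lock.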
@@ -0,0 +1,37 @@
## `obiutils.DownloadFile` — Semantic Feature Overview
- **Core Functionality**: Downloads a file from a given URL to a specified local path.
- **HTTP Client Behavior**:
- Uses `http.Get()` for simple, synchronous GET requests.
- Validates the HTTP status code; aborts on non-200 responses with a descriptive error.
- **Resource Management**:
- Ensures proper cleanup via `defer resp.Body.Close()` and `defer out.Close()`.
- **Progress Tracking**:
- Integrates [`progressbar`](https://github.com/schollz/progressbar) to display real-time download progress.
- Uses `DefaultBytes()` for a human-readable, byte-based indicator (e.g., "downloading 12.3 MB / 45.6 MB").
- **Efficient I/O**:
- Leverages `io.Copy()` with an `io.MultiWriter` to stream data directly from the HTTP response body into both:
- The target file (`out`)
- The progress bar (to update on each chunk written)
- **Error Handling**:
- Returns early with wrapped errors for network failures, HTTP non-success codes, or file I/O issues.
- **Simplicity & Usability**:
- Minimal API surface: only two arguments (`url`, `filepath`).
- No external configuration needed — ideal for CLI tools or batch scripts.
- **Assumptions**:
- No authentication, redirects, proxies, timeouts, or retries are implemented.
- Designed for straightforward downloads where robustness is secondary to simplicity.
- **Typical Use Cases**:
- CLI utilities, build scripts, CI/CD pipelines.
- Prototyping or internal tools where advanced download features are unnecessary.
- **Limitations**:
- Not suitable for large-scale or production-grade downloads without enhancements (e.g., retries, concurrency control).
@@ -0,0 +1,16 @@
# `obiutils` Package Overview
This Go package provides utility functions for common data conversion, serialization, and reflection tasks.
- **Custom Error Types**: Defines typed errors (`NotAnInteger`, `NotAFloat64`, etc.) for precise type validation failures.
- **Interface-to-Type Casting**: Offers robust conversion functions:
- `InterfaceToFloat64Map`, `InterfaceToIntSlice`, etc., handling nested interfaces and type coercion (e.g. `int` → `float64`, slices of `interface{}`).
- **File I/O**: `ReadLines` reads a file line-by-line into a string slice, handling buffered reading efficiently.
- **Concurrency**: `AtomicCounter` returns an incrementing integer generator—thread-safe via mutex, optionally starting from a given value.
- **JSON Serialization**: `JsonMarshal` and `JsonMarshalByteBuffer` provide UTF-8-preserving JSON encoding (avoids Go's default HTML escaping).
- **Reflection Helpers**:
- `IsAMap`, `IsASlice`, `IsAnArray` detect container types.
- `HasLength`, `Len`, and `IsAContainer` abstract length operations across maps, slices, arrays, or custom types with a `.Len()` method.
- **Deep Copying**: `MustFillMap` performs deep copying of nested structures using `go-deepcopy`.
All functions prioritize safety, type correctness, and usability in data-heavy or concurrent applications.
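The JSON behavior mentioned above can be reproduced with `encoding/json` by disabling HTML escaping on an encoder. The body below is an assumption about how such a `JsonMarshal` could be written, not the package source.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// JsonMarshal encodes v without rewriting <, >, and & as \u003c, \u003e, \u0026.
func JsonMarshal(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(v); err != nil {
		return nil, err
	}
	// Encode appends a newline; drop it to match the shape of json.Marshal output.
	return bytes.TrimRight(buf.Bytes(), "\n"), nil
}

func main() {
	record := map[string]string{"id": "a>b"}
	escaped, _ := json.Marshal(record)
	raw, _ := JsonMarshal(record)
	fmt.Println(string(escaped)) // {"id":"a\u003eb"}
	fmt.Println(string(raw))     // {"id":"a>b"}
}
```

This matters for sequence identifiers and annotations, which often contain `>` or `&`.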
@@ -0,0 +1,37 @@
# `obiutils` Package: File and Stream Writing Utilities
The `obiutils` package provides a unified abstraction for writing data to files or streams, with optional gzip compression and buffered I/O.
## Core Type: `Wfile`
- Encapsulates a write-ready output stream (`io.WriteCloser`).
- Supports both **compressed** (gzip) and uncompressed modes.
- Uses `bufio.Writer` for efficient buffered writes.
## Key Functions
### `OpenWritingFile(name string, compressed bool, append bool) (*Wfile, error)`
- Opens a file for writing.
- `compressed`: enables gzip compression via `pgzip`.
- `append`: if true, writes at end of file (`os.O_APPEND`).
- Returns a ready-to-use `*Wfile`.
### `CompressStream(out io.WriteCloser, compressed bool, close bool) (*Wfile, error)`
- Wraps an arbitrary `io.WriteCloser` (e.g., HTTP response, pipe) in buffered/compressed I/O.
- `close`: if true, the underlying writer is closed on `.Close()`.
## Methods
- **`Write(p []byte)` / `WriteString(s string)`**:
Buffered writes to the underlying stream (transparently compressed if enabled).
- **`Close()`**:
- Flushes the buffer.
- Closes gzip writer (if compressed).
- Closes underlying file/stream *only if* `close == true`.
## Design Highlights
- **Transparent compression**: Uses high-performance `pgzip` for parallel gzip.
- **Resource control**: Explicit flag (`close`) prevents premature closure of shared writers (e.g., in pipelines).
- **Efficiency**: Double buffering via `bufio.Writer` + gzip stream.
@@ -0,0 +1,15 @@
# `obiutils` Package: Memory Size Parsing and Formatting
This Go package provides two complementary utility functions for handling human-readable memory sizes:
- **`ParseMemSize(s string) (int, error)`**
Parses a memory size string into an integer number of bytes. Supports case-insensitive units: `B`, `K`/`KB`, `M`/`MB`, `G`/`GB`, and `T`/`TB`.
Examples: `"128K"` → `131072`, `"512MB"` → `536870912`.
Returns an error for invalid input (e.g., empty string, non-numeric prefix, or unknown unit).
- **`FormatMemSize(n int) string`**
Converts a byte count into the most appropriate human-readable format using powers of 1024.
Uses suffixes `T`, `G`, `M`, or `K`; falls back to bytes (`B`) if < 1 KiB.
Integers are displayed without decimals (e.g., `2048` → `"2K"`), while fractional values use one decimal (e.g., `1536` → `"1.5K"`).
Both functions ensure semantic clarity and consistency for memory-related I/O, logging, or configuration parsing.
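A compact sketch of both directions is shown below. It is written from the description above rather than from the package source, and it makes two simplifying assumptions: only integer prefixes are parsed, and negative sizes are rejected.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// units is ordered from largest to smallest so formatting picks the biggest fit.
var units = []struct {
	suffix string
	factor int64
}{{"T", 1 << 40}, {"G", 1 << 30}, {"M", 1 << 20}, {"K", 1 << 10}}

// ParseMemSize turns strings such as "128K" or "512mb" into a byte count.
func ParseMemSize(s string) (int, error) {
	// Upper-case, then drop a trailing "B" so "KB" and "K" are handled alike.
	t := strings.TrimSuffix(strings.ToUpper(strings.TrimSpace(s)), "B")
	factor := int64(1)
	for _, u := range units {
		if strings.HasSuffix(t, u.suffix) {
			factor, t = u.factor, strings.TrimSuffix(t, u.suffix)
			break
		}
	}
	n, err := strconv.ParseInt(t, 10, 64)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid memory size %q", s)
	}
	return int(n * factor), nil
}

// FormatMemSize renders n with the largest unit that fits, using powers of 1024.
func FormatMemSize(n int) string {
	for _, u := range units {
		if int64(n) >= u.factor {
			v := float64(n) / float64(u.factor)
			if v == float64(int64(v)) {
				return fmt.Sprintf("%d%s", int64(v), u.suffix)
			}
			return fmt.Sprintf("%.1f%s", v, u.suffix)
		}
	}
	return fmt.Sprintf("%dB", n)
}

func main() {
	fmt.Println(ParseMemSize("128K")) // 131072 <nil>
	fmt.Println(FormatMemSize(1536))  // 1.5K
	fmt.Println(FormatMemSize(512))   // 512B
}
```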
@@ -0,0 +1,32 @@
# OBIMimeUtils: Semantic Description of Features
The `obiutils` Go package provides utilities for detecting and handling biological data file formats, primarily via MIME type inference.
## Core Functionalities
- **BOM Detection (`HasBOM`)**
Identifies Byte Order Marks (BOMs) for UTF-8, UTF-16 BE/LE, and UTF-32 BE/LE encodings. Logs detected types for transparency.
- **Last-Line Trimming (`DropLastLine`)**
Removes the final newline-delimited line from a byte slice — useful for sanitizing incomplete or truncated files.
- **MIME Type Registration (`RegisterOBIMimeType`)**
Extends generic MIME types (e.g., `text/plain`, `application/octet-stream`) with format-specific detectors for:
- **CSV**: Validates structured comma-separated data (≥2 fields, ≥2 lines).
- **FASTA/FASTQ**: Regex-based detection of sequence headers (`>` or `@`).
- **ecoPCR2**: Detects files starting with the magic header `#@ecopcr-v2`.
- **GenBank/EMBL**: Checks for standard sequence record prefixes (`LOCUS`, `ID`).
- **Format-Specific Extensions**
Registers custom MIME subtypes (e.g., `text/fasta`, `.fasta`) and associates them with appropriate file extensions.
- **Idempotent Registration**
Ensures MIME detectors are registered only once using a guard flag.
## Design Goals
- Robust, lightweight format inference without full parsing.
- Extensible architecture for future bioinformatics formats.
- Logging-friendly (via `logrus`) to aid debugging and observability.
This package enables accurate, context-aware MIME detection in pipelines handling heterogeneous biological data.
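BOM detection of the kind described above reduces to prefix checks in the right order. In this sketch the `(name, found)` return shape is an assumption; the actual `HasBOM` signature is not given here.

```go
package main

import (
	"bytes"
	"fmt"
)

// boms lists the marks to test. The 4-byte UTF-32 LE mark must come before
// the 2-byte UTF-16 LE mark, because it starts with the same two bytes.
var boms = []struct {
	name string
	mark []byte
}{
	{"UTF-32 BE", []byte{0x00, 0x00, 0xFE, 0xFF}},
	{"UTF-32 LE", []byte{0xFF, 0xFE, 0x00, 0x00}},
	{"UTF-8", []byte{0xEF, 0xBB, 0xBF}},
	{"UTF-16 BE", []byte{0xFE, 0xFF}},
	{"UTF-16 LE", []byte{0xFF, 0xFE}},
}

// HasBOM reports which byte order mark, if any, data starts with.
func HasBOM(data []byte) (string, bool) {
	for _, b := range boms {
		if bytes.HasPrefix(data, b.mark) {
			return b.name, true
		}
	}
	return "", false
}

func main() {
	fmt.Println(HasBOM([]byte("\xEF\xBB\xBF>seq1\nACGT\n"))) // UTF-8 true
	fmt.Println(HasBOM([]byte(">seq1\nACGT\n")))             //  false
}
```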
@@ -0,0 +1,36 @@
# `obiutils` Package: Semantic Overview
The `obiutils` package provides generic and reflection-based utilities for computing minima and maxima across multiple data structures in Go.
## Core Features
- **Generic `MinMax` / `MinMaxSlice`**:
- `MinMax[T constraints.Ordered]`: Returns the ordered pair `(min, max)` of two values.
- `MinMaxSlice[T constraints.Ordered]`: Finds min and max in a slice of ordered types (panics on empty input).
- **Map-based Min/Max**:
- `MinMap` / `MaxMap`: Returns the key and value of the smallest/largest *value* in a map (errors on empty maps).
- **Unified `Min` / `Max` Functions**:
- Accepts *any* Go value: single scalar, slice/array/map.
- Uses reflection to dispatch logic based on runtime type (`reflect.Kind`).
- Supports ordered kinds: integers, floats, strings (signed/unsigned ints via `constraints.Ordered` subset).
- Returns an error for unsupported or empty containers.
- **Helper Reflection Functions**:
- `minFromIterable` / `maxFromIterable`: Scan slices/arrays.
- `minFromMap` / `maxFromMap`: Iterate over map values (ignores keys in comparisons).
- `isOrderedKind`, `less`, `greater`: Internal comparison logic for reflection-based ordering.
## Design Highlights
- **Type Safety & Generics**: Leverages Go 1.18+ generics for compile-time type constraints where possible.
- **Flexibility**: The `Min(data interface{})` / `Max(...)` functions allow a *single API* for heterogeneous inputs.
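- **Error Handling**: The reflection-based `Min`/`Max` functions and the map helpers return explicit errors (e.g., `"empty slice"`, `"unsupported type"`) instead of panicking; `MinMaxSlice` is the exception and panics on empty input.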
- **Fallback Support**: Checks if the input has a `Min()`/`Max()` method (via reflection) before falling back to generic logic.
## Limitations
- Reflection-based paths are slower than direct generics.
- No support for custom types without a defined ordering (e.g., structs, which cannot satisfy `constraints.Ordered`).
- Maps compare only *values*; keys are irrelevant for min/max selection.
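The generic (non-reflection) helpers can be sketched as below. The local `Ordered` interface is an assumption standing in for `constraints.Ordered`, and the function bodies are illustrative rather than the package's own.

```go
package main

import (
	"errors"
	"fmt"
)

// Ordered is a local stand-in for constraints.Ordered.
type Ordered interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64 |
		~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr |
		~float32 | ~float64 | ~string
}

// MinMax returns its two arguments as (smaller, larger).
func MinMax[T Ordered](a, b T) (T, T) {
	if a <= b {
		return a, b
	}
	return b, a
}

// MinMaxSlice scans values once; it panics on empty input, as described above.
func MinMaxSlice[T Ordered](values []T) (T, T) {
	if len(values) == 0 {
		panic("empty slice")
	}
	lo, hi := values[0], values[0]
	for _, v := range values[1:] {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	return lo, hi
}

// MinMap returns the key and value of the smallest value; keys are not compared.
func MinMap[K comparable, V Ordered](m map[K]V) (K, V, error) {
	var minK K
	var minV V
	if len(m) == 0 {
		return minK, minV, errors.New("empty map")
	}
	first := true
	for k, v := range m {
		if first || v < minV {
			minK, minV, first = k, v, false
		}
	}
	return minK, minV, nil
}

func main() {
	fmt.Println(MinMax(7, 3))                              // 3 7
	fmt.Println(MinMaxSlice([]float64{2.5, -1, 9}))        // -1 9
	fmt.Println(MinMap(map[string]int{"a": 4, "b": 1}))    // b 1 <nil>
}
```

When several keys share the minimal value, the key returned by this `MinMap` sketch depends on map iteration order.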
@@ -0,0 +1,35 @@
# `MinMultiset[T]` — A Lazy-Delete Min-Multiset Implementation
A generic, type-safe multiset data structure in Go that maintains elements with multiplicity and provides efficient access to the current minimum. Built on top of a min-heap (`container/heap`) with **lazy deletion** to support efficient removals without rebuilding the heap.
## Core Features
- **Generic over comparable types** (`T`) with custom ordering via `less` comparator
- **Multiset semantics**: supports multiple occurrences of the same value
- **O(log n) insertion** (`Add`) and **amortized O(1)** minimum access
- **Lazy deletion**: `RemoveOne` marks items for removal; physical cleanup occurs on next `Min()` call
- **Size tracking**: logical size (`Len()`) excludes deleted items, even if still in heap
- **Memory-efficient cleanup**: `shrink()` and `cleanTop()` prevent tombstone accumulation
## API Summary
| Method | Description |
|--------|-------------|
| `NewMinMultiset(less)` | Constructor; initializes heap, maps (`count`, `pending`), and sets ordering |
| `Add(v)` | Inserts one occurrence of `v`; increments logical size & count map |
| `RemoveOne(v)` | Removes *one* occurrence if present; returns success flag (`false` otherwise) |
| `Min()` | Returns current minimum (or zero value + `ok=false`) after cleaning stale top entries |
| `Len()` | Returns logical size (excludes pending deletions) |
## Internal Mechanism
- **`count[T]int`**: tracks how many times each value is *logically* present
- **`pending[T]int`**: tracks how many times each value is *marked for removal*
- **Heap invariant maintained only up to logical size** — stale entries are pruned lazily during `Min()` or after deletions
- **No manual cleanup needed** — the structure self-balances incrementally
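The lazy-deletion mechanism can be illustrated with a small heap wrapper. This is a simplified sketch built from the description above: the `heapOf` adapter is a hypothetical helper, and the `shrink()` compaction step is omitted.

```go
package main

import (
	"container/heap"
	"fmt"
)

// heapOf adapts a slice and a less function to heap.Interface.
type heapOf[T any] struct {
	items []T
	less  func(a, b T) bool
}

func (h *heapOf[T]) Len() int           { return len(h.items) }
func (h *heapOf[T]) Less(i, j int) bool { return h.less(h.items[i], h.items[j]) }
func (h *heapOf[T]) Swap(i, j int)      { h.items[i], h.items[j] = h.items[j], h.items[i] }
func (h *heapOf[T]) Push(x any)         { h.items = append(h.items, x.(T)) }
func (h *heapOf[T]) Pop() any {
	last := h.items[len(h.items)-1]
	h.items = h.items[:len(h.items)-1]
	return last
}

type MinMultiset[T comparable] struct {
	h       *heapOf[T]
	count   map[T]int // occurrences that are logically present
	pending map[T]int // occurrences removed logically but still in the heap
	size    int
}

func NewMinMultiset[T comparable](less func(a, b T) bool) *MinMultiset[T] {
	return &MinMultiset[T]{h: &heapOf[T]{less: less}, count: map[T]int{}, pending: map[T]int{}}
}

func (m *MinMultiset[T]) Add(v T) {
	heap.Push(m.h, v)
	m.count[v]++
	m.size++
}

// RemoveOne only records the removal; the heap entry is dropped later by Min.
func (m *MinMultiset[T]) RemoveOne(v T) bool {
	if m.count[v] == 0 {
		return false
	}
	m.count[v]--
	m.pending[v]++
	m.size--
	return true
}

// Min pops stale entries off the top, then reports the live minimum.
func (m *MinMultiset[T]) Min() (T, bool) {
	for m.h.Len() > 0 && m.pending[m.h.items[0]] > 0 {
		m.pending[m.h.items[0]]--
		heap.Pop(m.h)
	}
	if m.h.Len() == 0 {
		var zero T
		return zero, false
	}
	return m.h.items[0], true
}

func (m *MinMultiset[T]) Len() int { return m.size }

func main() {
	ms := NewMinMultiset(func(a, b int) bool { return a < b })
	for _, v := range []int{5, 3, 3, 8} {
		ms.Add(v)
	}
	ms.RemoveOne(3)
	fmt.Println(ms.Min()) // 3 true (one occurrence of 3 remains)
	ms.RemoveOne(3)
	fmt.Println(ms.Min()) // 5 true
	fmt.Println(ms.Len()) // 2
}
```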
## Use Cases
Priority queues with deletable arbitrary elements (e.g., Dijkstra's algorithm where distances are updated), sliding-window minima, event scheduling with cancellation.
> ⚠️ Note: `less` must define a *strict total order* (irreflexive, transitive, connected) for correctness.
@@ -0,0 +1,16 @@
## `obiutils` Package: File Path Utility Functions
This Go package provides two utility functions for manipulating file paths by removing extensions:
### `RemoveAllExt(p string) string`
- **Purpose**: Strips *all* file extensions from a given path (e.g., `/dir/file.tar.gz` → `/dir/file`).
- **Mechanism**: Iteratively uses `path.Ext()` and `strings.TrimSuffix` to remove extensions from the *full path*. Because `path.Ext()` only inspects the final slash-separated element, dots in directory components are left untouched and the directory part is preserved in the result.
- **Use Case**: Useful when you need to sanitize a full path for naming or comparison, regardless of extension stacking.
### `Basename(path string) string`
- **Purpose**: Extracts the base filename *without any extensions* (e.g., `/dir/file.tar.gz` → `file`).
- **Mechanism**: Uses `filepath.Base()` to get the filename, then iteratively strips extensions via `strings.TrimSuffix`.
- **Key Difference**: Operates *only on the filename*, not directory parts — safer and more conventional for typical file handling.
Both functions handle multi-extension files (e.g., `.tar.gz`, `.backup.zip`) robustly. They avoid reliance on `strings.LastIndex` or regex, favoring clarity and standard library usage (`path`, `filepath`).
Designed for portability across Unix-like systems (uses forward slashes), though Windows paths are supported via `filepath.Base`.
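Both functions fit in a few lines when written as described. The sketch below is an assumed reconstruction from the mechanism stated above, not the package source; in particular, building `Basename` on top of `RemoveAllExt` is a choice made here for brevity.

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
	"strings"
)

// RemoveAllExt strips extensions one at a time until none is left.
// path.Ext only looks at the final path element, so directories are preserved.
func RemoveAllExt(p string) string {
	for ext := path.Ext(p); ext != ""; ext = path.Ext(p) {
		p = strings.TrimSuffix(p, ext)
	}
	return p
}

// Basename drops the directory part first, then every extension.
func Basename(p string) string {
	return RemoveAllExt(filepath.Base(p))
}

func main() {
	fmt.Println(RemoveAllExt("/dir/file.tar.gz")) // /dir/file
	fmt.Println(RemoveAllExt("dir.d/file"))       // dir.d/file
	fmt.Println(Basename("/dir/file.tar.gz"))     // file
}
```

One consequence of this approach is that a dot-file such as `.bashrc` is treated entirely as an extension and reduced to an empty name.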
@@ -0,0 +1,19 @@
# `obiutils` Package: Functional Overview
The `obiutils` package provides utility functions for common file path manipulations in Go. Its current public API includes:
- **`RemoveAllExt(path string) string`**
Strips *all* file extensions from a given path, returning the path without any trailing suffixes (e.g., `.txt`, `.tar.gz`).
- Handles paths with no extensions unchanged.
- Correctly processes single- and multi-part (e.g., `.tar.gz`) extensions.
- Designed for robustness across Unix-like and cross-platform path conventions.
The package currently includes a single unit test suite:
- **`TestRemoveAllExt(t *testing.T)`**
Validates the correctness of `RemoveAllExt` using three test cases:
- `"path/to/file"` → unchanged (`"path/to/file"`)
- `"path/to/file.txt"` → `"path/to/file"`
- `"path/to/file.tar.gz"` → `"path/to/file"`
This ensures reliable behavior for downstream code relying on extension-agnostic path handling—e.g., in build systems, data pipelines, or file-processing tools.
@@ -0,0 +1,36 @@
# `obiutils` Package: Pipe Synchronization Utilities
This Go package provides lightweight synchronization primitives for managing concurrent pipeline execution, particularly useful in CLI or batch-processing applications.
## Core Components
- **`globalLocker`:** A `sync.WaitGroup` tracking active pipeline goroutines.
- **`globalLockerCounter`:** An integer counter for logging/debugging the number of active pipes.
## Public Functions
### `RegisterAPipe()`
- Increments both the WaitGroup and counter.
- Logs current count at debug level (`log.Debugln`).
- Typically called when starting a new pipeline stage or goroutine.
### `UnregisterPipe()`
- Decrements the WaitGroup and counter.
- Logs updated count at debug level.
- Should be invoked when a pipeline finishes (e.g., `defer UnregisterPipe()`).
### `WaitForLastPipe()`
- Blocks until all registered pipes complete (`globalLocker.Wait()`).
- Intended to be called at the end of `main()`, ensuring graceful shutdown.
## Semantic Use Case
Enables safe, concurrent execution of multiple independent pipelines (e.g., data processing stages), ensuring the program waits for all to finish before exiting — without explicit channel or mutex management.
## Design Notes
- **Thread-safe** via `sync.WaitGroup`.
- **Minimalist**: No error handling; assumes correct usage.
- **Logging-focused** for observability in development/debug builds.
> ⚠️ Not production-ready without additional safeguards (e.g., panic recovery, timeout support).
@@ -0,0 +1,35 @@
# `obiutils` — Semantic Description of Core Functionality
This Go package provides generic and type-specific utilities for **ranking** and **ordering** data without modifying the original slice. It leverages Go's `sort` package to compute index permutations that reflect sorted order.
## Key Components
- **IntOrder(data []int) []int**
Returns indices that would sort a slice of integers in *ascending* order. The original data remains unchanged.
- **ReverseIntOrder(data []int) []int**
Same as `IntOrder`, but returns indices for *descending* order.
- **Order[T sort.Interface](data T) []int**
Generic version accepting any type implementing `sort.Interface`. Returns stable sorted indices.
## Internal Design
- **intRanker** and **Ranker[T]**: Helper types wrapping data + index list (`r`).
They implement `sort.Interface` *indirectly*—sorting indices instead of mutating data.
- **Index-based sorting**:
By permuting a list of indices (`r = [0,1,...]`), the original data is never copied or altered—ideal for large datasets or immutable inputs.
- **Stability**: `Order` uses `sort.Stable`, preserving relative order of equal elements.
## Use Cases
- Sorting metadata (e.g., sorting labels by associated scores).
- Preparing orderings for downstream operations (plots, ranking metrics).
- Efficiently tracking original positions after sort.
## Constraints
- Requires `sort.Interface` for generic version (e.g., custom structs with methods).
- Returns empty slice (`nil`) on zero-length input.
@@ -0,0 +1,34 @@
# `obiutils.Set` — Generic Set Implementation in Go
This package provides a generic, type-safe set data structure for Go (1.20+), leveraging generics (`comparable` constraint). It supports common set operations with intuitive APIs.
## Core Features
- **Generic Type Support**: `Set[E]` works for any comparable type (e.g., `int`, `string`, custom structs with equality).
- **Memory-Efficient Representation**: Implemented as a map from element to the empty struct type (`struct{}`), minimizing memory overhead.
- **Immutability by Default**: Methods like `Union` and `Intersection` return *new* sets; in-place mutation is explicit (e.g., via `Add()`).
## Key Functions & Methods
| Function/Method | Description |
|-----------------|-------------|
| `MakeSet[E](vals ...E)` | Creates and returns a new set populated with given values. |
| `NewSet[E](vals ...E)` | Same as `MakeSet`, but returns a pointer (`*Set[E]`). |
| `(s Set[E]) Add(vals ...E)` | Inserts one or more elements into the set (in-place). |
| `(s Set[E]) Contains(v E) bool` | Checks membership of an element. O(1). |
| `(s Set[E]) Members() []E` | Returns all elements as a slice (order not guaranteed). |
| `(s Set[E]) String() string` | Human-readable representation via `fmt.Sprintf`. |
| `(s Set[E]) Union(s2 Set[E])` | Returns a new set containing elements from both sets. |
| `(s Set[E]) Intersection(s2 Set[E])` | Returns a new set with elements common to both sets. |
## Example Usage
```go
s1 := obiutils.MakeSet(1, 2, 3)
s2 := obiutils.NewSet("a", "b") // pointer to a set of strings
fmt.Println(s1.Contains(2))   // true
fmt.Println(s2.Contains("a")) // true
union := s1.Union(obiutils.MakeSet(3, 4))
fmt.Println(union.Members()) // e.g., [1 2 3 4]
```
> Designed for clarity, performance, and idiomatic Go usage.
@@ -0,0 +1,36 @@
# `obiutils` Package: Set Implementation in Go
The `obiutils` package provides a generic, type-safe set data structure for Go (v1.18+), along with comprehensive unit tests.
## Core Features
- **Generic Set Type**: Implemented as `Set[T]`, using a map for O(1) membership checks.
- **Constructors**:
- `MakeSet[T](...T)` returns a new set populated with given elements.
- `NewSet[T]()` allocates an empty set and returns a pointer to it; useful for dynamic initialization.
- **Methods**:
- `Add(...T)` inserts one or more elements (idempotent).
- `Contains(T) bool` checks membership.
- `Members() []T` returns the elements as a slice; callers that need a deterministic order should sort the result, as the tests do.
- `String() string` provides human-readable representation (`[a b c]` format).
- **Set Operations**:
- `Union(other Set[T]) Set[T]`: returns a new set with elements in either operand.
- `Intersection(other Set[T]) Set[T]`: returns a new set with elements common to both.
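A map-backed generic set with these operations can be sketched as below. The bodies are assumptions consistent with the feature list, and `NewSet` and `String` are left out for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// Set stores its elements as map keys; the empty struct value takes no space.
type Set[E comparable] map[E]struct{}

func MakeSet[E comparable](vals ...E) Set[E] {
	s := make(Set[E], len(vals))
	s.Add(vals...)
	return s
}

// Add inserts in place; adding an existing element is a no-op.
func (s Set[E]) Add(vals ...E) {
	for _, v := range vals {
		s[v] = struct{}{}
	}
}

func (s Set[E]) Contains(v E) bool {
	_, ok := s[v]
	return ok
}

// Members returns the elements in map iteration order, which is not stable.
func (s Set[E]) Members() []E {
	out := make([]E, 0, len(s))
	for v := range s {
		out = append(out, v)
	}
	return out
}

// Union returns a new set; neither operand is modified.
func (s Set[E]) Union(o Set[E]) Set[E] {
	r := MakeSet(s.Members()...)
	r.Add(o.Members()...)
	return r
}

// Intersection returns a new set with the elements present in both operands.
func (s Set[E]) Intersection(o Set[E]) Set[E] {
	r := MakeSet[E]()
	for v := range s {
		if o.Contains(v) {
			r.Add(v)
		}
	}
	return r
}

func main() {
	a, b := MakeSet(1, 2, 3), MakeSet(3, 4)
	u := a.Union(b).Members()
	sort.Ints(u) // sort because map order is not deterministic
	fmt.Println(u)                           // [1 2 3 4]
	fmt.Println(a.Intersection(b).Members()) // [3]
}
```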
## Test Coverage
Unit tests validate:
- Set creation (empty, single/multiple values).
- Element addition and membership.
- String formatting for various sizes.
- Correctness of union/intersection across edge cases (empty sets, disjoint/common elements).
All tests use `reflect.DeepEqual` for precise structural comparison and sort outputs where order is non-deterministic.
## Design Notes
- Non-mutating operations: `Union` and `Intersection` return *new* sets rather than mutating in place; `Add` is the explicit in-place mutator.
- No duplicate support (standard set semantics).
- Efficient storage via Go maps; no external dependencies.
> **Note**: This is a minimal, idiomatic set implementation—ideal for utility or testing contexts.
@@ -0,0 +1,22 @@
# `obiutils` Package Overview
The `obiutils` package provides generic, reusable utility functions for common slice operations in Go.
- **`Contains[T comparable](arr []T, x T) bool`**
Checks whether a given element exists in the slice. Uses generic type `T`, requiring only that it supports equality comparison.
- **`LookFor[T comparable](arr []T, x T) int`**
Returns the index of the *first* occurrence of `x`, or `-1` if not found. Also generic over comparable types.
- **`RemoveIndex[T comparable](s []T, index int) []T`**
Removes the element at `index` and returns the shortened slice. Runs in O(n) time, because `append` shifts every element after `index` one position to the left.
- **`Reverse[S ~[]E, E any](s S, inplace bool) S`**
Reverses the slice elements. If `inplace = true`, modifies the original; otherwise, copies first and returns a reversed copy. Uses type constraint `~[]E` for flexibility across slice aliases.
All functions are designed to be:
- Type-safe via Go generics (no reflection),
- Efficient and idiomatic,
- Well-documented with clear parameter/return semantics.
Ideal for use in data processing, validation logic, or general-purpose slice manipulation.
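
A minimal sketch of the four helpers, written from the signatures above. The bodies are assumptions and may not match the package source; in particular, this `RemoveIndex` reuses the backing array of its argument, which is one common way to implement it with `append`.

```go
package main

import "fmt"

// LookFor returns the index of the first occurrence of x, or -1.
func LookFor[T comparable](arr []T, x T) int {
	for i, v := range arr {
		if v == x {
			return i
		}
	}
	return -1
}

// Contains reports whether x occurs in arr.
func Contains[T comparable](arr []T, x T) bool {
	return LookFor(arr, x) >= 0
}

// RemoveIndex drops the element at index by shifting the tail left.
// The append writes into the backing array of s, so s is modified too.
func RemoveIndex[T comparable](s []T, index int) []T {
	return append(s[:index], s[index+1:]...)
}

// Reverse reverses s, working on a copy unless inplace is true.
func Reverse[S ~[]E, E any](s S, inplace bool) S {
	if !inplace {
		c := make(S, len(s))
		copy(c, s)
		s = c
	}
	for i, j := 0, len(s)-1; i < j; i, j = i+1, j-1 {
		s[i], s[j] = s[j], s[i]
	}
	return s
}

func main() {
	xs := []string{"a", "b", "c", "b"}
	fmt.Println(Contains(xs, "c"), LookFor(xs, "b"), LookFor(xs, "z")) // true 1 -1
	fmt.Println(Reverse(xs, false), xs)                                // [b c b a] [a b c b]
	fmt.Println(RemoveIndex(xs, 0))                                    // [b c b]
}
```

Callers that still need the original slice after `RemoveIndex` should copy it first, since the shift happens in the shared backing array.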
+33
View File
@@ -0,0 +1,33 @@
# `obiutils` Package Overview
The `obiutils` package provides low-level, high-performance utilities for ASCII string and set manipulation in Go.
### Core Components
- **`AsciiSet [256]bool`**: A compact boolean lookup table for ASCII characters (0–127), optimized for membership tests.
- **Predefined Sets**:
  - `AsciiSpaceSet`: Whitespace characters (`\t\n\v\f\r `)
  - `AsciiDigitSet`, `AsciiUpperSet`, `AsciiLowerSet`
  - Derived sets: `Alpha` (letters), `Alnum` (alphanumeric)
### Key Functions
- **Set Operations**:
  - `AsciiSetFromString(s string)`: Build a set from the characters of a string.
  - `.Contains(c byte)` / `.Union()` / `.Intersect()`: Efficient membership and set algebra.
- **String Parsing & Transformation**:
  - `UnsafeStringFromBytes([]byte) string`: Zero-copy conversion (⚠️ unsafe; use only when memory safety is externally guaranteed).
  - `FirstWord(s string)`: Extract the first non-whitespace token.
  - `(AsciiSet).FirstWord(...) (string, error)`: Same as above but validates characters against a restriction set.
  - `TrimLeft(s string)` (a method on `*AsciiSet`): Remove leading whitespace using space-aware logic.
  - `LeftSplitInTwo(s string, sep byte)`: Split at the first occurrence of a separator.
  - `RightSplitInTwo(s string, sep byte)`: Split at the last occurrence.
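
The lookup-table idea and the two split helpers can be sketched as follows. Only the names come from the list above: the receiver types, the return signatures, and the behavior when the separator is missing are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// AsciiSet is a 256-entry table indexed directly by byte value.
type AsciiSet [256]bool

// AsciiSetFromString marks every byte of s as a member.
func AsciiSetFromString(s string) AsciiSet {
	var set AsciiSet
	for i := 0; i < len(s); i++ {
		set[s[i]] = true
	}
	return set
}

// Contains is a single array lookup; no hashing or allocation.
func (a *AsciiSet) Contains(c byte) bool { return a[c] }

// Union returns a new set with the members of either operand.
func (a *AsciiSet) Union(b AsciiSet) AsciiSet {
	var out AsciiSet
	for i := range out {
		out[i] = a[i] || b[i]
	}
	return out
}

// LeftSplitInTwo splits at the first sep. Returning (s, "") when sep
// is absent is an assumption of this sketch.
func LeftSplitInTwo(s string, sep byte) (string, string) {
	i := strings.IndexByte(s, sep)
	if i < 0 {
		return s, ""
	}
	return s[:i], s[i+1:]
}

// RightSplitInTwo splits at the last sep, with the same assumption.
func RightSplitInTwo(s string, sep byte) (string, string) {
	i := strings.LastIndexByte(s, sep)
	if i < 0 {
		return s, ""
	}
	return s[:i], s[i+1:]
}

func main() {
	digits := AsciiSetFromString("0123456789")
	lower := AsciiSetFromString("abcdefghijklmnopqrstuvwxyz")
	alnum := digits.Union(lower)
	fmt.Println(alnum.Contains('7'), alnum.Contains('-')) // true false

	l, r := LeftSplitInTwo("a=b=c", '=')
	fmt.Println(l, r) // a b=c
	l, r = RightSplitInTwo("a=b=c", '=')
	fmt.Println(l, r) // a=b c
}
```

Because a byte can only take 256 values, indexing the table never goes out of bounds, which is what keeps `Contains` branch-free.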
### Design Goals
- **Performance**: Avoid allocations where possible (e.g., `unsafe.String`, direct indexing).
- **Simplicity**: Focused on ASCII-only operations for speed and predictability.
- **Safety Trade-offs**: `UnsafeStringFromBytes` trades safety for efficiency; other functions are safe and bounds-checked.
Intended use: embedded systems, parsers, or performance-critical text processing where standard library overhead is undesirable.
+24
View File
@@ -0,0 +1,24 @@
# `TarFileReader` — Semantic Description
The function `TarFileReader`, defined in the Go package `obiutils`, provides a targeted extraction capability for files within a TAR archive.
- **Input**:
  - `file`: A reader (`*Reader`) implementing the standard Go `io.Reader` interface, typically wrapping an archive file or stream.
  - `path`: A string specifying the *exact* path (relative to the archive root) of the desired file inside the TAR.
- **Core Logic**:
  - Instantiates a `tar.Reader` from the provided input stream.
  - Iterates sequentially over TAR entries using `Next()`.
  - Compares each entry's header name (`header.Name`) with the requested `path`.
- **Output**:
  - On match: Returns a pointer to the *current* `tar.Reader`, positioned at the start of the requested file's content (ready for subsequent reads).
  - On failure: Returns `nil` and a formatted error `"file not found: <path>"`.
- **Semantics**:
  - Enables *lazy*, on-demand access to a specific file inside a TAR archive without extracting the whole archive; entries that precede the match are still scanned sequentially.
  - Assumes exact path matching (no globbing, wildcards, or directory traversal).
  - Does *not* handle symbolic links, hardlinks, or nested archives; only plain file entries are supported.
- **Use Case**:
  Ideal for lightweight tools that need to inspect or extract a single known file from large TAR archives (e.g., config files, manifests), minimizing memory and I/O overhead.
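
The lookup loop described above can be sketched with the standard `archive/tar` package. This version takes a plain `io.Reader` instead of the package's `*Reader`, and the helpers `buildTar` and `readEntry` are hypothetical, added only so the example runs against an in-memory archive.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// tarFileReader scans entries in order and stops at the first header
// whose name equals path. The returned reader yields that file's bytes.
func tarFileReader(r io.Reader, path string) (*tar.Reader, error) {
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break // end of archive, nothing matched
		}
		if err != nil {
			return nil, err
		}
		if hdr.Name == path {
			return tr, nil
		}
	}
	return nil, fmt.Errorf("file not found: %s", path)
}

// buildTar (hypothetical) packs parallel name/content lists into an
// in-memory archive; it panics on error to keep the sketch short.
func buildTar(names, contents []string) []byte {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for i, name := range names {
		body := []byte(contents[i])
		hdr := &tar.Header{Name: name, Typeflag: tar.TypeReg, Mode: 0o600, Size: int64(len(body))}
		if err := tw.WriteHeader(hdr); err != nil {
			panic(err)
		}
		if _, err := tw.Write(body); err != nil {
			panic(err)
		}
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// readEntry (hypothetical) returns the content stored under path.
func readEntry(archive []byte, path string) (string, error) {
	tr, err := tarFileReader(bytes.NewReader(archive), path)
	if err != nil {
		return "", err
	}
	data, err := io.ReadAll(tr)
	return string(data), err
}

func main() {
	archive := buildTar([]string{"a.txt", "conf/b.txt"}, []string{"alpha", "beta"})
	fmt.Println(readEntry(archive, "conf/b.txt")) // beta <nil>
	fmt.Println(readEntry(archive, "missing"))    //  file not found: missing
}
```

Reading from the returned `tar.Reader` stops at the end of the matched entry, so `io.ReadAll` returns only that file's content.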
+25
View File
@@ -0,0 +1,25 @@
# `obiutils`: Unsafe String↔Byte Conversions in Go
This package provides low-level, zero-copy utilities for converting between `string` and `[]byte` in Go using the `unsafe` package.
## Core Functions
- **`UnsafeBytes(str string) []byte`**
Converts a `string` to a mutable byte slice **without copying**, by directly accessing the underlying memory.
⚠️ *Unsafe*: Modifications to the returned slice may corrupt or alter the original string (undefined behavior).
Use only when performance is critical and immutability can be guaranteed.
- **`UnsafeString(b []byte) string`**
Converts a `[]byte` to an immutable `string`, again **without copying**, by reinterpreting the byte slice's memory as a string.
⚠️ *Unsafe*: The string shares memory with `b`, so any later write to `b` silently changes the string's contents, breaking Go's guarantee that strings are immutable.
Requires that `b` remains immutable for the lifetime of the returned string.
## Semantic Purpose
These functions enable high-performance interop between strings and byte slices—critical in systems programming, serialization frameworks, or memory-constrained environments where allocation overhead must be avoided.
## Risks & Best Practices
- **Never mutate the returned slice or original input after conversion**.
- Prefer standard conversions (`[]byte(s)`, `string(b)`) unless profiling confirms a measurable bottleneck.
- Ensure inputs are valid and owned (e.g., not shared across goroutines without synchronization).
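
One way to write such conversions is with the `unsafe.String`, `unsafe.StringData`, `unsafe.Slice`, and `unsafe.SliceData` helpers available since Go 1.20. The sketch below is an assumption about the technique, not the package's actual implementation, and the lowercase function names are illustrative.

```go
package main

import (
	"fmt"
	"unsafe"
)

// unsafeString views b as a string without copying (Go 1.20+).
// The caller must not modify b while the string is in use.
func unsafeString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// unsafeBytes views s as a byte slice without copying (Go 1.20+).
// The result must be treated as read-only: writing to it is undefined
// behavior and can crash when s lives in read-only memory.
func unsafeBytes(s string) []byte {
	if len(s) == 0 {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	buf := []byte("ACGT")
	s := unsafeString(buf)
	fmt.Println(s, len(s)) // ACGT 4

	b := unsafeBytes("read-only")
	fmt.Println(len(b), string(b[:4])) // 9 read

	// Hazard: buf and s share memory, so this write also alters s.
	// Shown only to illustrate why the input must stay untouched.
	buf[0] = 'T'
	fmt.Println(string(buf)) // TCGT
}
```

The empty-input guards avoid handing a nil or dangling pointer to `unsafe.String` and `unsafe.Slice`.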
+38
View File
@@ -0,0 +1,38 @@
# `obiutils` — Universal File I/O with Transparent Compression Support
The `xopen`-based package in the `obiutils` module provides a unified interface for reading and writing files, streams, HTTP resources, or command outputs—**transparently handling multiple compression formats**: gzip, xz, zstd, and bzip2.
## Key Functionalities
- **`Ropen(f string)`**
Opens a file, stdin (`"-"`), HTTP(S) URL, or shell command (e.g., `"|gzip -dc file.gz"`) for **buffered reading**, auto-detecting compression via magic bytes.
- **`Wopen(f string)` / `WopenFile(...)`**
Opens a file or stdout (`"-"`) for **buffered writing**, automatically compressing output based on extension (`.gz`, `.xz`, `.zst`, `.bz2`).
- **Compression Detection**
Functions like `IsGzip()`, `IsXz()`, `IsZst()`, and `IsBzip2()` inspect the first bytes of a buffered reader to infer format.
- **Path Utilities**
- `ExpandUser(path)` expands POSIX-style paths (`~`, `~/path`) to absolute ones.
- `Exists(path)` checks file existence after user expansion.
- **Error Handling**
Defines semantic errors: `ErrNoContent`, `ErrDirNotSupported`.
- **Buffered IO**
All readers/writers use a default buffer size of `65,536` bytes for performance.
- **Resource Management**
`Close()` methods ensure proper cleanup of underlying readers/writers and compression streams.
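
Magic-byte detection of this kind can be sketched with `bufio.Reader.Peek`, which inspects leading bytes without consuming them. The signatures below are the standard file-format magic numbers; the function names `detectFormat` and `detectBytes` are hypothetical and do not mirror the package's `IsGzip()`-style API.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// magics lists the leading bytes that identify each format.
var magics = []struct {
	name  string
	magic []byte
}{
	{"gzip", []byte{0x1f, 0x8b}},
	{"xz", []byte{0xfd, '7', 'z', 'X', 'Z', 0x00}},
	{"zstd", []byte{0x28, 0xb5, 0x2f, 0xfd}},
	{"bzip2", []byte("BZh")},
}

// detectFormat peeks at the start of the stream and names the format.
// Peek does not advance the reader, so the data can still be read in
// full afterwards. Inputs shorter than a signature never match it.
func detectFormat(br *bufio.Reader) string {
	for _, m := range magics {
		head, err := br.Peek(len(m.magic))
		if err == nil && bytes.Equal(head, m.magic) {
			return m.name
		}
	}
	return "plain"
}

// detectBytes is a convenience wrapper for in-memory data.
func detectBytes(data []byte) string {
	return detectFormat(bufio.NewReader(bytes.NewReader(data)))
}

func main() {
	fmt.Println(detectBytes([]byte{0x1f, 0x8b, 0x08}))  // gzip
	fmt.Println(detectBytes([]byte("BZh91AY&SY")))      // bzip2
	fmt.Println(detectBytes([]byte(">seq1\nACGT\n")))   // plain
	fmt.Println(detectBytes(nil))                       // plain
}
```

Detecting by content rather than by file extension is what lets stdin, HTTP streams, and command output be handled the same way as named files.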
## Supported Sources & Formats
| Source | Format(s) |
|-------------------|------------------------|
| Local files | plain, `.gz`, `.xz`, `.zst`, `.bz2` |
| Stdin (`"-"`) | auto-detected |
| HTTP(S) URLs | transparent decompression on stream read |
| Pipe commands (`"\|cmd"`) | output piped and auto-decompressed |
This abstraction simplifies bioinformatics or data-processing pipelines where input sources vary widely, and compression is common.
+19
View File
@@ -0,0 +1,19 @@
# `obiutils` Package: Semantic Overview
The `xopen.go` test suite (via GoCheck) validates utility functions for flexible file/stream I/O in Go. Key features:
- **`IsGzip()`**: Detects gzip compression by inspecting the first two bytes (`0x1f 0x8b`) of a `bufio.Reader`.
- **`Ropen()`**: Unified reader opener supporting:
  - Local files (plain or `.gz`)
  - Standard input (`"-"`); *note: currently unimplemented in tests*
  - HTTP(S) URLs (via `net/http`)
- **`Wopen()`**: Unified writer opener for:
  - Local files (`".gz"` triggers gzip compression)
  - Standard output via `"-"`
- **`Exists()`**: Checks file/directory existence (supports `~` expansion).
- **`ExpandUser()`**: Expands shell-like paths (`~/...`) to absolute ones.
- **Tested robustness**:
  - Handles missing files, invalid URLs (404), and malformed paths.
  - Validates gzip detection accuracy on both plain and compressed data.
All operations abstract away compression/format details, enabling uniform read/write semantics across local files, pipes (commented out), and remote HTTP resources.
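
The uniform-read idea can be sketched for the gzip case alone: peek at the first two bytes, then wrap the stream in a decompressor only when they match. The names `isGzip`, `openMaybeGzip`, `gzipBytes`, and `readText` are hypothetical and stand in for the package's richer `Ropen()`.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// isGzip checks for the gzip magic bytes without consuming input.
func isGzip(br *bufio.Reader) bool {
	head, err := br.Peek(2)
	return err == nil && head[0] == 0x1f && head[1] == 0x8b
}

// openMaybeGzip returns a reader that yields decompressed bytes when
// the input is gzip and the raw bytes otherwise.
func openMaybeGzip(r io.Reader) (io.Reader, error) {
	br := bufio.NewReader(r)
	if !isGzip(br) {
		return br, nil
	}
	zr, err := gzip.NewReader(br)
	if err != nil {
		return nil, err
	}
	return zr, nil
}

// gzipBytes (hypothetical) compresses s in memory for the demo.
func gzipBytes(s string) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(s)); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// readText (hypothetical) reads data to the end through openMaybeGzip.
func readText(data []byte) (string, error) {
	r, err := openMaybeGzip(bytes.NewReader(data))
	if err != nil {
		return "", err
	}
	out, err := io.ReadAll(r)
	return string(out), err
}

func main() {
	plain := ">seq1\nACGT\n"
	a, _ := readText([]byte(plain))
	b, _ := readText(gzipBytes(plain))
	fmt.Println(a == plain, b == plain) // true true
}
```

The caller receives the same text in both cases and never needs to know whether the source was compressed.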