mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 03:50:39 +00:00
⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30 (automated by Makefile)
@@ -0,0 +1,41 @@
# `obiutils.Abs`: Generic Absolute Value Function

This package provides a **type-generic utility function** for computing the absolute value of signed integer types in Go.

## Function Signature

```go
func Abs[T constraints.Signed](x T) T
```

- **Generic constraint**: `T` must satisfy `constraints.Signed`, i.e., any signed integer type (`int`, `int8`, `int16`, `int32`, `int64`) or a type whose underlying type is one of these. Floating-point types are not part of `constraints.Signed`.
- **Input**: A value of type `T`.
- **Output**: The absolute (non-negative) counterpart, of the same type as the input.

## Semantics

- Returns `x` if `x ≥ 0`.
- Otherwise, returns `-x`, flipping the sign.
- Handles all signed integer types uniformly, with no need for type-specific overloads.
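These semantics fit in a few lines. The following is a minimal sketch, not the package source: it assumes `Abs` is a plain sign test, and it declares a local `Signed` constraint (an assumption standing in for `constraints.Signed`) so that it builds with the standard library alone. The last call shows a consequence of returning `-x`: the most negative value of a fixed-width type has no positive counterpart, so it wraps around.

```go
package main

import (
	"fmt"
	"math"
)

// Signed mirrors golang.org/x/exp/constraints.Signed so that this sketch
// compiles with the standard library only.
type Signed interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64
}

// Abs returns x if x >= 0 and -x otherwise (assumed to match obiutils.Abs).
func Abs[T Signed](x T) T {
	if x < 0 {
		return -x
	}
	return x
}

func main() {
	fmt.Println(Abs(-5))       // 5
	fmt.Println(Abs(int8(-7))) // 7
	// Go's signed arithmetic wraps: -(-128) does not fit in an int8.
	fmt.Println(Abs(int8(math.MinInt8))) // -128
}
```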

## Example Usage

```go
absInt := obiutils.Abs(-5)          // → 5 (type: int)
absInt64 := obiutils.Abs(int64(-3)) // → 3 (type: int64)
```

Floating-point arguments such as `-3.14` do not satisfy `constraints.Signed` and are rejected at compile time; use `math.Abs` for `float64` values.

## Design Rationale

- Leverages Go generics for **reusability** and type safety.
- Avoids duplication across type-specific helpers such as `AbsInt`, `AbsInt64`, etc.
- Complements the standard library's `math.Abs`, which only accepts `float64`: `obiutils.Abs` covers the signed integer types instead.

## Limitations

- Does **not** support unsigned types (by design: `constraints.Signed` excludes them), nor floating-point types.
- Assuming the implementation negates `x` directly, the most negative value of a type has no positive counterpart: `Abs(int8(-128))` returns `-128`, because Go's signed integer arithmetic wraps around on overflow.

## Dependencies

- Requires `golang.org/x/exp/constraints` for the generic type constraint.
@@ -0,0 +1,23 @@
## Semantic Description of `obiutils.Abs` Functionality

The Go test suite (`TestAbs`) validates the behavior of the utility function `Abs` from the package [`obiutils`](https://git.metabarcoding.org/obitools/obitools4), part of the OBITools 4 ecosystem, a toolkit for DNA metabarcoding data analysis.

- **Function Purpose**:
  `obiutils.Abs` computes the *absolute value* of an integer, returning its non-negative magnitude regardless of sign.

- **Test Coverage**:
  The test verifies correctness across two categories:
  - *Non-negative inputs* (`0`, `1`, `5`, `10`) → outputs unchanged.
  - *Negative inputs* (`-1`, `-5`, `-10`) → outputs their positive counterparts.

- **Semantics**:
  The function follows the mathematical definition: `Abs(x) = x` if `x ≥ 0`, else `-x`.
  The tested cases cover zero and small positive and negative values; extreme values such as the most negative value of a type are not exercised.

- **Integration Context**:
  As part of `obitools4`, such low-level utilities likely support numerical operations in sequence alignment scoring, quality filtering, or coordinate transformations, where signed differences must be normalized.

- **Test Quality**:
  Uses table-driven testing (a Go idiom), promoting maintainability and clarity. The test needs no external dependencies, which is consistent with `Abs` being a pure, deterministic, self-contained function.

In summary: `Abs` provides a foundational arithmetic primitive, verified on representative integer inputs, for downstream computation in OBITools' data processing workflows.
@@ -0,0 +1,23 @@
# `obiutils` Package: Semantic Overview

This Go package (`obiutils`) provides generic utilities for numerical and matrix operations, built on Go generics (Go 1.18+). It defines foundational types and helper functions for working with two-dimensional data structures.

- **Type Interfaces**
  - `Integer`: constraint covering signed integer types (`int`, `int8`–`int64`).
  - `Float`: constraint for floating-point types (`float32`, `float64`).
  - `Numeric`: union of both, enabling generic numeric functions.

- **Data Structures**
  - `Vector[T]`: a slice-based vector (`[]T`).
  - `Matrix[T]`: a row-major 2D matrix (`[][]T`); when built by the constructors below, all rows share one contiguous backing array for performance.

- **Core Functions**
  - `Make2DArray[T]`: allocates a zero-initialized, contiguous, row-major matrix of arbitrary element type `T`.
  - `Make2DNumericArray[T]`: same as above, but restricted to numeric types; optionally pre-fills with zeros if `zeroed=true`.

- **Matrix Methods**
  - `.Column(i int)`: returns column `i` as a new slice (one element per row); in row-major storage a column is not contiguous.
  - `.Rows(i ...int)`: returns a new matrix containing only the specified row indices.
  - `.Dim() (int, int)`: returns `(rows, cols)`, safely handling `nil` or empty matrices.

The design prioritizes memory efficiency (via contiguous backing arrays), type safety through generics, and ergonomic access patterns for linear-algebra-like workflows.
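The contiguous layout can be illustrated with a short sketch. It assumes `type Matrix[T any] [][]T` and shows one plausible way to carve rows out of a single backing slice; the three-index slicing that caps each row is a choice of this sketch, not something the package is documented to do.

```go
package main

import "fmt"

// Matrix is assumed to match the obiutils definition: a named slice of rows.
type Matrix[T any] [][]T

// Make2DArray allocates one backing slice and slices it into rows,
// so element (i, j) lives at data[i*cols+j].
func Make2DArray[T any](rows, cols int) Matrix[T] {
	data := make([]T, rows*cols) // make zero-initializes every element
	m := make(Matrix[T], rows)
	for i := range m {
		// Capping capacity keeps an append on one row from overwriting the next.
		m[i] = data[i*cols : (i+1)*cols : (i+1)*cols]
	}
	return m
}

// Column gathers column j into a new slice, one element per row.
func (m Matrix[T]) Column(j int) []T {
	col := make([]T, len(m))
	for i, row := range m {
		col[i] = row[j]
	}
	return col
}

func main() {
	m := Make2DArray[int](2, 3)
	m[0][2] = 7
	m[1][0] = 4
	fmt.Println(m)           // [[0 0 7] [4 0 0]]
	fmt.Println(m.Column(0)) // [0 4]
}
```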
@@ -0,0 +1,15 @@
# Semantic Description of `obiutils` Matrix Functionality

The `obiutils` package provides a generic, type-safe matrix abstraction in Go, with core utility methods for construction and querying.

- **`Make2DArray[T]()`**: a generic constructor that creates a `Matrix[T]` with the given numbers of rows and columns. All elements are zero-initialized (e.g., `0` for integers, the empty string for strings, zero-valued fields for structs).

- **`.Column(colIndex int)`**: extracts a single column (as `[]T`) at the given 0-based index, preserving element order across rows.

- **`.Rows(indices ...int)`**: returns a new matrix composed of only the specified rows (0-based indices), supporting single-row, multi-row, or empty selections.

- **`.Dim() (rows, cols int)`**: returns the dimensions of the matrix as `(number_of_rows, number_of_columns)`. Edge cases are handled safely: `nil` and empty (`{}`) matrices, as well as jagged or zero-column matrices (e.g., `{ { } }` yields `(1, 0)`).

`Make2DArray` is a package-level constructor; the query operations are methods on the `Matrix[T]` type (a named slice-of-slices type), leveraging Go generics for compile-time type safety and runtime efficiency.

The package includes unit tests validating correctness across element types (`int`, `string`, custom structs) and boundary conditions.
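A sketch of `Rows` and `Dim` under the edge cases listed above, assuming the same `Matrix[T]` definition. Two choices here are assumptions: `Rows` shares the selected rows with the receiver rather than copying their elements, and `Dim` takes the column count from the first row, so for a jagged matrix it reports the width of row 0 only.

```go
package main

import "fmt"

// Matrix is assumed to match the obiutils definition: a named slice of rows.
type Matrix[T any] [][]T

// Rows returns a new matrix holding the selected rows in the order given.
// The row slices themselves are shared with m, not copied.
func (m Matrix[T]) Rows(indices ...int) Matrix[T] {
	out := make(Matrix[T], len(indices))
	for k, i := range indices {
		out[k] = m[i]
	}
	return out
}

// Dim reports (rows, cols), reading the column count from the first row.
func (m Matrix[T]) Dim() (int, int) {
	if len(m) == 0 { // covers both nil and empty matrices
		return 0, 0
	}
	return len(m), len(m[0])
}

func main() {
	m := Matrix[int]{{1, 2}, {3, 4}, {5, 6}}
	fmt.Println(m.Rows(2, 0))          // [[5 6] [1 2]]
	fmt.Println(Matrix[int]{{}}.Dim()) // 1 0
	var empty Matrix[string]
	fmt.Println(empty.Dim()) // 0 0
}
```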
@@ -0,0 +1,27 @@
# `InPlaceToLower` Function: Semantic Description

The `obiutils.InPlaceToLower` function provides a high-performance, memory-efficient utility for converting ASCII uppercase letters to lowercase **in place**, without allocating new data structures.

## Core Functionality

- Takes a `[]byte` slice (`data`) as input.
- Iterates over each byte, identifying uppercase ASCII characters (i.e., `'A'`–`'Z'`, values `65`–`90`).
- Converts each uppercase byte to its lowercase counterpart using a bitwise OR with `32`, leveraging the ASCII encoding property:
  `lowercase = uppercase | 0b0010_0000` (since `'a' - 'A' = 32` and bit 5 is clear in every uppercase letter, setting it is equivalent to adding 32).
- Returns the **same** `[]byte` slice, now mutated in place.
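The loop described above amounts to one comparison and one OR per byte. This is a sketch reconstructed from the description, not the package source; the input mixes a DNA-like string with a non-ASCII character to show that only `'A'`–`'Z'` are touched and that the returned slice shares the input's backing array.

```go
package main

import "fmt"

// InPlaceToLower lowercases ASCII 'A'..'Z' in data and returns data itself.
// Every other byte, including UTF-8 continuation bytes, is left as is.
func InPlaceToLower(data []byte) []byte {
	for i, c := range data {
		if c >= 'A' && c <= 'Z' {
			data[i] = c | 32 // set bit 5: 'A' (0x41) becomes 'a' (0x61)
		}
	}
	return data
}

func main() {
	buf := []byte("ACGT-nnN é")
	out := InPlaceToLower(buf)
	fmt.Println(string(buf))        // acgt-nnn é
	fmt.Println(&out[0] == &buf[0]) // true: no new slice was allocated
}
```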

## Key Characteristics

- ✅ **Zero-copy**: no new memory is allocated, which suits performance-critical or low-level contexts (e.g., streaming, embedded systems).
- ✅ **ASCII-safe**: only bytes in the `'A'`–`'Z'` range are modified; other bytes (digits, symbols, non-ASCII UTF-8 bytes) remain unchanged. Consequently, non-ASCII uppercase letters such as `'É'` are not lowercased.
- ✅ **Idiomatic Go**: uses `range` with index/value and a single bitwise OR per uppercase byte.
- ⚠️ **Destructive**: the input is permanently modified; callers must clone it if the original must be preserved.

## Use Cases

- Preprocessing raw HTTP headers or payloads.
- Optimizing case-insensitive comparisons in high-throughput systems.
- Embedded tools where GC pressure or heap allocation must be minimized.

## Example

```go
buf := []byte("HTTP/1.1 200 OK")
obiutils.InPlaceToLower(buf) // buf is now []byte("http/1.1 200 ok")
```
@@ -0,0 +1,20 @@
# `obiutils` Package Functional Overview

The `obiutils` package provides two core utility functions for low-level and numerical operations in Go:

- **`InPlaceToLower([]byte) []byte`**
  Converts all ASCII uppercase letters in a byte slice to lowercase *in place*, returning the modified slice.
  - All bytes outside `'A'`–`'Z'` remain unchanged.
  - Memory-efficient: modifies the input directly (no new slice is allocated).

- **`Make2DNumericArray[T any](rows, cols int, zeroed bool) Matrix[T]`**
  Generates a generic 2D numeric array (`Matrix[T]`) with element type `T`.
  - Parameters: number of rows, number of columns, and `zeroed`, which requests explicit zero-initialization of the elements. Note that Go already fills newly allocated slices with the zero value of `T` (e.g., `0` for `int`).
  - Uses Go generics for type safety and flexibility.

Both functions are unit-tested in `*_test.go` files, covering edge cases:

- Empty/nil inputs (`InPlaceToLower`)
- Various dimensions and zero-initialization modes (`Make2DNumericArray`)

Tests use `reflect.DeepEqual` for structural comparison and subtests via `t.Run`.
The package defines a generic named type: `type Matrix[T any] [][]T`.
@@ -0,0 +1,15 @@
# `obiutils` Package: Semantic Feature Summary

This Go package provides a set of utility functions for **type conversion**, **castability checks**, and **map/slice transformation** in a flexible, error-tolerant manner.

- `InterfaceToString(i interface{})`: converts any value to a string, preferring the `Stringer` interface if implemented.
- `CastableToInt(i interface{})`: checks whether a value is *numerically castable* to an `int` (supports all numeric types).
- `InterfaceToBool(i interface{})`: converts various input types (`bool`, numeric, strings such as `"true"` or `"1"`) to `bool`; returns a custom error for unsupported types.
- `InterfaceToInt(i interface{})`: converts numeric values or their string representations to an `int`, returning a typed error on failure.
- `InterfaceToFloat64(i interface{})`: converts numeric or string values to `float64`, using standard parsing.
- `MapToMapInterface(m interface{})`: converts specialized map types (e.g., read-only or concurrency-safe maps) to `map[string]interface{}` via reflection.
- `InterfaceToIntMap(i interface{})`: converts compatible map types (`map[string]int`, `hasMap` interfaces, or generic maps) to a concrete `map[string]int`.
- `InterfaceToStringMap(i interface{})`: converts map values to strings, yielding a `map[string]string`.
- `InterfaceToStringSlice(i interface{})`: converts slices of interfaces or strings into a plain `[]string`.

The conversion functions report failures through **custom error types** (e.g., `NotAnInteger`, `NotAMapInt`) and log via Logrus for debugging. The package prioritizes **type safety**, **robustness**, and interoperability across Go types.
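As an illustration of the conversion-plus-typed-error pattern, here is a sketch of `InterfaceToInt`. The set of accepted types, the truncation of `float64` values, and the internal shape of `NotAnInteger` are assumptions of this example; only the function and error names come from the list above, and the package's Logrus logging is left out.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// NotAnInteger reuses the error name from the package; its field is hypothetical.
type NotAnInteger struct{ message string }

func (e *NotAnInteger) Error() string { return e.message }

// InterfaceToInt converts integer kinds and float64 directly, parses
// strings with strconv.Atoi, and rejects everything else.
func InterfaceToInt(i interface{}) (int, error) {
	switch v := i.(type) {
	case int:
		return v, nil
	case int8:
		return int(v), nil
	case int16:
		return int(v), nil
	case int32:
		return int(v), nil
	case int64:
		return int(v), nil
	case uint8:
		return int(v), nil
	case uint16:
		return int(v), nil
	case uint32:
		return int(v), nil
	case float64:
		return int(v), nil // truncates toward zero (assumption of this sketch)
	case string:
		n, err := strconv.Atoi(v)
		if err != nil {
			return 0, &NotAnInteger{fmt.Sprintf("%q is not an integer", v)}
		}
		return n, nil
	}
	return 0, &NotAnInteger{fmt.Sprintf("%v (%T) cannot be converted to int", i, i)}
}

func main() {
	for _, x := range []interface{}{42, int64(7), 3.9, "12", "abc", []int{1}} {
		n, err := InterfaceToInt(x)
		var nai *NotAnInteger
		fmt.Println(n, errors.As(err, &nai)) // value, and whether it failed with *NotAnInteger
	}
}
```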
@@ -0,0 +1,25 @@
# `obiutils.Counter`: Thread-Safe Atomic Counter

A minimal, thread-safe counter implementation in Go.

## Features

- **Atomic increment/decrement**: `Inc()` and `Dec()` modify the internal counter atomically using a mutex.
- **Current value retrieval**: `Value()` safely returns the current count without modifying it.
- **Initial value support**: the constructor accepts an optional initial integer (defaults to `0`).
- **Closure-based API**: encapsulates state and synchronization behind clean, functional methods.
- **No external dependencies**: uses only the standard library (`sync`).
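A closure-based counter along these lines might look like the following sketch. The `Counter` struct holding three function fields and the variadic `NewCounter(initial ...int)` signature are assumptions used to model "an optional initial integer"; the goroutine loop in `main` shows why the mutex matters.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter exposes closures that share one mutex-protected integer.
// This field layout is an assumption based on the "closure-based API" above.
type Counter struct {
	Inc   func() int
	Dec   func() int
	Value func() int
}

// NewCounter starts at initial[0] when it is given, otherwise at 0.
func NewCounter(initial ...int) *Counter {
	value := 0
	if len(initial) > 0 {
		value = initial[0]
	}
	var lock sync.Mutex
	return &Counter{
		Inc: func() int {
			lock.Lock()
			defer lock.Unlock()
			value++
			return value
		},
		Dec: func() int {
			lock.Lock()
			defer lock.Unlock()
			value--
			return value
		},
		Value: func() int {
			lock.Lock()
			defer lock.Unlock()
			return value
		},
	}
}

func main() {
	c := NewCounter(10)
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc()
		}()
	}
	wg.Wait()
	fmt.Println(c.Value()) // 110
}
```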

## Usage Example

```go
counter := obiutils.NewCounter(10) // start at 10
fmt.Println(counter.Inc())         // → 11
fmt.Println(counter.Dec())         // → 10
fmt.Println(counter.Value())       // → 10 (unchanged)
```

## Thread Safety

All operations are protected by a `sync.Mutex`, ensuring correctness when the counter is shared between goroutines.

## Design Notes

- Value-returning API: `Inc()` and `Dec()` return the updated count, and no pointer to the internal state is exposed.
- No reset method is provided; the type is intentionally minimal and focused on core counting semantics.
@@ -0,0 +1,37 @@
## `obiutils.DownloadFile`: Semantic Feature Overview

- **Core Functionality**: downloads a file from a given URL to a specified local path.

- **HTTP Client Behavior**:
  - Uses `http.Get()` for simple, synchronous GET requests.
  - Validates the HTTP status code; aborts on non-200 responses with a descriptive error.

- **Resource Management**:
  - Ensures proper cleanup via `defer resp.Body.Close()` and `defer out.Close()`.

- **Progress Tracking**:
  - Integrates [`progressbar`](https://github.com/schollz/progressbar) to display real-time download progress.
  - Uses `DefaultBytes()` for a human-readable, byte-based indicator (e.g., "downloading 12.3 MB / 45.6 MB").

- **Efficient I/O**:
  - Uses `io.Copy()` with an `io.MultiWriter` to stream data directly from the HTTP response body into both:
    - the target file (`out`);
    - the progress bar (updated on each chunk written).

- **Error Handling**:
  - Returns early with wrapped errors for network failures, non-200 status codes, or file I/O issues.

- **Simplicity & Usability**:
  - Minimal API surface: only two arguments (`url`, `filepath`).
  - No configuration needed, which suits CLI tools or batch scripts.

- **Assumptions**:
  - No authentication, timeouts, or retries are implemented. Redirects and proxies are handled only as far as Go's default client does: `http.Get` follows up to 10 redirects, honors the `HTTP_PROXY`/`HTTPS_PROXY` environment variables, and sets no timeout.
  - Designed for straightforward downloads where robustness is secondary to simplicity.

- **Typical Use Cases**:
  - CLI utilities, build scripts, CI/CD pipelines.
  - Prototyping or internal tools where advanced download features are unnecessary.

- **Limitations**:
  - Not suitable for large-scale or production-grade downloads without enhancements (e.g., retries, timeouts, concurrency control).
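The download steps can be sketched with the standard library only. In this sketch, which is not the package source, a small `byteCounter` writer (hypothetical) takes the place of the `progressbar`, so `io.MultiWriter` still fans each chunk out to two writers, and a local `httptest` server replaces a real URL so that the example runs offline.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
)

// byteCounter stands in for the progress bar: it only tallies the bytes
// that io.Copy pushes through it.
type byteCounter struct{ total int64 }

func (c *byteCounter) Write(p []byte) (int, error) {
	c.total += int64(len(p))
	return len(p), nil
}

// DownloadFile fetches url into filepath, following the steps listed above.
func DownloadFile(url, filepath string) error {
	resp, err := http.Get(url)
	if err != nil {
		return fmt.Errorf("downloading %s: %w", url, err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("downloading %s: unexpected status %s", url, resp.Status)
	}

	out, err := os.Create(filepath)
	if err != nil {
		return fmt.Errorf("creating %s: %w", filepath, err)
	}
	defer out.Close()

	progress := &byteCounter{}
	_, err = io.Copy(io.MultiWriter(out, progress), resp.Body)
	return err
}

// demo serves a payload locally, downloads it, and reports the content
// plus whether a 404 was turned into an error.
func demo() (string, bool, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/missing" {
			http.NotFound(w, r)
			return
		}
		io.WriteString(w, ">seq1\nACGT\n")
	}))
	defer srv.Close()

	dir, err := os.MkdirTemp("", "download")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(dir)

	target := dir + string(os.PathSeparator) + "seqs.fasta"
	if err := DownloadFile(srv.URL+"/seqs.fasta", target); err != nil {
		return "", false, err
	}
	content, err := os.ReadFile(target)
	if err != nil {
		return "", false, err
	}
	missingFailed := DownloadFile(srv.URL+"/missing", target+".2") != nil
	return string(content), missingFailed, nil
}

func main() {
	content, missingFailed, err := demo()
	fmt.Printf("%q missing-rejected=%v err=%v\n", content, missingFailed, err)
}
```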
@@ -0,0 +1,16 @@
# `obiutils` Package Overview

This Go package provides utility functions for common data conversion, serialization, and reflection tasks.

- **Custom Error Types**: defines typed errors (`NotAnInteger`, `NotAFloat64`, etc.) for precise reporting of type validation failures.
- **Interface-to-Type Casting**: offers robust conversion functions:
  - `InterfaceToFloat64Map`, `InterfaceToIntSlice`, etc., handling nested interfaces and type coercion (e.g., `int` → `float64`, slices of `interface{}`).
- **File I/O**: `ReadLines` reads a file line by line into a string slice using buffered reading.
- **Concurrency**: `AtomicCounter` returns an incrementing integer generator, thread-safe via a mutex, optionally starting from a given value.
- **JSON Serialization**: `JsonMarshal` and `JsonMarshalByteBuffer` encode JSON with HTML escaping disabled, so characters such as `<`, `>`, and `&` are written literally instead of as `\u003c`, `\u003e`, and `\u0026` (the default of Go's `encoding/json`).
- **Reflection Helpers**:
  - `IsAMap`, `IsASlice`, `IsAnArray` detect container types.
  - `HasLength`, `Len`, and `IsAContainer` abstract length operations across maps, slices, arrays, or custom types with a `.Len()` method.
- **Deep Copying**: `MustFillMap` performs deep copying of nested structures using `go-deepcopy`.

All functions prioritize safety, type correctness, and usability in data-heavy or concurrent applications.
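The difference from `json.Marshal` is easiest to see side by side. This is a sketch of one common way to get HTML-escaping-free output, via `json.Encoder.SetEscapeHTML(false)`; the body of `JsonMarshal` is an assumption. Because `Encoder.Encode` appends a newline, the sketch trims it to match `json.Marshal`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// JsonMarshal behaves like json.Marshal but leaves <, > and & unescaped.
func JsonMarshal(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(v); err != nil {
		return nil, err
	}
	// Encode terminates its output with '\n'; json.Marshal does not.
	return bytes.TrimSuffix(buf.Bytes(), []byte("\n")), nil
}

func main() {
	v := map[string]string{"query": "a<b && c>d"}
	std, _ := json.Marshal(v)
	raw, _ := JsonMarshal(v)
	fmt.Println(string(std)) // {"query":"a\u003cb \u0026\u0026 c\u003ed"}
	fmt.Println(string(raw)) // {"query":"a<b && c>d"}
}
```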
@@ -0,0 +1,37 @@
# `obiutils` Package: File and Stream Writing Utilities

The `obiutils` package provides a unified abstraction for writing data to files or streams, with optional gzip compression and buffered I/O.

## Core Type: `Wfile`

- Encapsulates a write-ready output stream (`io.WriteCloser`).
- Supports both **compressed** (gzip) and uncompressed modes.
- Uses `bufio.Writer` for efficient buffered writes.

## Key Functions

### `OpenWritingFile(name string, compressed bool, append bool) (*Wfile, error)`

- Opens a file for writing.
- `compressed`: enables gzip compression via `pgzip`.
- `append`: if true, writes at the end of the file (`os.O_APPEND`).
- Returns a ready-to-use `*Wfile`.

### `CompressStream(out io.WriteCloser, compressed bool, close bool) (*Wfile, error)`

- Wraps an arbitrary `io.WriteCloser` (e.g., HTTP response, pipe) in buffered and optionally compressed I/O.
- `close`: if true, the underlying writer is closed by `.Close()`.

## Methods

- **`Write(p []byte)` / `WriteString(s string)`**:
  Buffered writes to the underlying stream (transparently compressed if enabled).

- **`Close()`**:
  - Flushes the buffer.
  - Closes the gzip writer (if compressed), which writes the gzip footer.
  - Closes the underlying file/stream *only if* `close == true`.

## Design Highlights

- **Transparent compression**: uses the high-performance `pgzip` package for parallel gzip compression.
- **Resource control**: an explicit flag (`close`) prevents premature closure of shared writers (e.g., in pipelines).
- **Efficiency**: two buffering layers, with `bufio.Writer` in front of the gzip stream.
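The layering described above (buffer, then optional gzip, then the underlying writer) can be sketched as follows. The standard library's `compress/gzip` stands in for `pgzip`, the struct fields are assumptions, and `sink` is a hypothetical in-memory `io.WriteCloser` used to show that the `close` flag decides whether the underlying writer is closed.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// Wfile layers a bufio.Writer over an optional gzip.Writer over out.
type Wfile struct {
	buf   *bufio.Writer
	gz    *gzip.Writer // nil when compression is off
	out   io.WriteCloser
	close bool // whether Close also closes out
}

// CompressStream follows the documented signature; compress/gzip replaces
// pgzip so that only the standard library is needed.
func CompressStream(out io.WriteCloser, compressed bool, close bool) (*Wfile, error) {
	w := &Wfile{out: out, close: close}
	var dst io.Writer = out
	if compressed {
		w.gz = gzip.NewWriter(out)
		dst = w.gz
	}
	w.buf = bufio.NewWriter(dst)
	return w, nil
}

func (w *Wfile) Write(p []byte) (int, error)       { return w.buf.Write(p) }
func (w *Wfile) WriteString(s string) (int, error) { return w.buf.WriteString(s) }

// Close flushes the buffer, finishes the gzip stream (writing its footer),
// and closes out only when requested.
func (w *Wfile) Close() error {
	if err := w.buf.Flush(); err != nil {
		return err
	}
	if w.gz != nil {
		if err := w.gz.Close(); err != nil {
			return err
		}
	}
	if w.close {
		return w.out.Close()
	}
	return nil
}

// sink is an in-memory io.WriteCloser that remembers whether it was closed.
type sink struct {
	bytes.Buffer
	closed bool
}

func (s *sink) Close() error {
	s.closed = true
	return nil
}

// roundTrip compresses text through a Wfile, then decompresses it again.
func roundTrip(text string, closeSink bool) (string, bool, error) {
	dst := &sink{}
	w, err := CompressStream(dst, true, closeSink)
	if err != nil {
		return "", false, err
	}
	if _, err := w.WriteString(text); err != nil {
		return "", false, err
	}
	if err := w.Close(); err != nil {
		return "", false, err
	}
	r, err := gzip.NewReader(&dst.Buffer)
	if err != nil {
		return "", false, err
	}
	plain, err := io.ReadAll(r)
	return string(plain), dst.closed, err
}

func main() {
	text, closed, err := roundTrip(">seq1\nACGT\n", false)
	fmt.Printf("%q closed=%v err=%v\n", text, closed, err)
}
```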
@@ -0,0 +1,15 @@
# `obiutils` Package: Memory Size Parsing and Formatting

This Go package provides two complementary utility functions for handling human-readable memory sizes:

- **`ParseMemSize(s string) (int, error)`**
  Parses a memory size string into an integer number of bytes. Supports case-insensitive units: `B`, `K`/`KB`, `M`/`MB`, `G`/`GB`, and `T`/`TB`, interpreted as powers of 1024.
  Examples: `"128K"` → `131072` (128 × 1024), `"512MB"` → `536870912` (512 × 1024²).
  Returns an error for invalid input (e.g., an empty string, a non-numeric prefix, or an unknown unit).

- **`FormatMemSize(n int) string`**
  Converts a byte count into the most appropriate human-readable format using powers of 1024.
  Uses the suffixes `T`, `G`, `M`, or `K`, and falls back to bytes (`B`) below 1 KiB.
  Whole multiples are displayed without decimals (e.g., `2048` → `"2K"`), while fractional values use one decimal (e.g., `1536` → `"1.5K"`).

Both functions keep memory sizes consistent across I/O, logging, and configuration parsing.
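A sketch of both functions, assuming integer-only number prefixes, a 64-bit `int`, and (an extra assumption not stated above) that a bare number with no unit means bytes. Values that are not whole multiples of a unit are printed with one decimal, so `1025` would format as `1.0K` in this sketch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// multipliers maps each accepted (upper-cased) unit to its size in bytes.
var multipliers = map[string]int{
	"": 1, "B": 1, // a bare number is treated as bytes (assumption)
	"K": 1 << 10, "KB": 1 << 10,
	"M": 1 << 20, "MB": 1 << 20,
	"G": 1 << 30, "GB": 1 << 30,
	"T": 1 << 40, "TB": 1 << 40,
}

// ParseMemSize splits s into a leading decimal number and a unit suffix.
func ParseMemSize(s string) (int, error) {
	s = strings.TrimSpace(s)
	i := 0
	for i < len(s) && s[i] >= '0' && s[i] <= '9' {
		i++
	}
	if i == 0 {
		return 0, fmt.Errorf("invalid memory size %q: missing number", s)
	}
	n, err := strconv.Atoi(s[:i])
	if err != nil {
		return 0, fmt.Errorf("invalid memory size %q: %w", s, err)
	}
	mult, ok := multipliers[strings.ToUpper(strings.TrimSpace(s[i:]))]
	if !ok {
		return 0, fmt.Errorf("invalid memory size %q: unknown unit", s)
	}
	return n * mult, nil
}

// FormatMemSize picks the largest unit not exceeding n.
func FormatMemSize(n int) string {
	units := []struct {
		suffix string
		factor int
	}{{"T", 1 << 40}, {"G", 1 << 30}, {"M", 1 << 20}, {"K", 1 << 10}}
	for _, u := range units {
		if n >= u.factor {
			if n%u.factor == 0 {
				return fmt.Sprintf("%d%s", n/u.factor, u.suffix)
			}
			return fmt.Sprintf("%.1f%s", float64(n)/float64(u.factor), u.suffix)
		}
	}
	return fmt.Sprintf("%dB", n)
}

func main() {
	n, err := ParseMemSize("512MB")
	fmt.Println(n, err)                                   // 536870912 <nil>
	fmt.Println(FormatMemSize(2048), FormatMemSize(1536)) // 2K 1.5K
}
```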
@@ -0,0 +1,32 @@
# OBIMimeUtils: Semantic Description of Features

The `obiutils` Go package provides utilities for detecting and handling biological data file formats, primarily via MIME type inference.

## Core Functionalities

- **BOM Detection (`HasBOM`)**
  Identifies Byte Order Marks (BOMs) for UTF-8, UTF-16 BE/LE, and UTF-32 BE/LE encodings, and logs the detected type for transparency.

- **Last-Line Trimming (`DropLastLine`)**
  Removes the final newline-delimited line from a byte slice, which is useful for discarding an incomplete or truncated last line.

- **MIME Type Registration (`RegisterOBIMimeType`)**
  Extends generic MIME types (e.g., `text/plain`, `application/octet-stream`) with format-specific detectors for:
  - **CSV**: validates structured comma-separated data (≥2 fields, ≥2 lines).
  - **FASTA/FASTQ**: regex-based detection of sequence headers (`>` or `@`).
  - **ecoPCR2**: detects files starting with the magic header `#@ecopcr-v2`.
  - **GenBank/EMBL**: checks for the standard record prefixes (`LOCUS` for GenBank, `ID` for EMBL).

- **Format-Specific Extensions**
  Registers custom MIME subtypes (e.g., `text/fasta`) and associates them with the appropriate file extensions (e.g., `.fasta`).

- **Idempotent Registration**
  Ensures MIME detectors are registered only once, using a guard flag.
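A few of these checks can be sketched without any MIME library. Everything below is illustrative: `bomOf` and `sniff` are hypothetical helpers, the FASTA/FASTQ regular expressions are simplified patterns rather than the package's, `text/fastq` is an assumed subtype name, and this `DropLastLine` assumes that the text after the last newline is the line to drop. Note the ordering in `bomOf`: the UTF-32 LE mark `FF FE 00 00` starts with the UTF-16 LE mark `FF FE`, so it must be tested first.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// bomOf names the encoding whose BOM prefixes data, or returns "".
func bomOf(data []byte) string {
	boms := []struct {
		name string
		mark []byte
	}{
		{"UTF-8", []byte{0xEF, 0xBB, 0xBF}},
		{"UTF-32 BE", []byte{0x00, 0x00, 0xFE, 0xFF}},
		{"UTF-32 LE", []byte{0xFF, 0xFE, 0x00, 0x00}}, // before UTF-16 LE
		{"UTF-16 BE", []byte{0xFE, 0xFF}},
		{"UTF-16 LE", []byte{0xFF, 0xFE}},
	}
	for _, b := range boms {
		if bytes.HasPrefix(data, b.mark) {
			return b.name
		}
	}
	return ""
}

// DropLastLine keeps everything up to and including the last '\n'.
func DropLastLine(data []byte) []byte {
	i := bytes.LastIndexByte(data, '\n')
	if i < 0 {
		return data[:0]
	}
	return data[:i+1]
}

var (
	fastaRe = regexp.MustCompile(`^>[^\n]*\n[A-Za-z*-]+\n`)
	fastqRe = regexp.MustCompile(`^@[^\n]*\n[A-Za-z]+\n\+[^\n]*\n`)
)

// sniff classifies a buffer after discarding its possibly partial last line.
func sniff(data []byte) string {
	data = DropLastLine(data)
	switch {
	case fastaRe.Match(data):
		return "text/fasta"
	case fastqRe.Match(data):
		return "text/fastq"
	}
	return "text/plain"
}

func main() {
	fmt.Println(bomOf([]byte{0xEF, 0xBB, 0xBF, '>'}))         // UTF-8
	fmt.Println(sniff([]byte(">s1\nACGT\nAC")))               // text/fasta
	fmt.Printf("%q\n", DropLastLine([]byte("a\nb\npartial"))) // "a\nb\n"
}
```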

## Design Goals

- Robust, lightweight format inference without full parsing.
- Extensible architecture for future bioinformatics formats.
- Logging-friendly (via `logrus`) to aid debugging and observability.

This package enables accurate, context-aware MIME detection in pipelines handling heterogeneous biological data.
@@ -0,0 +1,36 @@
# `obiutils` Package: Semantic Overview

The `obiutils` package provides generic and reflection-based utilities for computing minima and maxima across several kinds of data structures in Go.

## Core Features

- **Generic `MinMax` / `MinMaxSlice`**:
  - `MinMax[T constraints.Ordered]`: returns the ordered pair `(min, max)` of two values.
  - `MinMaxSlice[T constraints.Ordered]`: finds the minimum and maximum of a slice of ordered values (panics on empty input).

- **Map-based Min/Max**:
  - `MinMap` / `MaxMap`: return the key and value of the smallest/largest *value* in a map (errors on empty maps).

- **Unified `Min` / `Max` Functions**:
  - Accept *any* Go value: a single scalar, or a slice, array, or map.
  - Use reflection to dispatch on the runtime type (`reflect.Kind`).
  - Support the ordered kinds: signed and unsigned integers, floats, and strings (the same kinds covered by `constraints.Ordered`).
  - Return an error for unsupported types or empty containers.

- **Helper Reflection Functions**:
  - `minFromIterable` / `maxFromIterable`: scan slices and arrays.
  - `minFromMap` / `maxFromMap`: iterate over map values (keys are ignored in comparisons).
  - `isOrderedKind`, `less`, `greater`: internal comparison logic for reflection-based ordering.
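The generic part of this API can be sketched as below. A local `Ordered` constraint (an assumption) mirrors `constraints.Ordered` so the example needs no external module, and the `(K, V, error)` result of `MinMap` is an assumed signature consistent with "returns the key and value" and "errors on empty maps". The reflection-based `Min`/`Max` dispatch is not reproduced.

```go
package main

import (
	"errors"
	"fmt"
)

// Ordered mirrors constraints.Ordered: integers, floats and strings.
type Ordered interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64 |
		~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr |
		~float32 | ~float64 | ~string
}

// MinMax returns its two arguments in (min, max) order.
func MinMax[T Ordered](x, y T) (T, T) {
	if x < y {
		return x, y
	}
	return y, x
}

// MinMaxSlice scans once and, like the package function, panics on empty input.
func MinMaxSlice[T Ordered](s []T) (T, T) {
	if len(s) == 0 {
		panic("MinMaxSlice: empty slice")
	}
	lo, hi := s[0], s[0]
	for _, v := range s[1:] {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	return lo, hi
}

// MinMap returns the key holding the smallest value. With ties, the key
// returned depends on Go's unspecified map iteration order.
func MinMap[K comparable, V Ordered](m map[K]V) (K, V, error) {
	var bestK K
	var bestV V
	if len(m) == 0 {
		return bestK, bestV, errors.New("empty map")
	}
	first := true
	for k, v := range m {
		if first || v < bestV {
			bestK, bestV, first = k, v, false
		}
	}
	return bestK, bestV, nil
}

func main() {
	fmt.Println(MinMax(7, 3))                            // 3 7
	fmt.Println(MinMaxSlice([]string{"gc", "at", "ta"})) // at ta
	k, v, err := MinMap(map[string]int{"a": 4, "b": 1, "c": 9})
	fmt.Println(k, v, err) // b 1 <nil>
}
```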

## Design Highlights

- **Type Safety & Generics**: leverages Go 1.18+ generics for compile-time type constraints where possible.
- **Flexibility**: the `Min(data interface{})` / `Max(...)` functions offer a *single API* for heterogeneous inputs.
- **Error Handling**: the reflection-based `Min`/`Max` and the map functions report explicit errors (e.g., `"empty slice"`, `"unsupported type"`) instead of panicking; `MinMaxSlice` is the exception and panics on empty input.
- **Fallback Support**: checks (via reflection) whether the input has its own `Min()`/`Max()` method before falling back to the generic logic.

## Limitations

- Reflection-based paths are slower than direct generics.
- Types without a built-in ordering (e.g., structs) cannot satisfy `constraints.Ordered`, so the generic comparison logic does not support them unless they provide the `Min()`/`Max()` methods checked by the fallback.
- Maps compare only *values*; keys are irrelevant for min/max selection.
@@ -0,0 +1,35 @@
# `MinMultiset[T]`: A Lazy-Delete Min-Multiset Implementation

A generic, type-safe multiset data structure in Go that stores elements with multiplicity and provides efficient access to the current minimum. It is built on a min-heap (`container/heap`) with **lazy deletion**, so removals do not require rebuilding the heap.

## Core Features

- ✅ **Generic over comparable types** (`T`), with a custom ordering supplied as a `less` comparator
- ✅ **Multiset semantics**: supports multiple occurrences of the same value
- ✅ **O(log n) insertion** (`Add`) and **amortized O(1)** minimum access: `Min()` only pops stale entries, and the O(log n) cost of each such pop can be charged to the `RemoveOne` call that created it
- ✅ **Lazy deletion**: `RemoveOne` marks items for removal; physical cleanup happens when a marked entry reaches the top of the heap during a later `Min()` call
- ✅ **Size tracking**: the logical size (`Len()`) excludes deleted items, even if they are still in the heap
- ✅ **Memory-efficient cleanup**: `shrink()` and `cleanTop()` prevent tombstone accumulation

## API Summary

| Method | Description |
|--------|-------------|
| `NewMinMultiset(less)` | Constructor; initializes the heap and the maps (`count`, `pending`), and sets the ordering |
| `Add(v)` | Inserts one occurrence of `v`; increments the logical size and the count map |
| `RemoveOne(v)` | Removes *one* occurrence if present; returns a success flag (`false` if `v` is absent) |
| `Min()` | Returns the current minimum (or the zero value with `ok=false`) after cleaning stale top entries |
| `Len()` | Returns the logical size (excludes pending deletions) |

## Internal Mechanism

- **`count[T]int`**: tracks how many times each value is *logically* present
- **`pending[T]int`**: tracks how many times each value is *marked for removal*
- **The heap invariant holds over all physical entries, including stale ones**: stale entries are pruned lazily when they reach the top (during `Min()`) or during cleanup after deletions
- **No manual cleanup needed**: the structure cleans itself up incrementally
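A compact sketch of the lazy-deletion scheme, built on `container/heap`. Names other than those listed above (such as `innerHeap`) are assumptions, and `shrink()` is omitted, so stale entries are only removed when they reach the top of the heap. `RemoveOne` touches only the two maps; `Min` pops top entries while their value still has pending removals.

```go
package main

import (
	"container/heap"
	"fmt"
)

// innerHeap is a min-heap ordered by a caller-supplied less function.
type innerHeap[T any] struct {
	data []T
	less func(a, b T) bool
}

func (h *innerHeap[T]) Len() int           { return len(h.data) }
func (h *innerHeap[T]) Less(i, j int) bool { return h.less(h.data[i], h.data[j]) }
func (h *innerHeap[T]) Swap(i, j int)      { h.data[i], h.data[j] = h.data[j], h.data[i] }
func (h *innerHeap[T]) Push(x any)         { h.data = append(h.data, x.(T)) }
func (h *innerHeap[T]) Pop() any {
	n := len(h.data)
	x := h.data[n-1]
	h.data = h.data[:n-1]
	return x
}

type MinMultiset[T comparable] struct {
	h       *innerHeap[T]
	count   map[T]int // logical multiplicity of each value
	pending map[T]int // removals not yet applied to the heap
	size    int       // logical size
}

func NewMinMultiset[T comparable](less func(a, b T) bool) *MinMultiset[T] {
	return &MinMultiset[T]{h: &innerHeap[T]{less: less}, count: map[T]int{}, pending: map[T]int{}}
}

func (m *MinMultiset[T]) Add(v T) {
	heap.Push(m.h, v)
	m.count[v]++
	m.size++
}

// RemoveOne only records the removal; the heap entry becomes a tombstone.
func (m *MinMultiset[T]) RemoveOne(v T) bool {
	if m.count[v] == 0 {
		return false
	}
	m.count[v]--
	m.pending[v]++
	m.size--
	return true
}

// cleanTop pops tombstones until the top entry is a live value.
func (m *MinMultiset[T]) cleanTop() {
	for m.h.Len() > 0 && m.pending[m.h.data[0]] > 0 {
		m.pending[m.h.data[0]]--
		heap.Pop(m.h)
	}
}

func (m *MinMultiset[T]) Min() (T, bool) {
	m.cleanTop()
	if m.h.Len() == 0 {
		var zero T
		return zero, false
	}
	return m.h.data[0], true
}

func (m *MinMultiset[T]) Len() int { return m.size }

func main() {
	m := NewMinMultiset(func(a, b int) bool { return a < b })
	for _, v := range []int{4, 1, 1, 7} {
		m.Add(v)
	}
	m.RemoveOne(1)
	m.RemoveOne(1)
	v, ok := m.Min()
	fmt.Println(v, ok, m.Len()) // 4 true 2
}
```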

## Use Cases

Priority queues with deletable arbitrary elements (e.g., Dijkstra's algorithm where distances are updated), sliding-window minima, and event scheduling with cancellation.

> ⚠️ Note: `less` must define a *strict total order* (irreflexive, transitive, and connected: for distinct `a` and `b`, exactly one of `less(a, b)` and `less(b, a)` holds) for correctness.
@@ -0,0 +1,16 @@
## `obiutils` Package: File Path Utility Functions

This Go package provides two utility functions that manipulate file paths by removing extensions:

### `RemoveAllExt(p string) string`

- **Purpose**: strips *all* file extensions from a given path (e.g., `/dir/file.tar.gz` → `/dir/file`).
- **Mechanism**: repeatedly applies `path.Ext()` and `strings.TrimSuffix` to the *full path* until no extension remains. Because `path.Ext()` only inspects the final slash-separated element, dots in directory names are left untouched (e.g., `/data.v2/file.txt` → `/data.v2/file`).
- **Use Case**: sanitizing a full path for naming or comparison, regardless of how many extensions are stacked.

### `Basename(path string) string`

- **Purpose**: extracts the base filename *without any extensions* (e.g., `/dir/file.tar.gz` → `file`).
- **Mechanism**: uses `filepath.Base()` to get the filename, then repeatedly strips extensions via `strings.TrimSuffix`.
- **Key Difference**: returns *only the filename*, without the directory part, which is the more conventional result for typical file handling.

Both functions handle multi-extension files (e.g., `.tar.gz`, `.backup.zip`). They avoid `strings.LastIndex` and regular expressions, favoring clarity and the standard library (`path`, `filepath`).

`RemoveAllExt` relies on the `path` package, which only recognizes forward slashes; `Basename` relies on `filepath.Base`, which uses the host operating system's separator and therefore also handles backslash-separated paths on Windows.
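A sketch of both helpers as described, using `path.Ext` on the full path and `filepath.Base`/`filepath.Ext` on the file name (the use of `filepath.Ext` inside `Basename` is an assumption). The second call shows that a dot in a directory name survives, because `path.Ext` only inspects the final element.

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
	"strings"
)

// RemoveAllExt strips extensions from the last path element until none remain.
func RemoveAllExt(p string) string {
	for ext := path.Ext(p); ext != ""; ext = path.Ext(p) {
		p = strings.TrimSuffix(p, ext)
	}
	return p
}

// Basename keeps only the final element, then strips all of its extensions.
func Basename(p string) string {
	name := filepath.Base(p)
	for ext := filepath.Ext(name); ext != ""; ext = filepath.Ext(name) {
		name = strings.TrimSuffix(name, ext)
	}
	return name
}

func main() {
	fmt.Println(RemoveAllExt("/dir/file.tar.gz"))  // /dir/file
	fmt.Println(RemoveAllExt("/data.v2/file.txt")) // /data.v2/file
	fmt.Println(Basename("/dir/file.tar.gz"))      // file
}
```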
@@ -0,0 +1,19 @@
# `obiutils` Package: Functional Overview

The `obiutils` package provides utility functions for common file path manipulations in Go. Its current public API includes:

- **`RemoveAllExt(path string) string`**
  Strips *all* file extensions from a given path, returning the path without any trailing suffixes (e.g., `.txt`, `.tar.gz`).
  - Returns paths without extensions unchanged.
  - Correctly processes single- and multi-part (e.g., `.tar.gz`) extensions.
  - Operates on slash-separated paths, such as those used in the tests below.

The package currently includes a single unit test suite:

- **`TestRemoveAllExt(t *testing.T)`**
  Validates `RemoveAllExt` using three test cases:
  - `"path/to/file"` → unchanged (`"path/to/file"`)
  - `"path/to/file.txt"` → `"path/to/file"`
  - `"path/to/file.tar.gz"` → `"path/to/file"` (both extensions removed)

This ensures reliable behavior for downstream code relying on extension-agnostic path handling, e.g., in build systems, data pipelines, or file-processing tools.
@@ -0,0 +1,36 @@
# `obiutils` Package: Pipe Synchronization Utilities

This Go package provides lightweight synchronization primitives for managing concurrent pipeline execution, particularly useful in CLI or batch-processing applications.

## Core Components

- **`globalLocker`:** a `sync.WaitGroup` tracking active pipeline goroutines.
- **`globalLockerCounter`:** an integer counter for logging/debugging the number of active pipes.

## Public Functions

### `RegisterAPipe()`

- Increments both the WaitGroup and the counter.
- Logs the current count at debug level (`log.Debugln`).
- Typically called just before starting a new pipeline stage or goroutine, from the launching goroutine, so that `WaitForLastPipe()` cannot return before the new pipe has been counted.

### `UnregisterPipe()`

- Decrements the WaitGroup and the counter.
- Logs the updated count at debug level.
- Should be invoked when a pipeline finishes (e.g., `defer UnregisterPipe()`).

### `WaitForLastPipe()`

- Blocks until all registered pipes complete (`globalLocker.Wait()`).
- Intended to be called at the end of `main()`, ensuring graceful shutdown.
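The three functions can be sketched around a package-level `sync.WaitGroup`. Two details are assumptions of this sketch: the debug counter is updated with `sync/atomic` so that concurrent registrations do not race, and the logrus debug logging is omitted. `runPipes` and `activePipes` are hypothetical helpers; the key usage rule is that `RegisterAPipe()` runs in the launching goroutine, before `go`, never inside the new goroutine.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var (
	globalLocker        sync.WaitGroup
	globalLockerCounter int64 // updated atomically in this sketch
)

// RegisterAPipe counts one more active pipe (the package also logs here).
func RegisterAPipe() {
	globalLocker.Add(1)
	atomic.AddInt64(&globalLockerCounter, 1)
}

// UnregisterPipe marks one pipe as finished. The counter is decremented
// before Done so that it already reads 0 once WaitForLastPipe returns.
func UnregisterPipe() {
	atomic.AddInt64(&globalLockerCounter, -1)
	globalLocker.Done()
}

// WaitForLastPipe blocks until every registered pipe has unregistered.
func WaitForLastPipe() {
	globalLocker.Wait()
}

// activePipes reads the debug counter safely.
func activePipes() int64 { return atomic.LoadInt64(&globalLockerCounter) }

// runPipes launches n workers, each adding its index to a shared total.
func runPipes(n int) int64 {
	var total int64
	for i := 1; i <= n; i++ {
		RegisterAPipe() // before `go`, in the launching goroutine
		go func(v int64) {
			defer UnregisterPipe()
			atomic.AddInt64(&total, v)
		}(int64(i))
	}
	WaitForLastPipe()
	return atomic.LoadInt64(&total)
}

func main() {
	fmt.Println(runPipes(4), activePipes()) // 10 0
}
```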

## Semantic Use Case

Enables safe, concurrent execution of multiple independent pipelines (e.g., data processing stages), ensuring the program waits for all of them to finish before exiting, without explicit channel or mutex management at the call sites.

## Design Notes

- **Thread-safe** via `sync.WaitGroup`.
- **Minimalist**: no error handling; assumes correct usage (e.g., an unmatched `UnregisterPipe()` call drives the WaitGroup counter negative, which panics).
- **Logging-focused** for observability in development/debug builds.

> ⚠️ Not production-ready without additional safeguards (e.g., panic recovery, timeout support).
@@ -0,0 +1,35 @@
# `obiutils`: Semantic Description of Core Functionality

This Go package provides generic and type-specific utilities for **ranking** and **ordering** data without modifying the original slice. It uses Go's `sort` package to compute index permutations that reflect sorted order.

## Key Components

- **`IntOrder(data []int) []int`**
  Returns the indices that would sort a slice of integers in *ascending* order. The original data remains unchanged.

- **`ReverseIntOrder(data []int) []int`**
  Same as `IntOrder`, but returns indices for *descending* order.

- **`Order[T sort.Interface](data T) []int`**
  Generic version accepting any type that implements `sort.Interface`. Returns stably sorted indices.

## Internal Design

- **`intRanker`** and **`Ranker[T]`**: helper types that wrap the data together with an index list (`r`).
  They implement `sort.Interface` *indirectly*: comparisons look up the data through the indices, while swaps only permute the indices, so the data itself is never mutated.

- **Index-based sorting**:
  Because only a list of indices (`r = [0, 1, ...]`) is permuted, the original data is never copied or altered, which suits large datasets or inputs that must stay unchanged.

- **Stability**: `Order` uses `sort.Stable`, preserving the relative order of equal elements.
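The index-permutation technique can be sketched as follows. The generic `ranker` plays the role of `Ranker[T]` (its exact definition is assumed), and expressing `IntOrder` and `ReverseIntOrder` through `Order` with `sort.IntSlice` and `sort.Reverse` is a simplification of this sketch rather than the package's separate `intRanker`.

```go
package main

import (
	"fmt"
	"sort"
)

// ranker sorts a permutation of indices: Less reads data through r,
// Swap only touches r, so data is never reordered.
type ranker[T sort.Interface] struct {
	data T
	r    []int
}

func (o ranker[T]) Len() int           { return len(o.r) }
func (o ranker[T]) Less(i, j int) bool { return o.data.Less(o.r[i], o.r[j]) }
func (o ranker[T]) Swap(i, j int)      { o.r[i], o.r[j] = o.r[j], o.r[i] }

// Order returns the stable sorting permutation of data (nil for empty input).
func Order[T sort.Interface](data T) []int {
	n := data.Len()
	if n == 0 {
		return nil
	}
	r := make([]int, n)
	for i := range r {
		r[i] = i
	}
	sort.Stable(ranker[T]{data: data, r: r})
	return r
}

// IntOrder returns the indices that sort data in ascending order.
func IntOrder(data []int) []int { return Order(sort.IntSlice(data)) }

// ReverseIntOrder returns the indices that sort data in descending order.
func ReverseIntOrder(data []int) []int {
	return Order(sort.Reverse(sort.IntSlice(data)))
}

func main() {
	scores := []int{30, 10, 20}
	fmt.Println(IntOrder(scores))        // [1 2 0]
	fmt.Println(ReverseIntOrder(scores)) // [0 2 1]
	fmt.Println(scores)                  // [30 10 20] (unchanged)
}
```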

## Use Cases

- Sorting metadata (e.g., ordering labels by associated scores).
- Preparing orderings for downstream operations (plots, ranking metrics).
- Tracking original positions after a sort.

## Constraints

- The generic version requires a type implementing `sort.Interface` (e.g., a custom type with `Len`, `Less`, and `Swap` methods).
- Returns `nil` for zero-length input.
@@ -0,0 +1,34 @@
# `obiutils.Set`: Generic Set Implementation in Go

This package provides a generic, type-safe set data structure for Go (1.20+), based on generics with the `comparable` constraint. It supports common set operations through a simple API.

## Core Features

- **Generic Type Support**: `Set[E]` works for any comparable type (e.g., `int`, `string`, or structs whose fields are all comparable).
- **Memory-Efficient Representation**: implemented as a map from element to empty struct (`struct{}{}`), whose values take zero bytes.
- **Non-destructive Set Algebra**: methods like `Union` and `Intersection` return *new* sets; in-place mutation is explicit (e.g., via `Add()`).

## Key Functions & Methods

| Function/Method | Description |
|-----------------|-------------|
| `MakeSet[E](vals ...E)` | Creates and returns a new set populated with the given values. |
| `NewSet[E](vals ...E)` | Same as `MakeSet`, but returns a pointer (`*Set[E]`). |
| `(s Set[E]) Add(vals ...E)` | Inserts one or more elements into the set (in place). |
| `(s Set[E]) Contains(v E) bool` | Checks membership of an element in O(1) average time. |
| `(s Set[E]) Members() []E` | Returns all elements as a slice (order not guaranteed). |
| `(s Set[E]) String() string` | Human-readable representation via `fmt.Sprintf`. |
| `(s Set[E]) Union(s2 Set[E])` | Returns a new set containing the elements of both sets. |
| `(s Set[E]) Intersection(s2 Set[E])` | Returns a new set with the elements common to both sets. |
|
||||
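
The map-from-element-to-empty-struct design can be illustrated with a minimal sketch. The lowercase names (`set`, `makeSet`, `add`, `contains`, `union`, `intersection`) are hypothetical so the sketch is not mistaken for the obiutils source, and iterating over the smaller operand in `intersection` is a choice of this sketch.

```go
package main

import "fmt"

// set is a hypothetical, minimal re-creation of the map-based design.
type set[E comparable] map[E]struct{}

func makeSet[E comparable](vals ...E) set[E] {
	s := make(set[E], len(vals))
	s.add(vals...)
	return s
}

// add mutates the receiver in place (maps are reference types); re-adding
// an existing element is a no-op.
func (s set[E]) add(vals ...E) {
	for _, v := range vals {
		s[v] = struct{}{}
	}
}

func (s set[E]) contains(v E) bool {
	_, ok := s[v]
	return ok
}

// union builds a fresh set; neither operand is modified.
func (s set[E]) union(o set[E]) set[E] {
	out := make(set[E], len(s)+len(o))
	for v := range s {
		out[v] = struct{}{}
	}
	for v := range o {
		out[v] = struct{}{}
	}
	return out
}

// intersection loops over the smaller set and probes the larger one.
func (s set[E]) intersection(o set[E]) set[E] {
	small, large := s, o
	if len(small) > len(large) {
		small, large = large, small
	}
	out := make(set[E])
	for v := range small {
		if large.contains(v) {
			out[v] = struct{}{}
		}
	}
	return out
}

func main() {
	a := makeSet(1, 2, 3)
	b := makeSet(3, 4)
	fmt.Println(a.contains(2), len(a.union(b)), len(a.intersection(b))) // true 4 1
}
```
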
## Example Usage

```go
s1 := obiutils.MakeSet(1, 2, 3)
s2 := obiutils.NewSet("a", "b") // *Set[string]; value-receiver methods work through the pointer

fmt.Println(s1.Contains(2))   // true
fmt.Println(s2.Contains("c")) // false

union := s1.Union(obiutils.MakeSet(3, 4))
fmt.Println(union.Members()) // e.g., [1 2 3 4] (order not guaranteed)
```

> Designed for clarity, performance, and idiomatic Go usage.

@@ -0,0 +1,36 @@
# `obiutils` Package: Set Implementation in Go

The `obiutils` package provides a generic, type-safe set data structure for Go (v1.18+), along with comprehensive unit tests.

## Core Features

- **Generic Set Type**: implemented as `Set[T]`, using a map for O(1) average-time membership checks.
- **Constructors**:
  - `MakeSet[T](...T)` returns a new set populated with the given elements.
  - `NewSet[T]()` returns a pointer to a new, empty set; useful for dynamic initialization.
- **Methods**:
  - `Add(...T)` inserts one or more elements (idempotent).
  - `Contains(T) bool` checks membership.
  - `Members() []T` returns the elements as a slice; because `T` is only constrained to be `comparable`, the order is not guaranteed (the tests sort it before comparing).
  - `String() string` provides a human-readable representation (`[a b c]` format).
- **Set Operations**:
  - `Union(other Set[T]) Set[T]`: returns a new set with the elements in either operand.
  - `Intersection(other Set[T]) Set[T]`: returns a new set with the elements common to both.

## Test Coverage

Unit tests validate:

- Set creation (empty, single and multiple values).
- Element addition and membership.
- String formatting for various sizes.
- Correctness of union and intersection across edge cases (empty sets, disjoint sets, sets with common elements).

All tests use `reflect.DeepEqual` for precise structural comparison, and sort the outputs whose order is non-deterministic before comparing them.

## Design Notes

- Non-destructive set algebra: `Union` and `Intersection` return *new* sets rather than mutating their operands; `Add` is the operation that mutates in place.
- No duplicates (standard set semantics).
- Efficient storage via Go maps; no external dependencies.

> **Note**: This is a minimal, idiomatic set implementation, well suited to utility or testing contexts.

@@ -0,0 +1,22 @@
# `obiutils` Package Overview

The `obiutils` package provides generic, reusable utility functions for common slice operations in Go.

- **`Contains[T comparable](arr []T, x T) bool`**
  Checks whether a given element exists in the slice. The type parameter `T` only needs to support equality comparison.

- **`LookFor[T comparable](arr []T, x T) int`**
  Returns the index of the *first* occurrence of `x`, or `-1` if it is not found. Also generic over comparable types.

- **`RemoveIndex[T comparable](s []T, index int) []T`**
  Removes the element at `index` and returns the resulting slice, using `append` to rebuild it. Appending necessarily copies the elements after `index`, so removal costs O(len(s) − index) rather than O(1). If the implementation appends onto `s[:index]`, the result also shares its backing array with `s`.

- **`Reverse[S ~[]E, E any](s S, inplace bool) S`**
  Reverses the slice elements. If `inplace` is `true`, the original slice is modified; otherwise a copy is reversed and returned. The constraint `~[]E` accepts any type whose underlying type is a slice (e.g., named slice types), and the result keeps that type.

All functions are designed to be:

- Type-safe via Go generics (no reflection),
- Efficient and idiomatic,
- Well-documented with clear parameter/return semantics.

Ideal for use in data processing, validation logic, or general-purpose slice manipulation.

@@ -0,0 +1,33 @@
# `obiutils` Package Overview

The `obiutils` package provides low-level, high-performance utilities for ASCII string and character-set manipulation in Go.

### Core Components

- **`AsciiSet`** (`[256]bool`): a boolean lookup table indexed by byte value, so testing whether a byte (ASCII characters occupy 0–127) belongs to the set is a single array access.
- **Predefined Sets**:
  - `AsciiSpaceSet`: whitespace characters (`\t\n\v\f\r` and space)
  - `AsciiDigitSet`, `AsciiUpperSet`, `AsciiLowerSet`
  - Derived sets: `Alpha` (letters), `Alnum` (alphanumeric)

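
A `[256]bool` character set takes only a few lines; this sketch shows the lookup-table technique with hypothetical lowercase names (`asciiSet`, `asciiSetFromString`, `contains`, `union`, `intersect`) rather than the obiutils API. Pointer receivers are a choice of this sketch, to avoid copying the 256-byte table on each call.

```go
package main

import "fmt"

// asciiSet is a hypothetical stand-in for AsciiSet: one bool per byte value,
// so membership is a single array lookup.
type asciiSet [256]bool

// asciiSetFromString marks every byte of s as a member (bytes, not runes).
func asciiSetFromString(s string) asciiSet {
	var set asciiSet
	for i := 0; i < len(s); i++ {
		set[s[i]] = true
	}
	return set
}

func (s *asciiSet) contains(c byte) bool { return s[c] }

// union and intersect combine two tables entry by entry.
func (s *asciiSet) union(o *asciiSet) asciiSet {
	var r asciiSet
	for i := range r {
		r[i] = s[i] || o[i]
	}
	return r
}

func (s *asciiSet) intersect(o *asciiSet) asciiSet {
	var r asciiSet
	for i := range r {
		r[i] = s[i] && o[i]
	}
	return r
}

func main() {
	digits := asciiSetFromString("0123456789")
	lowerHex := asciiSetFromString("abcdef")
	hex := digits.union(&lowerHex)
	fmt.Println(hex.contains('c'), hex.contains('g')) // true false
	common := hex.intersect(&lowerHex)
	fmt.Println(common.contains('a'), common.contains('1')) // true false
}
```
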
### Key Functions

- **Set Operations**:
  - `AsciiSetFromString(s string)`: builds a set from the characters of a string.
  - `.Contains(c byte)` / `.Union()` / `.Intersect()`: constant-time membership tests and set algebra.
- **String Parsing & Transformation**:
  - `UnsafeStringFromBytes([]byte) string`: zero-copy conversion (⚠️ unsafe; use only when the byte slice is guaranteed not to be modified afterwards).
  - `FirstWord(s string)`: extracts the first non-whitespace token.
  - `(AsciiSet).FirstWord(...) (string, error)`: same as above, but validates the characters against a restriction set.
  - `TrimLeft(s string)` (a method on `*AsciiSet`): removes leading whitespace.
  - `LeftSplitInTwo(s string, sep byte)`: splits at the first occurrence of a separator.
  - `RightSplitInTwo(s string, sep byte)`: splits at the last occurrence of a separator.

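
The token and split helpers can be sketched with plain byte indexing plus `strings.IndexByte` and `strings.LastIndexByte`. The names below are hypothetical, and the behaviour when `sep` is absent (returning the whole string and `""`) is an assumption of this sketch, not documented obiutils behaviour. All results are substrings of the input, so nothing is allocated.

```go
package main

import (
	"fmt"
	"strings"
)

// isASCIISpace matches the characters listed for AsciiSpaceSet.
func isASCIISpace(c byte) bool {
	return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r'
}

// firstWord skips leading ASCII whitespace and returns the following token.
func firstWord(s string) string {
	start := 0
	for start < len(s) && isASCIISpace(s[start]) {
		start++
	}
	end := start
	for end < len(s) && !isASCIISpace(s[end]) {
		end++
	}
	return s[start:end]
}

// leftSplitInTwo splits around the first sep (assumed: (s, "") if absent).
func leftSplitInTwo(s string, sep byte) (string, string) {
	if i := strings.IndexByte(s, sep); i >= 0 {
		return s[:i], s[i+1:]
	}
	return s, ""
}

// rightSplitInTwo splits around the last sep, with the same assumption.
func rightSplitInTwo(s string, sep byte) (string, string) {
	if i := strings.LastIndexByte(s, sep); i >= 0 {
		return s[:i], s[i+1:]
	}
	return s, ""
}

func main() {
	fmt.Printf("%q\n", firstWord("  \tseq_001 some description")) // "seq_001"
	fmt.Println(leftSplitInTwo("key=val=ue", '='))              // key val=ue
	fmt.Println(rightSplitInTwo("key=val=ue", '='))             // key=val ue
}
```
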
### Design Goals

- **Performance**: Avoid allocations where possible (e.g., `unsafe.String`, direct indexing).
- **Simplicity**: Focused on ASCII-only operations for speed and predictability.
- **Safety Trade-offs**: `UnsafeStringFromBytes` trades safety for efficiency; other functions are safe and bounds-checked.

Intended use: embedded systems, parsers, or performance-critical text processing where standard library overhead is undesirable.

@@ -0,0 +1,24 @@
# `TarFileReader`: Semantic Description

The function `TarFileReader`, defined in the Go package `obiutils`, gives targeted access to a single file inside a TAR archive.

- **Input**:
  - `file`: a reader of type `*Reader` that implements the standard Go `io.Reader` interface, typically wrapping an archive file or stream.
  - `path`: the *exact* path (relative to the archive root) of the desired file inside the TAR.

- **Core Logic**:
  - Creates a `tar.Reader` from the provided input stream.
  - Iterates sequentially over the TAR entries using `Next()`.
  - Compares each entry's header name (`header.Name`) with the requested `path`.

- **Output**:
  - On a match: returns a pointer to the *current* `tar.Reader`, positioned at the start of the requested file's content (ready for subsequent reads).
  - When no entry matches: returns `nil` and a formatted error `"file not found: <path>"`.

- **Semantics**:
  - Provides on-demand access to one file without extracting the others. Entries before the match are skipped rather than written anywhere, but since TAR has no index they must still be read (and decompressed, if the stream is compressed) to reach the match.
  - Matching is exact (no globbing, wildcards, or directory traversal).
  - Symbolic links, hard links, and nested archives are not resolved; only the entry whose name matches is returned.

- **Use Case**:
  Lightweight tools that need to read a single known file (e.g., a config file or manifest) from a large TAR archive while keeping memory use low.

@@ -0,0 +1,25 @@
# `obiutils`: Unsafe String–Byte Conversions in Go

This package provides low-level, zero-copy utilities for converting between `string` and `[]byte` in Go using the `unsafe` package.

## Core Functions

- **`UnsafeBytes(str string) []byte`**
  Converts a `string` to a byte slice **without copying**, by reusing the string's underlying memory.
  ⚠️ *Unsafe*: writing to the returned slice is undefined behavior. It silently changes a string that the rest of the program assumes immutable, and it can crash the program when the string's bytes live in read-only memory (e.g., string literals).
  Use only when performance is critical and the slice is treated as read-only.

- **`UnsafeString(b []byte) string`**
  Converts a `[]byte` to a `string`, again **without copying**, by reinterpreting the byte slice's memory as a string.
  ⚠️ *Unsafe*: if `b` is modified later, the contents of the returned string change too, breaking Go's guarantee that strings are immutable (for example, a string used as a map key could silently change).
  Requires that `b` is not modified for the lifetime of the returned string.

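
Since Go 1.20, the `unsafe` package offers `unsafe.String`, `unsafe.StringData`, and `unsafe.SliceData`, which express these conversions without hand-built string or slice headers. Whether obiutils uses them is not stated here; `bytesToString` and `stringToBytes` are hypothetical equivalents for illustration, and they carry exactly the risks listed above.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString reuses b's backing array; b must not change afterwards.
func bytesToString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// stringToBytes exposes s's bytes; the result must be treated as read-only.
func stringToBytes(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := []byte("ACGT")
	s := bytesToString(b) // no allocation, no copy
	fmt.Println(s)        // ACGT

	view := stringToBytes("hello")
	fmt.Println(len(view), view[0] == 'h') // 5 true (reading is fine; writing is not)
}
```

Both helpers handle empty input: `unsafe.String` accepts a nil pointer with length zero, and `unsafe.Slice` returns nil for a nil pointer with length zero.
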
## Semantic Purpose

These functions enable high-performance interop between strings and byte slices, which matters in systems programming, serialization frameworks, or memory-constrained environments where allocation overhead must be avoided.

## Risks & Best Practices

- **Never write to the slice returned by `UnsafeBytes`, and never modify the input of `UnsafeString` after the conversion.**
- Prefer the standard conversions (`[]byte(s)`, `string(b)`) unless profiling confirms a measurable bottleneck.
- Ensure inputs are valid and owned (e.g., not shared across goroutines without synchronization).

@@ -0,0 +1,38 @@
# `obiutils`: Universal File I/O with Transparent Compression Support

The `xopen`-based code in the `obiutils` package provides a unified interface for reading and writing files, streams, HTTP resources, or command outputs, **transparently handling multiple compression formats**: gzip, xz, zstd, and bzip2.

## Key Functionalities

- **`Ropen(f string)`**
  Opens a file, stdin (`"-"`), HTTP(S) URL, or shell command (e.g., `"|gzip -dc file.gz"`) for **buffered reading**, auto-detecting compression from the leading "magic" bytes.

- **`Wopen(f string)` / `WopenFile(...)`**
  Opens a file or stdout (`"-"`) for **buffered writing**, compressing the output according to the file extension (`.gz`, `.xz`, `.zst`, `.bz2`).

- **Compression Detection**
  Functions such as `IsGzip()`, `IsXz()`, `IsZst()`, and `IsBzip2()` inspect the first bytes of a buffered reader to infer the format.

- **Path Utilities**
  - `ExpandUser(path)` expands a leading `~` (as in `~` or `~/path`) to the user's home directory.
  - `Exists(path)` checks whether a file exists, after user expansion.

- **Error Handling**
  Defines sentinel errors: `ErrNoContent`, `ErrDirNotSupported`.

- **Buffered IO**
  All readers and writers use a default buffer size of 65,536 bytes (64 KiB).

- **Resource Management**
  `Close()` methods release the underlying readers/writers and compression streams.

## Supported Sources & Formats

| Source | Compression handling |
|--------|----------------------|
| Local files | plain, `.gz`, `.xz`, `.zst`, `.bz2` |
| Stdin (`"-"`) | auto-detected |
| HTTP(S) URLs | transparent decompression on stream read |
| Pipe commands (`"\|cmd"`) | output piped and auto-decompressed |

This abstraction simplifies bioinformatics or data-processing pipelines, where input sources vary widely and compression is common.

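
Dispatching on the source string could look like the following minimal sketch. `sourceKind` is hypothetical and only classifies the string; the prefix tests (`"-"`, a leading `|`, `http://` or `https://`) follow the table above, but the order and exact rules used by the real `Ropen` are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// sourceKind classifies an Ropen-style argument (hypothetical rules).
func sourceKind(f string) string {
	switch {
	case f == "-":
		return "stdin"
	case strings.HasPrefix(f, "|"):
		return "command: " + strings.TrimSpace(f[1:])
	case strings.HasPrefix(f, "http://"), strings.HasPrefix(f, "https://"):
		return "http"
	default:
		return "file" // local path; ~ expansion would happen here
	}
}

func main() {
	for _, f := range []string{"-", "|gzip -dc reads.fastq.gz", "https://example.org/x.fasta.gz", "~/data/x.fasta"} {
		fmt.Println(sourceKind(f))
	}
}
```
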
@@ -0,0 +1,19 @@
# `obiutils` Package: Semantic Overview

The test suite for `xopen.go` (written with GoCheck) validates the utility functions for flexible file/stream I/O in Go. Key features:

- **`IsGzip()`**: detects gzip compression by inspecting the first two bytes (`0x1f 0x8b`) of a `bufio.Reader`.
- **`Ropen()`**: unified reader opener supporting:
  - local files (plain or `.gz`);
  - standard input (`"-"`), *not currently exercised by the tests*;
  - HTTP(S) URLs (via `net/http`).
- **`Wopen()`**: unified writer opener for:
  - local files (a `".gz"` extension triggers gzip compression);
  - standard output via `"-"`.
- **`Exists()`**: checks file/directory existence (supports `~` expansion).
- **`ExpandUser()`**: expands shell-like paths (`~/...`) to absolute ones.
- **Tested robustness**:
  - handles missing files, invalid URLs (404), and malformed paths;
  - validates gzip detection on both plain and compressed data.

All operations abstract away compression and format details, giving uniform read/write semantics across local files, pipes (their tests are commented out), and remote HTTP resources.
