mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 03:50:39 +00:00
⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30 (automated by Makefile)
@@ -0,0 +1,22 @@
# BioSequence Attribute Management API

This Go package (`obiseq`) provides a rich set of methods for managing metadata and structural attributes associated with biological sequences (`BioSequence`). Below is a semantic overview of the core functionalities:

- **Key Discovery & Existence Checks**:
  - `Keys()` and `AttributeKeys()` return all attribute names (optionally excluding container/statistics fields or the `"definition"` key).
  - `HasAttribute(key)` verifies presence of a given attribute (including standard fields: `"id"`, `"sequence"`, `"qualities"`).

- **Generic Attribute Access**:
  - `GetAttribute(key)` retrieves any attribute value (as `interface{}`), with thread-safe locking.
  - `SetAttribute(key, value)` assigns values to attributes (including automatic conversion for `"id"`, `"sequence"` and `"qualities"`).

- **Typed Attribute Retrieval**:
  - Type-specific getters (`GetIntAttribute`, `GetFloatAttribute`, `GetStringAttribute`, etc.) ensure safe conversion and *auto-upgrade* of stored values (e.g., string `"42"` → integer `42`).
  - Supports maps (`GetIntMap`, `GetStringMap`) and slices (`GetIntSlice`).

- **Convenience & Domain-Specific Helpers**:
  - `Count()` / `SetCount()`: manage observation frequency (default = 1).
  - OBITag indexing: `OBITagRefIndex()` / `SetOBITagRefIndex()`, and geometry variants (`geomref`). Supports flexible input map types with dynamic conversion.
  - Coordinate & landmark support: `GetCoordinate()` / `SetCoordinate()`, and `landmark_id`-based operations (`IsALandmark()`, `GetLandmarkID()`).

All methods are designed for robustness: they handle type conversions gracefully, use locking to ensure concurrency safety, and provide fallbacks (e.g., default count = 1). The API abstracts internal storage (`annotations` map) while exposing a clean, consistent interface for sequence annotation manipulation.
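
The *auto-upgrade* behaviour of the typed getters can be illustrated with a minimal sketch. The `record` type and the `getIntAttribute` helper below are hypothetical stand-ins, not the actual `obiseq` implementation; the sketch only assumes an annotation map guarded by a mutex, as described above.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// record is a hypothetical stand-in for BioSequence: only the annotation
// map and its lock are modelled.
type record struct {
	lock        sync.Mutex
	annotations map[string]interface{}
}

// getIntAttribute returns the value stored under key as an int.
// A string such as "42" is parsed and the parsed value is written back,
// so later calls find an int directly (the "auto-upgrade").
func (r *record) getIntAttribute(key string) (int, bool) {
	r.lock.Lock()
	defer r.lock.Unlock()

	switch v := r.annotations[key].(type) {
	case int:
		return v, true
	case float64:
		return int(v), true
	case string:
		n, err := strconv.Atoi(v)
		if err != nil {
			return 0, false
		}
		r.annotations[key] = n // upgrade the stored value
		return n, true
	default: // missing key or unsupported type
		return 0, false
	}
}

func main() {
	r := &record{annotations: map[string]interface{}{"count": "42"}}
	n, ok := r.getIntAttribute("count")
	fmt.Println(n, ok)
	fmt.Printf("%T\n", r.annotations["count"])
}
```

Writing the converted value back trades one extra map store for cheaper subsequent reads, which seems to be the point of upgrading in place.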
@@ -0,0 +1,41 @@
# BioSequence: A High-Performance Biological Sequence Representation

The `obiseq` package defines the `BioSequence` struct, a memory-efficient and thread-safe container for biological DNA sequences. Beyond raw sequence data (`[]byte`), it supports rich metadata and operations essential for NGS pipelines.

## Core Features

- **Metadata Fields**:
  - `id`: Unique sequence identifier.
  - `source`: Filename (without path/extension) of origin.
  - `definition`: Optional descriptive text, stored in annotations.

- **Sequence & Quality Support**:
  - Stores sequence as lowercase `[]byte` (normalized via in-place lowercasing).
  - Quality scores (`Quality = []uint8`) with fallback to default Phred+40 values when missing.
  - Methods for incremental writing (`Write`, `WriteByte`) and clearing.

- **Annotations & Features**:
  - Generic `Annotation` map (`map[string]interface{}`) for flexible metadata.
  - Thread-safe access via `annot_lock` mutex (explicit locking/unlocking methods).
  - Raw feature table storage (`[]byte`, e.g., EMBL/GenBank features).

- **Biological Relationships**:
  - `paired`: Pointer to mate/read-pair sequence.
  - `revcomp`: Pointer to reverse-complement variant (lazy or precomputed).

- **Introspection & Utility**:
  - `Len()`, `HasSequence()`, `Composition()` (nucleotide counts: a,c,g,t,o).
  - MD5 checksums (`MD5()` and `MD5String()`) for deduplication.
  - Memory footprint estimation (`MemorySize()`), critical for streaming/batching.

- **Efficiency Optimizations**:
  - `NewBioSequenceOwning`/`TakeQualities`: Zero-copy slice adoption (caller must not reuse input).
  - `Recycle()`: Reuses slices via pool-aware functions (`RecycleSlice`, etc.).
  - Global counters track creation/destruction/in-memory sequences for diagnostics.

- **Safety & Compatibility**:
  - Copy semantics via `Copy()` (deep copy of slices + annotations).
  - Validation: `HasValidSequence` enforces allowed characters (`a-z`, `-`, `.`, `[`, `]`).
  - Uses unsafe string conversion for quality ASCII output (Phred shift configurable via `obidefault`).

Designed for scalability in large-scale metabarcoding workflows (e.g., OBITools4), balancing performance, correctness, and extensibility.
@@ -0,0 +1,35 @@
# `obiseq` Package: Semantic Overview

The `obiseq` package provides a robust, thread-safe implementation of biological sequence objects in Go. It defines the core `BioSequence` type and associated utilities for handling nucleotide sequences (DNA/RNA), quality scores, annotations, features, memory management, and metadata operations.

### Core Functionalities

- **Construction & Initialization**
  - `NewEmptyBioSequence(cap)` creates an empty sequence with optional preallocated capacity.
  - `NewBioSequence(id, seq, def)` builds a basic sequence with ID (case-normalized), byte-level sequence (`[]byte`), and definition.
  - `NewBioSequenceWithQualities(...)` extends the above with per-base quality scores (`[]byte` or `Quality`).

- **Accessors & Properties**
  - `Id()`, `Definition()` return metadata fields.
  - `Sequence()` returns the normalized (lowercase) sequence as a copy of internal bytes.
  - `Len()` returns the length (number of bases).
  - `String()` provides a human-readable sequence string.

- **Quality & Feature Support**
  - `HasQualities()` checks if quality scores are present.
  - `Qualities()`, `SetQualities(...)` manage per-base quality data (with fallback to default values).
  - `Features()` retrieves optional feature annotations as a string.

- **Annotation System**
  - `Annotations()`, `HasAnnotation()` allow inspection of arbitrary metadata (key-value map).
  - Thread-safe via internal `sync.Mutex`, exposed through `AnnotationsLock()`.

- **Utility & Safety**
  - `Recycle()` safely resets internal slices and annotations (enables object pooling). Handles nil receivers gracefully.
  - `Copy()` performs deep copy of all fields, including annotations and locks (new mutex).
  - `MD5()` computes the MD5 hash of the sequence bytes.

- **Analysis Methods**
  - `Composition()` returns a nucleotide count map (`a`, `c`, `g`, `t`, and `'o'` for others), case-insensitive.

All operations are designed with performance, safety (nil-safety, copy semantics), and extensibility in mind, ideal for bioinformatics pipelines requiring immutable or pooled sequence handling.
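
The counting rule described for `Composition()` (four bases plus an `'o'` bucket, case-insensitive) can be sketched as follows. The `composition` function is a hypothetical helper working on a raw byte slice; the real method's return type and internals are not shown in this document.

```go
package main

import "fmt"

// composition counts a, c, g and t case-insensitively; every other byte
// (ambiguity codes, gaps, ...) is counted under 'o'.
func composition(seq []byte) map[byte]int {
	counts := map[byte]int{'a': 0, 'c': 0, 'g': 0, 't': 0, 'o': 0}
	for _, b := range seq {
		lower := b | 0x20 // fold ASCII upper case onto lower case
		switch lower {
		case 'a', 'c', 'g', 't':
			counts[lower]++
		default:
			counts['o']++
		}
	}
	return counts
}

func main() {
	c := composition([]byte("ACGTnn-a"))
	fmt.Println(c['a'], c['c'], c['g'], c['t'], c['o']) // prints "2 1 1 1 3"
}
```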
@@ -0,0 +1,37 @@
# `obiseq` Package: BioSequence Collection Management

The `obiseq` package provides a high-performance, memory-efficient implementation for managing collections of biological sequences (`BioSequence`) in Go. Its core type is `BioSequenceSlice`, a slice of pointers to `BioSequence` objects, optimized for batch processing in metagenomic pipelines.

### Key Functionalities

- **Memory Pooling & Allocation Control**:
  `NewBioSequenceSlice` and `MakeBioSequenceSlice` allow creating slices with optional capacity hints.
  `EnsureCapacity(capacity)` dynamically grows the underlying slice while logging warnings or panicking on persistent allocation failures.

- **Efficient Element Management**:
  - `Push(sequence)`: Appends a sequence to the end.
  - `Pop()`: Removes and returns the last element (nil-safe).
  - `Pop0()`: Efficiently removes and returns the first element.

- **Collection Metadata Queries**:
  - `Len()`: Returns number of sequences in the slice.
  - `Size()`: Computes total sequence length (summing all `.Len()`).
  - `NotEmpty()`: Boolean check for non-empty collections.

- **Attribute Aggregation**:
  `AttributeKeys(skip_map, skip_definition)` aggregates all attribute keys across sequences into a set, useful for schema inference or validation.

- **Sorting Capabilities**:
  - `SortOnCount(reverse)`: Sorts by read count (descending/ascending).
  - `SortOnLength(reverse)`: Sorts by sequence length.

- **Taxonomy Integration**:
  `ExtractTaxonomy(taxonomy, seqAsTaxa)` builds or extends a taxonomic tree from sequence paths.
  When `seqAsTaxa=true`, it injects pseudo-taxonomic labels for individual sequences (e.g., `OTU:SEQ0000012345 [seqID]@sequence`), enabling unified taxonomic/rarefaction workflows.

### Design Highlights

- Minimal allocations via manual slice management and `slices.Grow`.
- Explicit niling of popped elements to aid garbage collection.
- Integrated logging (via `logrus`) for allocation issues, critical in large-scale NGS data processing.
- Designed to support `BioSequenceBatch`, a higher-level abstraction for streaming or parallelizable sequence batches.
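
The "explicit niling of popped elements" mentioned above is easy to get wrong, so here is a minimal sketch of the pattern. The `seq` and `seqSlice` types and the lowercase method names are illustrative assumptions, not the package's own code.

```go
package main

import "fmt"

type seq struct{ id string }

// seqSlice is a hypothetical stand-in for BioSequenceSlice.
type seqSlice []*seq

// pop removes and returns the last element, or nil when the slice is empty.
func (s *seqSlice) pop() *seq {
	n := len(*s)
	if n == 0 {
		return nil
	}
	last := (*s)[n-1]
	(*s)[n-1] = nil // drop the reference held by the backing array
	*s = (*s)[:n-1]
	return last
}

// pop0 removes and returns the first element by re-slicing (no copying).
func (s *seqSlice) pop0() *seq {
	if len(*s) == 0 {
		return nil
	}
	first := (*s)[0]
	(*s)[0] = nil
	*s = (*s)[1:]
	return first
}

func main() {
	s := seqSlice{&seq{id: "a"}, &seq{id: "b"}, &seq{id: "c"}}
	first := s.pop0()
	last := s.pop()
	fmt.Println(first.id, last.id, len(s))
}
```

Without the `nil` assignment the backing array would keep the popped pointer reachable, so the sequence could not be collected until the whole array is released.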
@@ -0,0 +1,32 @@
# BioSequence Classifier Module Overview

This Go package (`obiseq`) provides a flexible and thread-safe framework for classifying biological sequences using different strategies. Each classifier implements four core methods:
- `Code(sequence) int`: assigns an integer class to a sequence.
- `Value(k) string`: retrieves the original value (or representation) for class index *k*.
- `Reset()`: clears internal state.
- `Clone() *BioSequenceClassifier`: creates a fresh copy of the classifier.

## Supported Classifier Types

1. **`AnnotationClassifier(key, na)`**
   Classifies sequences based on a single annotation field. Missing annotations default to `na`. Internally maps string values → integer codes via a thread-safe dictionary.

2. **`DualAnnotationClassifier(key1, key2, na)`**
   Uses *two* annotation fields. Combines them (as JSON array) to form unique class identifiers, enabling multi-dimensional classification.

3. **`PredicateClassifier(predicate)`**
   Binary classifier: returns `1` if the provided predicate function evaluates to true, else `0`. Useful for rule-based grouping (e.g., length > 200).

4. **`HashClassifier(size)`**
   Assigns sequences to one of `size` buckets via CRC32 hash of the raw sequence. Deterministic and memory-efficient, but may cause collisions.

5. **`SequenceClassifier()`**
   Unique class per *exact* sequence string (case-sensitive). Uses a lock-protected map to deduplicate and index sequences.

6. **`RotateClassifier(size)`**
   Cyclic assignment: sequence *i* → class `i mod size`. No memoization; state resets only manually.

7. **`CompositeClassifier(...)`**
   Combines multiple classifiers: concatenates their integer outputs (e.g., `"3:17:0"`) to form a composite class key. Enables layered or hierarchical classification.

All classifiers keep their state internal and synchronized, which supports concurrent use in pipelines; that state changes only through `Code()` (when a new class is first seen) and `Reset()`.
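
As an illustration of the `Code` / `Value` / `Reset` contract, here is a reduced sketch of an annotation-based classifier. It is an assumption-laden model: it classifies plain `map[string]string` annotations instead of `*BioSequence` values, and the type and constructor names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// annotationClassifier maps the value of one annotation key to a dense
// integer code, in order of first appearance.
type annotationClassifier struct {
	key, na string
	lock    sync.Mutex
	codes   map[string]int
	values  []string
}

func newAnnotationClassifier(key, na string) *annotationClassifier {
	return &annotationClassifier{key: key, na: na, codes: map[string]int{}}
}

// Code returns the class of the value stored under c.key; a missing
// annotation is classified under the na placeholder.
func (c *annotationClassifier) Code(annotations map[string]string) int {
	v, ok := annotations[c.key]
	if !ok {
		v = c.na
	}
	c.lock.Lock()
	defer c.lock.Unlock()
	code, seen := c.codes[v]
	if !seen {
		code = len(c.values)
		c.codes[v] = code
		c.values = append(c.values, v)
	}
	return code
}

// Value maps a class index back to the annotation value it stands for.
func (c *annotationClassifier) Value(k int) string {
	c.lock.Lock()
	defer c.lock.Unlock()
	return c.values[k]
}

// Reset forgets every class assigned so far.
func (c *annotationClassifier) Reset() {
	c.lock.Lock()
	defer c.lock.Unlock()
	c.codes = map[string]int{}
	c.values = c.values[:0]
}

func main() {
	c := newAnnotationClassifier("sample", "NA")
	fmt.Println(c.Code(map[string]string{"sample": "A"}))
	fmt.Println(c.Code(map[string]string{}))
	fmt.Println(c.Value(1))
}
```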
@@ -0,0 +1,20 @@
# Semantic Description of `obiseq` Comparison Functions

The `obiseq` package provides utility functions for comparing biological sequence records (`*BioSequence`) based on different fields. These comparators are designed to support sorting, deduplication, or grouping operations in bioinformatics workflows.

- **`CompareSequence(a, b *BioSequence) int`**
  Compares the raw nucleotide or amino acid sequences (`a.sequence`) lexicographically using `bytes.Compare`. Returns:
  - `<0` if `a < b`,
  - `0` if equal,
  - `>0` if `a > b`.

- **`CompareQuality(a, b *BioSequence) int`**
  Compares the base quality scores (`a.qualities`) lexicographically (as byte strings), following the same semantics as above. Useful for sorting reads by quality profiles.

- **Commented-out `CompareAttributeBuilder(key string)`**
  A planned higher-order function to generate custom comparators based on sequence attributes (e.g., `RG`, `NM`). It would:
  - Extract attribute values using `.GetAttribute(key)`.
  - Handle missing attributes (treat absent as "less than" present).
  - Eventually support typed comparisons for ordered types.

These functions assume `BioSequence` implements a consistent internal structure with `.sequence []byte` and `.qualities []byte`. They enable flexible, field-based ordering in collections of sequencing records.
@@ -0,0 +1,28 @@
# Semantic Description of `obiseq` Expression-Based Workers

This module provides **expression-driven transformation workers** for biological sequence objects (`BioSequence`). It leverages a custom expression language (via `OBILang`) to dynamically compute values based on sequence metadata and content.

## Core Components

- **`Expression(expression string)`**:
  Returns a function that evaluates the given expression in context. The evaluation scope includes:
  - `annotations`: sequence annotations (metadata).
  - `sequence`: the full `BioSequence` object itself.

- **`EditIdWorker(expression string)`**:
  A sequence worker that updates the *ID* of a `BioSequence` by evaluating the expression.
  - On success: sets the sequence ID (the value returned by `Id()`) to the string representation of the result.
  - On failure: logs and returns an error with context.

- **`EditAttributeWorker(key string, expression string)`**:
  A sequence worker that sets a *custom attribute* (identified by `key`) on the sequence, using the evaluated expression result.
  - Supports arbitrary metadata enrichment.
  - Errors are reported with sequence ID and failed expression.

## Use Cases

- Generate new IDs from annotation fields (e.g., `"gene_" + annotations["locus_tag"]`).
- Compute and store derived attributes (e.g., GC content, ORF length) as sequence metadata.
- Apply conditional logic or transformations across large sets of sequences in pipelines.

All workers conform to the `SeqWorker` interface, enabling composition and chaining.
@@ -0,0 +1,27 @@
# Semantic Description of `obiseq` Package

The `obiseq` package provides utilities for handling **IUPAC nucleotide ambiguity codes** in biological sequences.

## Core Components

- `_iupac`: A lookup table mapping lowercase ASCII letters (`a`–`z`) to numeric IUPAC nucleotide codes:
  - `A=1`, `C=2`, `G=4`, `T/U=8` (standard bases)
  - Ambiguous codes are bitwise OR combinations:
    e.g., `R = A|G = 1+4=5`, `Y = C|T = 2+8=10`, etc.
  - Invalid or non-nucleotide characters map to `0`.

## Key Functionality

### `SameIUPACNuc(a, b byte) bool`
Performs **case-insensitive comparison** of two nucleotide symbols using IUPAC ambiguity rules.

- Converts uppercase letters to lowercase via bitwise OR (`|= 32`).
- For valid nucleotides, checks if their IUPAC codes have **non-zero bitwise AND**:
  - Returns `true` only if the symbols share at least one possible base.
    *Example*: `'R' & 'A' → (5 & 1) = 1 > 0 ⇒ true`
    `'Y' & 'G' → (10 & 4) = 0 ⇒ false`
- For non-IUPAC or invalid characters, falls back to exact equality (`a == b`).

## Use Case

Enables robust comparison of DNA/RNA sequences where ambiguity codes (e.g., `N`, `R`, `W`) are used, which is critical for alignment, variant calling, or primer design tools.
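
The bit-set encoding and the AND test described above can be sketched in a few lines. This is a minimal reconstruction from the description, not the package's source: the table name `iupac`, the function name `sameIUPACNuc`, and the handling of letters without an IUPAC meaning (exact equality) are assumptions.

```go
package main

import "fmt"

// iupac maps lowercase letters (index = letter - 'a') to 4-bit base sets:
// a=1, c=2, g=4, t/u=8; each ambiguity code is the OR of its bases.
// Letters with no IUPAC meaning keep the zero value.
var iupac = [26]byte{
	'a' - 'a': 1, 'c' - 'a': 2, 'g' - 'a': 4, 't' - 'a': 8, 'u' - 'a': 8,
	'r' - 'a': 1 | 4, 'y' - 'a': 2 | 8, 's' - 'a': 2 | 4, 'w' - 'a': 1 | 8,
	'k' - 'a': 4 | 8, 'm' - 'a': 1 | 2,
	'b' - 'a': 2 | 4 | 8, 'd' - 'a': 1 | 4 | 8,
	'h' - 'a': 1 | 2 | 8, 'v' - 'a': 1 | 2 | 4,
	'n' - 'a': 1 | 2 | 4 | 8,
}

// sameIUPACNuc reports whether two symbols can denote the same base.
func sameIUPACNuc(a, b byte) bool {
	if a >= 'A' && a <= 'Z' {
		a |= 32 // ASCII upper case to lower case
	}
	if b >= 'A' && b <= 'Z' {
		b |= 32
	}
	if a >= 'a' && a <= 'z' && b >= 'a' && b <= 'z' {
		ca, cb := iupac[a-'a'], iupac[b-'a']
		if ca != 0 && cb != 0 {
			return ca&cb != 0 // at least one base in common
		}
	}
	return a == b // gaps and other symbols: exact match only
}

func main() {
	fmt.Println(sameIUPACNuc('R', 'A')) // 5 & 1 = 1  -> true
	fmt.Println(sameIUPACNuc('Y', 'G')) // 10 & 4 = 0 -> false
	fmt.Println(sameIUPACNuc('n', 't')) // 15 & 8 = 8 -> true
}
```

Encoding each symbol as a set of bases turns "could these two symbols be the same nucleotide?" into a single AND, with no per-pair lookup table.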
@@ -0,0 +1,35 @@
# `obiseq` Package: Sequence Concatenation via `.Join()`

The `BioSequence.Join()` method enables semantic concatenation of two biological sequences (e.g., DNA, RNA, or protein strings).

- **Signature**:
  ```go
  func (sequence *BioSequence) Join(seq2 *BioSequence, inplace bool) *BioSequence
  ```

- **Purpose**:
  Combines the current sequence (`sequence`) with a second one (`seq2`), returning a new or modified `BioSequence`.

- **Parameters**:
  - `seq2`: The sequence to append. Must be a valid `*BioSequence`.
  - `inplace`: Boolean flag: if `true`, modifies the receiver in-place; otherwise, operates on a copy.

- **Semantics**:
  - If `inplace == false`, the method first creates a deep copy of the original sequence to avoid side effects.
  - It then appends `seq2.Sequence()` (the underlying string/byte representation) to the target sequence using an internal `.Write()` method.
  - The final concatenated result is returned as a `*BioSequence`.

- **Behavioral Guarantees**:
  - *Pure operation*: When `inplace = false`, the original sequences remain unaltered.
  - *Chaining-friendly*: Returns a pointer, enabling method chaining (e.g., `seq.Join(a, false).Join(b, true)`).

- **Use Cases**:
  - Building multi-domain proteins or gene fusions.
  - Merging fragments from sequencing reads.
  - Constructing synthetic constructs in silico.

- **Assumptions**:
  - `BioSequence.Sequence()` returns a valid string/byte slice.
  - `.Write(...)` handles appending correctly; biological compatibility is not validated (e.g., frame shifts are not checked).

This method supports flexible, functional-style sequence manipulation while preserving memory safety via optional in-place mutation.
@@ -0,0 +1,20 @@
## BioSequence.Kmers(k int): Semantic Description

The `Kmers` method is a generator function that yields all contiguous *k*-length subsequences (called **k-mers**) from a biological sequence (`BioSequence`).

- It operates on `[]byte` data, assuming the underlying sequence is stored as a byte slice (e.g., DNA bases `A`, `C`, `G`, `T`).
- Uses Go’s range-over-func iterator protocol (`iter.Seq[[]byte]`, available since Go 1.23) for memory-efficient, lazy evaluation.
- Validates input: returns an empty iterator if `k ≤ 0` or exceeds sequence length.
- Iterates linearly from index `i = 0` to `len(seq) - k`, extracting slices of length *k*.
- Each yielded value is a **non-copying slice view** (efficient, but its contents change if the original data is modified).
- Supports early termination: the consumer can stop iteration by returning `false` from the yield callback.
- Designed for downstream tasks like sequence analysis, motif discovery, or hashing (e.g., in k-mer counting).
- Does *not* handle reverse-complement or ambiguous bases; it assumes raw sequence input.

Usage example:
```go
for kmer := range seq.Kmers(3) {
    fmt.Printf("%s\n", string(kmer))
}
```
This yields all 3-mers (e.g., `"ACG"`, `"CGT"`...) in order.
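
For readers unfamiliar with push-style iterators, the behaviour listed above can be sketched on a plain byte slice. The free function `kmers` is a hypothetical stand-in for the method; its return type has the same shape as `iter.Seq[[]byte]`, but the sketch drives it with an explicit callback so that it does not depend on the `iter` package.

```go
package main

import "fmt"

// kmers returns a push-style iterator over every k-mer of seq.
// With Go 1.23+ the returned function can also be used in a range statement.
func kmers(seq []byte, k int) func(yield func([]byte) bool) {
	return func(yield func([]byte) bool) {
		if k <= 0 || k > len(seq) {
			return // empty iteration for invalid k
		}
		for i := 0; i+k <= len(seq); i++ {
			// seq[i:i+k] is a view on seq, not a copy.
			if !yield(seq[i : i+k]) {
				return // the consumer asked to stop
			}
		}
	}
}

func main() {
	kmers([]byte("acgta"), 3)(func(kmer []byte) bool {
		fmt.Println(string(kmer))
		return true
	})
}
```

Because each k-mer is a view, a consumer that wants to keep it beyond the current iteration step should copy it first.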
@@ -0,0 +1,41 @@
# Semantic Description of `obiseq` Language Extensions

The `package obiseq` extends the [Gval](https://github.com/PaesslerAG/gval) expression language with domain-specific functions tailored for bioinformatics and data processing. It integrates utility helpers from `obiutils` to provide type-flexible, robust operations over sequences and collections.

## Core Functionalities

- **Data Inspection**:
  `len`, `ismap`, `isvector`: retrieve size and type information.

- **Aggregation & Comparison**:
  `min`, `max`: compute extremal values in slices/maps (via `obiutils.Min/Max`).
  *(Note: commented-out helper functions suggest prior attempts at manual implementations.)*

- **Type Conversion**:
  `int`, `numeric` (→ float64), `bool`, `string`: safely coerce arbitrary inputs to target types; fail with fatal logs on invalid data.

- **String Manipulation**:
  `sprintf`, `subspc` (replace spaces with underscores), `replace` (regex-based substitution), and `substr`: support formatting, normalization, and slicing.

- **Sequence Analysis (Bioinformatics)**:
  `gc`, `gcskew`, and `composition`: compute nucleotide composition metrics for DNA/RNA sequences (`BioSequence`).
  - `gc`: GC content ratio (excluding ambiguous bases `'o'`)
  - `gcskew`: `(G−C)/(G+C)` asymmetry measure
  - `composition`: returns a map of base counts (e.g., `"a":20.0`, `"g":15.0`)

- **Element Access**:
  `elementof(seq, idx)`: retrieves item at index/key for slices (`[]interface{}`), maps (`map[string]interface{}`), or strings (by byte position).

- **Control Flow**:
  `ifelse(cond, then_val, else_val)`: conditional branching within expressions.

- **Quality Support**:
  `qualities(seq)`: extracts per-base quality scores as a float slice from sequencing reads.

## Design Principles

- **Dynamic Typing**: Accepts `...interface{}` arguments for flexibility.
- **Error Handling**: Uses fatal logging (`log.Fatalf`) on conversion failures; returns typed errors for runtime issues.
- **Extensibility**: Built atop `gval.Language`, enabling custom expression evaluation in pipelines (e.g., filtering reads via GC thresholds).

This package serves as a bridge between high-level scripting and low-level biosequence computation, ideal for rule-based filtering or annotation in NGS workflows.
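
The two ratios behind `gc` and `gcskew` can be written out as a short sketch. The function names are hypothetical, the input is a raw byte slice rather than a `BioSequence`, and returning `0` for an empty denominator is an assumption of this sketch, not documented behaviour.

```go
package main

import "fmt"

// baseCounts tallies g and c, plus the number of unambiguous bases
// (a, c, g, t); every other symbol is ignored.
func baseCounts(seq []byte) (g, c, acgt int) {
	for _, b := range seq {
		switch b | 0x20 { // case-insensitive
		case 'g':
			g++
			acgt++
		case 'c':
			c++
			acgt++
		case 'a', 't':
			acgt++
		}
	}
	return g, c, acgt
}

// gcContent is (G+C)/(A+C+G+T), i.e. ambiguous symbols are excluded.
func gcContent(seq []byte) float64 {
	g, c, acgt := baseCounts(seq)
	if acgt == 0 {
		return 0 // assumption: no unambiguous base means 0
	}
	return float64(g+c) / float64(acgt)
}

// gcSkew is (G-C)/(G+C).
func gcSkew(seq []byte) float64 {
	g, c, _ := baseCounts(seq)
	if g+c == 0 {
		return 0 // assumption: undefined skew is reported as 0
	}
	return float64(g-c) / float64(g+c)
}

func main() {
	seq := []byte("ggcatnn")
	fmt.Printf("%.2f %.2f\n", gcContent(seq), gcSkew(seq)) // prints "0.60 0.33"
}
```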
@@ -0,0 +1,39 @@
# Semantic Description of `obiseq` Statistics and Merging Features

This package provides infrastructure for **tracking, aggregating, and merging statistical occurrences** of sequence attributes across biological sequences (`BioSequence`). It supports both **count-based and weighted statistics**, with thread-safe operations.

## Core Components

- `StatsOnValues`: A concurrent map (`map[string]int`) with R/W locking to store occurrence counts per attribute value (e.g., taxon, primer, quality bin).
- `StatsOnDescription`: Defines *how* to extract and weight statistics from a sequence (e.g., count per read, or sum of quality scores).
- `StatsOnSlotName(key)`: Generates internal annotation keys (e.g., `"merged_taxon"`) to store precomputed statistics.

## Key Functionalities

1. **Per-Sequence Statistics Initialization & Update**
   - `StatsOn(desc, na)`: Ensures a statistics slot exists for attribute `desc.Key`, initializes if needed.
   - `StatsPlusOne(...)`: Adds contribution of a *single* sequence to the statistics (e.g., increment count for its taxon).

2. **Thread-Safe Aggregation**
   - `Merge(*StatsOnValues)`: Safely merges counts from another `StatsOnValues`, used to combine per-sequence stats.

3. **Sequence Merging with Stat Propagation**
   - `BioSequence.Merge(...)`:
     - Combines two sequences (e.g., consensus/overlap).
     - Updates statistics for specified attributes (`statsOn`), preserving or aggregating counts.
     - Resolves conflicting annotations by deleting non-merged fields if mismatched.

4. **Bulk Merging**
   - `BioSequenceSlice.Merge(...)`: Efficiently merges *N* sequences into one, recycling inputs and updating statistics incrementally.

## Use Cases

- Tracking taxonomic assignments across merged reads.
- Aggregating primer or barcode counts in amplicon merging.
- Summarizing quality scores, abundance weights, or custom metadata during consensus building.

## Design Notes

- Uses `sync.RWMutex` for safe concurrent access.
- Supports only JSON-marshalable, serializable statistics (via `MarshalJSON`).
- Enforces type safety: only strings/integers/booleans allowed for attribute values.
@@ -0,0 +1,19 @@
# BioSequence Pairing Functionality

This package provides semantic tools for managing biological sequence pairings, typically used in genomics (e.g., paired-end reads). Key features:

- **Single-sequence pairing**:
  - `IsPaired()` checks if a sequence is currently paired.
  - `PairedWith()` returns the linked partner, or `nil`.
  - `PairTo(p)` establishes a bidirectional link between two sequences.
  - `UnPair()` safely severs the pairing on both ends.

- **Batch (slice) handling**:
  - `IsPaired()` and `UnPair()` operate uniformly across all sequences in a slice.
  - `PairedWith()` returns the corresponding paired slice (element-wise).
  - `PairTo(p)` enforces length compatibility and pairs sequences index-by-index.

- **Error handling**:
  - Mismatched slice lengths during `PairTo` trigger a fatal log (via Logrus), preventing inconsistent pairings.

Semantically, the API supports both *atomic* and *bulk* pairing operations while preserving consistency through bidirectional references, ideal for processing paired-end sequencing data.
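
The bidirectional-link invariant described above (if `a` is paired with `b`, then `b` is paired with `a`) can be sketched as follows. The `read` type and the function names are hypothetical; unlike the package, which logs a fatal error on mismatched slice lengths, this sketch returns an `error` so that it stays testable.

```go
package main

import "fmt"

// read is a hypothetical stand-in for BioSequence with only the pairing link.
type read struct {
	id     string
	paired *read
}

func (r *read) isPaired() bool { return r.paired != nil }

// pairTo links r and p in both directions.
func (r *read) pairTo(p *read) {
	r.paired = p
	if p != nil {
		p.paired = r
	}
}

// unPair clears the link on both ends.
func (r *read) unPair() {
	if r.paired != nil {
		r.paired.paired = nil
	}
	r.paired = nil
}

// pairSlices pairs two slices index by index and refuses mismatched lengths.
func pairSlices(fwd, rev []*read) error {
	if len(fwd) != len(rev) {
		return fmt.Errorf("cannot pair slices of length %d and %d", len(fwd), len(rev))
	}
	for i := range fwd {
		fwd[i].pairTo(rev[i])
	}
	return nil
}

func main() {
	f, r := &read{id: "r1/1"}, &read{id: "r1/2"}
	f.pairTo(r)
	fmt.Println(r.paired.id, f.isPaired())
	r.unPair()
	fmt.Println(f.isPaired(), r.isPaired())
}
```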
@@ -0,0 +1,34 @@
# Semantic Overview of `obiseq` Package Functionalities

This Go package (`obiseq`) provides memory-efficient utilities for managing slices and annotations, key data structures in biosequence processing.

## Slice Management

- **`GetSlice(capacity int) []byte`**
  Retrieves a reusable `[]byte` with ≥ requested capacity. For capacities ≤1024 bytes, it pulls from a `sync.Pool` (`_BioSequenceByteSlicePool`). Larger slices are freshly allocated.

- **`RecycleSlice(s *[]byte)`**
  Clears and recycles small slices (≤1024 bytes) back to the pool. For large slices (≥100 KB), it nils them and triggers explicit `runtime.GC()` every ~256 MB of discarded memory to prevent heap bloat.

- **`CopySlice(src []byte) []byte`**
  Efficiently copies a source slice into a pooled or newly allocated destination, preserving semantics without unnecessary allocations.

## Annotation Management

- **`BioSequenceAnnotationPool`**
  A `sync.Pool` for reusable map-based annotations (`map[string]string`, inferred from usage), initialized with capacity 1.

- **`GetAnnotation(values ...Annotation) Annotation`**
  Fetches an annotation map from the pool, optionally pre-populated via shallow copy of input annotations using `obiutils.MustFillMap`.

- **`RecycleAnnotation(a *Annotation)`**
  Clears all keys from an annotation map and returns it to the pool for reuse.

## Design Rationale

The package prioritizes low-latency, high-throughput scenarios (e.g., NGS data pipelines) by minimizing GC pressure via:
- Tiered pooling strategy (`small` vs `large`)
- Explicit garbage collection triggers for large-object churn
- Safe reuse patterns avoiding aliasing or stale references

All operations are thread-safe via `sync.Pool` and atomic counters.
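
The tiered strategy can be sketched with the standard `sync.Pool`. This is a simplified model under stated assumptions: the names `slicePool`, `getSlice` and `recycleSlice` are hypothetical, the 1024-byte threshold is taken from the text above, and the periodic `runtime.GC()` trigger for large slices is deliberately left out.

```go
package main

import (
	"fmt"
	"sync"
)

const smallLimit = 1024 // pooling threshold, in bytes

// slicePool hands out pointers to slices to avoid an extra allocation
// when a slice header is stored in the pool.
var slicePool = sync.Pool{
	New: func() interface{} {
		s := make([]byte, 0, smallLimit)
		return &s
	},
}

// getSlice returns an empty slice with at least the requested capacity.
func getSlice(capacity int) []byte {
	if capacity <= smallLimit {
		p := slicePool.Get().(*[]byte)
		if cap(*p) >= capacity {
			return (*p)[:0]
		}
	}
	return make([]byte, 0, capacity)
}

// recycleSlice returns small slices to the pool and drops large ones.
func recycleSlice(s *[]byte) {
	if s == nil || cap(*s) == 0 {
		return
	}
	if cap(*s) <= smallLimit {
		*s = (*s)[:0] // keep the capacity, forget the content
		slicePool.Put(s)
		return
	}
	*s = nil // large slices are left to the garbage collector
}

func main() {
	small := getSlice(100)
	small = append(small, "acgt"...)
	big := getSlice(1 << 20)
	recycleSlice(&small)
	recycleSlice(&big)
	fmt.Println(len(small), big == nil)
}
```

After `recycleSlice` the caller should treat the slice as gone: a pooled slice may be handed to another goroutine at any time.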
@@ -0,0 +1,33 @@
# Sequence Predicate Framework in `obiseq`

This Go package provides a flexible and composable predicate system for filtering biological sequences (`BioSequence`) based on diverse criteria.

## Core Concepts

- **`SequencePredicate`**: A function type `func(*BioSequence) bool`, enabling conditional logic on sequences.
- **Predicate Composition**: Supports logical operations (`And`, `Or`, `Xor`, `Not`) and chaining.
- **Paired-end Support**: Predicates can be adapted to consider read pairs via `PredicateOnPaired` and `PairedPredicat`, with modes:
  - `ForwardOnly`: Only the forward read is evaluated.
  - `ReverseOnly`, `And`, `Or`, `AndNot`, `Xor`: Combine forward and reverse evaluations.

## Built-in Predicates

| Predicate | Description |
|-----------|-------------|
| `HasAttribute(name)` | Checks if a sequence has an annotation with the given name. |
| `IsAttributeMatch(name, pattern)` | Tests if a named annotation matches the provided regex (case-sensitive). |
| `IsMoreAbundantOrEqualTo(count)` / `IsLessAbundantOrEqualTo(count)` | Filters by sequence abundance (count field). |
| `IsLongerOrEqualTo(length)` / `IsShorterOrEqualTo(length)` | Filters by sequence length. |
| `OccurInAtleast(sample, n)` | Checks if the sequence appears in at least *n* samples (via description stats). |
| `IsSequenceMatch(pattern)` | Matches the raw sequence against a regex (case-insensitive). |
| `IsDefinitionMatch(pattern)` | Matches the definition/description line against a regex. |
| `IsIdMatch(pattern)` / `IsIdIn(ids...)` | Filters by sequence ID using regex or explicit set. |
| `ExpressionPredicat(expression)` | Evaluates a custom boolean expression (via OBILang) using annotations and sequence metadata. |

## Design Highlights

- **Null-safe**: `nil` predicates are handled gracefully in compositions.
- **Extensible**: Custom predicates can be defined and combined seamlessly.
- **Logging & Safety**: Invalid regex patterns or expression syntax trigger fatal errors; runtime evaluation issues emit warnings.

This framework enables powerful, declarative filtering pipelines for high-throughput sequencing data analysis.
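
Predicate composition with nil handling can be sketched in a few lines. Everything here is illustrative: the `sequence` type, the lowercase names, and in particular the choice to treat a `nil` predicate as "no constraint" are assumptions about what "handled gracefully" means.

```go
package main

import "fmt"

// sequence is a hypothetical stand-in for BioSequence.
type sequence struct {
	bases []byte
	count int
}

type predicate func(*sequence) bool

// and combines two predicates; a nil operand is treated as "no constraint".
func (p predicate) and(q predicate) predicate {
	switch {
	case p == nil:
		return q
	case q == nil:
		return p
	}
	return func(s *sequence) bool { return p(s) && q(s) }
}

// not negates a predicate; the negation of nil stays nil.
func (p predicate) not() predicate {
	if p == nil {
		return nil
	}
	return func(s *sequence) bool { return !p(s) }
}

func isLongerOrEqualTo(n int) predicate {
	return func(s *sequence) bool { return len(s.bases) >= n }
}

func isMoreAbundantOrEqualTo(n int) predicate {
	return func(s *sequence) bool { return s.count >= n }
}

func main() {
	keep := isLongerOrEqualTo(4).and(isMoreAbundantOrEqualTo(2))
	fmt.Println(keep(&sequence{bases: []byte("acgtac"), count: 3}))
	fmt.Println(keep(&sequence{bases: []byte("acg"), count: 3}))
}
```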
@@ -0,0 +1,35 @@
# BioSequence Reverse Complement Functionality

This Go package (`obiseq`) provides utilities for computing the reverse complement of biological sequences (e.g., DNA), including support for quality scores and structured metadata.

## Core Functions

- **`nucComplement(n byte) byte`**
  Returns the nucleotide complement using a lookup table (`_revcmpDNA`). Handles special cases:
  - `.` / `-` → unchanged (gaps)
  - `[`, `]` → swapped (`[` ↔ `]`)
  - A–Z letters → complemented (case-insensitive via bitwise masking)
  - Unknown characters → `'n'`

- **`BioSequence.ReverseComplement(inplace bool) *BioSequence`**
  Performs reverse complement on the sequence and (if present) its quality string:
  - If `inplace = false`, a copy is made; original preserved.
  - Reverses indices and complements each base using `nucComplement`.
  - Also reverses the quality array symmetrically.
  - Caches result in `sequence.revcomp` for reuse.

- **`BioSequence._revcmpMutation() *BioSequence`**
  Adjusts mutation metadata (e.g., `"pairing_mismatches"`) to reflect the reversed-complement orientation:
  - Reverses and complements symbolic mutation strings (e.g., `"A>T"` → `"T>A"`).
  - Updates positional indices to match reversed sequence coordinates.

- **`ReverseComplementWorker(inplace bool) SeqWorker`**
  Returns a reusable `SeqWorker` function for batch processing: applies reverse complement to each sequence in a stream.

## Design Notes

- Uses ASCII bitwise tricks (`&31`, `|0x20`) for case-insensitive indexing and lowercase output.
- Supports non-standard symbols (e.g., IUPAC ambiguity codes via lookup table).
- Integrates quality scores and structured attributes seamlessly.

> Ideal for NGS preprocessing pipelines where orientation matters (e.g., paired-end alignment, variant calling).
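
The core of the operation, reversing while complementing and reversing the qualities alongside, can be sketched on raw slices. This is a simplified model: `complement` uses a `switch` instead of the `_revcmpDNA` lookup table, covers only `a`, `c`, `g`, `t`/`u` (all other letters, including IUPAC codes, become `'n'` here), and there is no caching or metadata adjustment.

```go
package main

import "fmt"

// complement returns the lowercase complement of a nucleotide symbol.
func complement(b byte) byte {
	switch b {
	case '.', '-':
		return b // gaps are left unchanged
	case '[':
		return ']'
	case ']':
		return '['
	}
	switch b | 0x20 { // case-insensitive match on letters
	case 'a':
		return 't'
	case 'c':
		return 'g'
	case 'g':
		return 'c'
	case 't', 'u':
		return 'a'
	}
	return 'n' // simplification: everything else is unknown
}

// reverseComplement rewrites seq as its reverse complement and reverses
// qual (if any) so that each score stays attached to its base.
// Both slices are modified in place.
func reverseComplement(seq, qual []byte) {
	for i, j := 0, len(seq)-1; i <= j; i, j = i+1, j-1 {
		seq[i], seq[j] = complement(seq[j]), complement(seq[i])
	}
	for i, j := 0, len(qual)-1; i < j; i, j = i+1, j-1 {
		qual[i], qual[j] = qual[j], qual[i]
	}
}

func main() {
	seq := []byte("aacg")
	qual := []byte{10, 20, 30, 40}
	reverseComplement(seq, qual)
	fmt.Println(string(seq), qual) // prints "cgtt [40 30 20 10]"
}
```

The first loop uses `i <= j` so that the middle base of an odd-length sequence is still complemented.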
@@ -0,0 +1,19 @@
## Semantic Description of `obiseq` Package Functionality

The `obiseq` package provides core bioinformatics utilities for nucleic acid sequence manipulation in Go. It centers around two key operations:

- **Nucleotide Complementation (`nucComplement`)**
  Implements standard Watson-Crick base pairing rules: `A↔T`, `C↔G`. It also handles ambiguous or symbolic characters (e.g., `'n'` → `'n'`, `'['` ↔ `']'`), preserving non-standard symbols like gaps (`'-'`) and missing data (`'.'`). This function serves as the atomic building block for reverse-complement logic.

- **Reverse Complementation (`BioSequence.ReverseComplement`)**
  A method on the `BioSequence` type that returns a new (or in-place modified) sequence representing:
  - The *reverse* of the original nucleotide string, followed by
  - Each base replaced with its complement (via `nucComplement`).

  The method supports two modes:
  - **Non-destructive (`inplace=false`)**: Returns a new `BioSequence`, leaving the original unchanged.
  - **In-place (`inplace=true`)**: Modifies and returns the same object for memory efficiency.

Crucially, it preserves associated quality scores (e.g., Phred-scaled sequencing qualities), reversing their order to match the reversed sequence. This keeps downstream analyses such as alignment or variant calling correct.

Tests validate both functions across edge cases (degenerate bases, ambiguous symbols, and quality-aware sequences), confirming robustness for typical NGS (Next-Generation Sequencing) workflows.
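
The difference between the two modes can be illustrated with a small sketch. The `record` type and `complementACGT` helper are hypothetical simplifications (only the four standard bases are handled); the point is the copy-versus-mutate behaviour, not the real `BioSequence` internals.

```go
package main

import "fmt"

// record is a hypothetical, stripped-down stand-in for BioSequence.
type record struct {
	seq  []byte
	qual []byte // nil when the sequence carries no quality scores
}

// complementACGT covers only the four standard bases; anything else
// becomes 'n'. The real lookup table handles many more symbols.
func complementACGT(b byte) byte {
	switch b | 0x20 { // force lowercase before comparing
	case 'a':
		return 't'
	case 'c':
		return 'g'
	case 'g':
		return 'c'
	case 't':
		return 'a'
	}
	return 'n'
}

// reverseComplement works on a copy unless inplace is true.
func (r *record) reverseComplement(inplace bool) *record {
	out := r
	if !inplace {
		out = &record{
			seq:  append([]byte(nil), r.seq...),
			qual: append([]byte(nil), r.qual...),
		}
	}
	// Swap from both ends, complementing as we go; i == j handles the
	// middle base of an odd-length sequence.
	for i, j := 0, len(out.seq)-1; i <= j; i, j = i+1, j-1 {
		out.seq[i], out.seq[j] = complementACGT(out.seq[j]), complementACGT(out.seq[i])
	}
	// Qualities are only reversed, never complemented.
	for i, j := 0, len(out.qual)-1; i < j; i, j = i+1, j-1 {
		out.qual[i], out.qual[j] = out.qual[j], out.qual[i]
	}
	return out
}

func main() {
	original := &record{seq: []byte("AACG"), qual: []byte{10, 20, 30, 40}}

	copied := original.reverseComplement(false)
	fmt.Println(string(original.seq), string(copied.seq)) // AACG cgtt

	same := original.reverseComplement(true)
	fmt.Println(same == original, string(original.seq), original.qual) // true cgtt [40 30 20 10]
}
```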
@@ -0,0 +1,13 @@
# `obiseq.Subsequence` Functionality Overview

The `Subsequence()` method extracts a contiguous segment from a biological sequence (`BioSequence`), supporting both linear and circular topologies.

- **Input validation**: Checks ensure `from < to` (unless circular), positions are non-negative, and bounds respect sequence length.
- **Circular handling**: Positions exceeding the sequence length wrap around using modular arithmetic; debug logs record corrections.
- **Linear extraction**: When `from < to`, it slices the underlying nucleotide/peptide sequence and, if present, its quality scores.
- **Circular extraction**: When `from > to`, it concatenates two linear segments: from `from` → end, and start → `to`.
- **Metadata preservation**: Quality scores (if available) and annotations are copied to the new subsequence.
- **ID formatting**: The resulting sequence ID is suffixed with `[from..to]` (1-based indexing).
- **Mutation tracking**: A private `_subseqMutation()` adjusts stored pairing mismatch positions by subtracting the extraction shift, ensuring coordinate consistency post-extraction.

This enables robust subsequence generation for genomic analysis workflows involving circular genomes (e.g., plasmids) or fragmented reads.
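
The linear and circular extraction paths listed above could look roughly like this. It is a sketch under stated assumptions: `record` and `subsequence` are hypothetical names, `from` is treated as inclusive and `to` as exclusive, out-of-range positions are rejected rather than wrapped, and annotation copying and mutation tracking are left out.

```go
package main

import (
	"errors"
	"fmt"
)

// record is a hypothetical stand-in for BioSequence.
type record struct {
	id   string
	seq  []byte
	qual []byte // nil when no quality scores are attached
}

// subsequence extracts [from, to) with 0-based positions. When circular
// is true and from >= to, the segment wraps around the end of the sequence.
func subsequence(r *record, from, to int, circular bool) (*record, error) {
	n := len(r.seq)
	if from < 0 || from >= n {
		return nil, errors.New("from out of bounds")
	}
	if to <= 0 || to > n {
		return nil, errors.New("to out of bounds")
	}
	if from >= to && !circular {
		return nil, errors.New("from must be smaller than to on a linear sequence")
	}

	// The ID suffix uses 1-based positions, hence from+1.
	out := &record{id: fmt.Sprintf("%s[%d..%d]", r.id, from+1, to)}

	if from < to { // linear case: a single slice
		out.seq = append([]byte(nil), r.seq[from:to]...)
		if r.qual != nil {
			out.qual = append([]byte(nil), r.qual[from:to]...)
		}
		return out, nil
	}

	// Circular case: from..end followed by start..to.
	out.seq = append(append([]byte(nil), r.seq[from:]...), r.seq[:to]...)
	if r.qual != nil {
		out.qual = append(append([]byte(nil), r.qual[from:]...), r.qual[:to]...)
	}
	return out, nil
}

func main() {
	r := &record{id: "s1", seq: []byte("acgt"), qual: []byte{1, 2, 3, 4}}

	linear, _ := subsequence(r, 1, 3, false)
	fmt.Println(linear.id, string(linear.seq), linear.qual) // s1[2..3] cg [2 3]

	wrapped, _ := subsequence(r, 3, 2, true)
	fmt.Println(wrapped.id, string(wrapped.seq), wrapped.qual) // s1[4..2] tac [4 1 2]

	_, err := subsequence(r, 3, 2, false)
	fmt.Println(err)
}
```

Copying the slices, rather than re-slicing the parent, keeps the new record independent of later changes to the original.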
@@ -0,0 +1,29 @@
# `obiseq` Package: Subsequence Extraction Functionality

The `Subsequence()` method enables extraction of a contiguous segment from biological sequence data (`BioSequence`). It supports both linear and circular (wrapped) slicing.

- **Input Parameters**:
  - `from`, `to`: 0-based indices defining the slice range; `from` is inclusive and, as the wrap-around example below implies, `to` is exclusive.
  - `circular`: boolean flag enabling wrap-around when `from > to`.

- **Behavior**:
  - For linear (`circular = false`), `from ≤ to`, and indices within bounds `[0, len(seq))`.
  - For circular (`circular = true`), allows wrap-around (e.g., `from=3, to=2` on a 4-mer yields indices `[3,0,1]`).
  - Validates inputs: returns descriptive errors for:
    - `from > to` (non-circular),
    - out-of-bounds indices (`< 0` or `≥ length`),
    - invalid ranges.

- **Quality Support**:
  - When sequence includes base quality scores (`BioSequenceWithQualities`), the method preserves corresponding sub-slice of `Quality[]`.

- **Return Value**:
  - Returns a new `BioSequence` (or subclass) instance containing the extracted subsequence and its optional qualities.

- **Use Case**:
  - Ideal for region-of-interest extraction (e.g., primer binding sites, domain segments), especially in circular genomes or plasmids.

- **Testing**:
  - Unit tests (`TestSubsequence`) cover valid/invalid inputs, circular/non-circular modes, and quality consistency.

This functionality provides robust, semantics-aware slicing for biosequence manipulation in Go.
@@ -0,0 +1,26 @@
# Taxonomic Classification via `TaxonomyClassifier`

The `obiseq` package provides a taxonomic classification mechanism through the `TaxonomyClassifier` function.

- **Purpose**: Constructs a reusable classifier for biological sequences based on taxonomic hierarchy.
- **Inputs**:
  - `taxonomicRank`: Target rank (e.g., `"species"`, `"genus"`).
  - `taxonomy`: Reference taxonomy (`*obitax.Taxonomy`), with fallback via `.OrDefault(true)`.
  - `abortOnMissing`: Boolean flag to enforce strict taxon resolution.

- **Core Logic**:
  - For each sequence, retrieves its `Taxon`, then drills down to the requested rank using `.TaxonAtRank()`.
  - If `abortOnMissing` is true, exits on failure to resolve the taxon or rank.
  - Internally maps `*TaxNode`s to integer codes for efficient storage/comparison.

- **Returned Object (`BioSequenceClassifier`)**:
  - `Code(sequence) int`: Assigns a unique integer code to the taxonomic assignment of a sequence.
  - `Value(code) string`: Returns the scientific name corresponding to a code.
  - `Reset()`: Reinitializes internal mappings (useful for batch processing).
  - `Clone() *BioSequenceClassifier`: Creates a fresh, identical classifier instance.

- **Design Rationale**:
  - Uses integer codes to avoid repeated string operations and enable fast indexing (e.g., for counting).
  - Supports both strict (`abortOnMissing=true`) and lenient classification modes.

This design enables scalable, efficient taxonomic profiling of sequencing datasets.
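
The integer-code idea behind the classifier can be sketched with closures over a shared table. Everything here is hypothetical: `classifier` and `newLabelClassifier` are illustrative names, and `Code` takes a plain label (for example the name of the taxon at the requested rank) instead of a sequence and a taxonomy.

```go
package main

import "fmt"

// classifier is a hypothetical, simplified stand-in for
// BioSequenceClassifier: it exposes Code, Value and Reset as closures.
type classifier struct {
	Code  func(label string) int
	Value func(code int) string
	Reset func()
}

// newLabelClassifier assigns codes in order of first appearance, so a
// code can be used directly as a slice index.
func newLabelClassifier() *classifier {
	codes := map[string]int{}
	names := []string{}
	return &classifier{
		Code: func(label string) int {
			if c, ok := codes[label]; ok {
				return c
			}
			c := len(names)
			codes[label] = c
			names = append(names, label)
			return c
		},
		Value: func(code int) string {
			if code < 0 || code >= len(names) {
				return "" // unknown code
			}
			return names[code]
		},
		Reset: func() {
			codes = map[string]int{}
			names = names[:0]
		},
	}
}

func main() {
	c := newLabelClassifier()
	counts := []int{}
	for _, genus := range []string{"Homo", "Pan", "Homo", "Gorilla", "Homo"} {
		code := c.Code(genus)
		if code == len(counts) { // first time this code is seen
			counts = append(counts, 0)
		}
		counts[code]++
	}
	for code, n := range counts {
		fmt.Println(c.Value(code), n)
	}
}
```

Counting into a slice indexed by code avoids hashing the label on every increment, which is the kind of saving the design rationale above points to. This sketch is not safe for concurrent use.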
@@ -0,0 +1,22 @@
# Taxonomic Analysis Functions in `obiseq` Package

This module provides tools for assigning taxonomic labels to biological sequences using a reference taxonomy.

- **`TaxonomicDistribution(taxonomy)`**:
  Returns a map from taxonomic nodes to read counts, based on `taxid` annotations in the sequence metadata. It validates taxids against the taxonomy and enforces strict handling of aliases.

- **`LCA(taxonomy, threshold)`**:
  Computes the *Lowest Common Ancestor* (LCA) of all taxonomic assignments for a sequence, weighted by their abundances.
  - Iteratively traverses upward from each taxon’s path in the taxonomy tree.
  - At each level, computes the relative weight (`rmax`) of the most frequent taxon.
  - Stops when `rmax < threshold`, returning:
    - the LCA taxon,
    - its confidence score (`rans`), and
    - total read count used.

- **`AddLCAWorker(...)`**:
  Creates a `SeqWorker` function to annotate sequences with LCA results:
  - Sets attributes like `<slot>_taxid`, `<slot>_name`, and `<slot>_error` (rounded to 3 decimals).
  - Automatically appends `_taxid` if missing in `slot_name`.

All functions integrate with the OBITools4 ecosystem, supporting robust taxonomic inference for metabarcoding workflows.
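
One plausible reading of the thresholded LCA steps above is sketched here. It is an assumption-laden illustration, not the package's algorithm: lineages are plain string slices ordered from root to leaf, the walk goes level by level from the root, and `assignment` and `weightedLCA` are hypothetical names.

```go
package main

import "fmt"

// assignment is a hypothetical record: one lineage and its read count.
type assignment struct {
	path   []string // lineage ordered from the root to the assigned taxon
	weight int
}

// weightedLCA descends level by level. At each level it finds the taxon
// carrying the largest share of the total weight and stops as soon as
// that share drops below threshold. It returns the last accepted taxon,
// its share of the total weight, and the total weight.
func weightedLCA(assignments []assignment, threshold float64) (string, float64, int) {
	total := 0
	for _, a := range assignments {
		total += a.weight
	}
	if total == 0 {
		return "", 0, 0
	}

	lca, score := "", 0.0
	for level := 0; ; level++ {
		// Only lineages that pass through the current LCA are counted.
		counts := map[string]int{}
		for _, a := range assignments {
			if level < len(a.path) && (level == 0 || a.path[level-1] == lca) {
				counts[a.path[level]] += a.weight
			}
		}
		// Pick the heaviest taxon; ties go to the smaller name so the
		// result does not depend on map iteration order.
		best, bestWeight := "", 0
		for name, w := range counts {
			if w > bestWeight || (w == bestWeight && name < best) {
				best, bestWeight = name, w
			}
		}
		share := float64(bestWeight) / float64(total)
		if bestWeight == 0 || share < threshold {
			return lca, score, total
		}
		lca, score = best, share
	}
}

func main() {
	reads := []assignment{
		{path: []string{"root", "Chordata", "Mammalia", "Homo"}, weight: 6},
		{path: []string{"root", "Chordata", "Mammalia", "Pan"}, weight: 3},
		{path: []string{"root", "Chordata", "Aves"}, weight: 1},
	}
	taxon, score, total := weightedLCA(reads, 0.8)
	fmt.Printf("%s %.3f %d\n", taxon, score, total) // Mammalia 0.900 10
}
```

With a threshold of 0.8 the walk accepts `Mammalia` (9 of 10 reads) and stops one level lower, where the best candidate holds only 6 of 10 reads.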
@@ -0,0 +1,41 @@
# Taxonomic Annotation Features in `obiseq` Package

This package provides semantic taxonomic annotation capabilities for biological sequences (`BioSequence`). It integrates with a taxonomy database to assign, retrieve, and manage taxonomic identifiers (taxids) and related metadata.

## Core Functions

- **`Taxid()`**: Retrieves the taxonomic ID as a string (e.g., `"12345"` or `"NA"`), supporting multiple internal representations (`string`, `int`, `float64`). Returns `"NA"` if no taxid is set.

- **`Taxon(taxonomy)`**: Returns the corresponding `*obitax.Taxon` object, or `nil` if taxid is `"NA"`.

- **`SetTaxid(taxid, rank...)`**: Assigns a taxonomic ID to the sequence. Validates against default taxonomy; handles aliases and errors based on configuration flags (`FailOnTaxonomy`, `UpdateTaxid`). Optionally stores taxid under a custom rank (e.g., `"genus_taxid"`).

- **`SetTaxon(taxon, rank...)`**: Assigns a `*obitax.Taxon` object directly; stores its string representation as taxid.

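
The way `Taxid()` normalizes several stored representations to a string can be sketched with a type switch. This is a simplified illustration: `taxidString` is a hypothetical helper working on a bare annotation map, and it returns an error for unsupported types, whereas the document states that the package treats them as fatal.

```go
package main

import (
	"fmt"
	"strconv"
)

// taxidString reads the "taxid" entry of a hypothetical annotation map
// and returns it as a string, or "NA" when no taxid is stored.
func taxidString(annotations map[string]interface{}) (string, error) {
	value, ok := annotations["taxid"]
	if !ok {
		return "NA", nil
	}
	switch v := value.(type) {
	case string:
		return v, nil
	case int:
		return strconv.Itoa(v), nil
	case float64:
		// Numbers decoded from JSON arrive as float64; drop the fraction.
		return strconv.FormatInt(int64(v), 10), nil
	default:
		return "", fmt.Errorf("taxid has unsupported type %T", value)
	}
}

func main() {
	examples := []map[string]interface{}{
		{"taxid": "9606"},
		{"taxid": 9606},
		{"taxid": 9606.0},
		{},
	}
	for _, annotations := range examples {
		taxid, err := taxidString(annotations)
		fmt.Println(taxid, err)
	}
}
```
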
## Rank-Specific Annotation

- **`SetTaxonAtRank(taxonomy, rank)`**: Annotates the sequence with taxid and scientific name at a specified Linnaean rank (e.g., `"species"`, `"genus"`). Sets two attributes: `rank_taxid` and `rank_name`. Returns the taxon at that rank (or `nil`).

- **Convenience wrappers**:
  - `SetSpecies(...)`
  - `SetGenus(...)`
  - `SetFamily(...)`
  All delegate to `SetTaxonAtRank`.

## Taxonomic Path & Metadata

- **`SetPath(taxonomy)`**: Computes and stores the full taxonomic lineage (from root to species) as a string slice under attribute `"taxonomic_path"`.

- **`Path()`**: Retrieves the stored taxonomic path; recomputes it if missing and a default taxonomy exists.

- **`SetScientificName(taxonomy)`**: Stores the sequence’s species-level scientific name under `"scientific_name"`.

- **`SetTaxonomicRank(taxonomy)`**: Stores the taxon’s rank (e.g., `"species"`, `"genus"`) under `"taxonomic_rank"`.

## Error Handling & Configuration

- Uses `logrus` and custom logging (`obilog`) for warnings/errors.
- Behavior on taxonomy mismatches (e.g., unknown taxid, alias) is configurable via `obidefault` settings.
- Ensures type consistency: taxid must be string, int, or float; invalid types trigger fatal errors.

All methods are designed for seamless integration into bioinformatics pipelines, enabling robust taxonomic profiling of sequencing data.
@@ -0,0 +1,20 @@
# Semantic Description of `obiseq` Package Functionalities

This Go package provides **sequence filtering predicates** for biological sequences, integrated with taxonomic validation and hierarchy analysis.

- `IsAValidTaxon(taxonomy, ...bool) SequencePredicate`:
  Returns a predicate that checks whether a sequence has an associated valid taxon in the given taxonomy.
  Optionally supports *auto-correction* of outdated/incorrect `taxid` values to match the current taxonomy node.

- `IsSubCladeOf(taxonomy, parent) SequencePredicate`:
  Filters sequences whose taxonomic assignment is a descendant (sub-clade) of the specified `parent` taxon.

- `IsSubCladeOfSlot(taxonomy, key) SequencePredicate`:
  Enables filtering based on a *sequence attribute* (e.g., `"taxon"` or `"classification"`) that holds a taxonomic label.
  Validates the label against the taxonomy, then checks if the sequence’s assigned taxon falls under it.

- `HasRequiredRank(taxonomy, rank) SequencePredicate`:
  Ensures the sequence’s taxon is assigned at or below a specified rank (e.g., `"species"`, `"genus"`).
  Validates the requested `rank` against taxonomy’s rank list; exits on invalid input.

All predicates follow a functional, composable design pattern (`SequencePredicate = func(*BioSequence) bool`), enabling flexible pipeline construction (e.g., filtering, classification validation).
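
The composable predicate pattern can be illustrated with a toy taxonomy. The sketch below is not the package's API: `seqRecord`, `predicate`, `parentOf`, `isValidTaxon`, `isSubCladeOf` and the `and` combinator are all hypothetical, and the taxonomy is a hard-coded child-to-parent map.

```go
package main

import "fmt"

// seqRecord is a hypothetical stand-in for BioSequence.
type seqRecord struct {
	id    string
	taxon string
}

// predicate mirrors the shape of SequencePredicate on the toy type.
type predicate func(*seqRecord) bool

// parentOf is a toy taxonomy (child -> parent); the root is its own parent.
var parentOf = map[string]string{
	"root":         "root",
	"vertebrata":   "root",
	"mammalia":     "vertebrata",
	"aves":         "vertebrata",
	"homo_sapiens": "mammalia",
}

// isValidTaxon accepts sequences whose taxon exists in the toy taxonomy.
func isValidTaxon() predicate {
	return func(s *seqRecord) bool {
		_, known := parentOf[s.taxon]
		return known
	}
}

// isSubCladeOf accepts sequences whose taxon is parent or one of its descendants.
func isSubCladeOf(parent string) predicate {
	return func(s *seqRecord) bool {
		current := s.taxon
		for {
			if current == parent {
				return true
			}
			next, known := parentOf[current]
			if !known || next == current { // unknown taxon or root reached
				return false
			}
			current = next
		}
	}
}

// and combines predicates; the result is true only if all of them are.
func and(ps ...predicate) predicate {
	return func(s *seqRecord) bool {
		for _, p := range ps {
			if !p(s) {
				return false
			}
		}
		return true
	}
}

func main() {
	keep := and(isValidTaxon(), isSubCladeOf("mammalia"))
	for _, s := range []*seqRecord{
		{id: "s1", taxon: "homo_sapiens"},
		{id: "s2", taxon: "aves"},
		{id: "s3", taxon: "unknown"},
	} {
		fmt.Println(s.id, keep(s))
	}
}
```

Because every predicate has the same signature, new filters can be combined without touching the code that applies them.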
@@ -0,0 +1,22 @@
# Taxonomic Annotation Workers in `obiseq`

This Go package provides functional workers for annotating biological sequences with taxonomic information using a hierarchical taxonomy (e.g., from NCBI or UNITE). Each worker is implemented as a `SeqWorker`: a function that processes one sequence and returns an updated slice of sequences.

- **`MakeSetTaxonAtRankWorker(taxonomy, rank)`**:
  Assigns a taxonomic label at *a specific rank* (e.g., `"genus"`, `"family"`). Validates that the requested `rank` exists in the taxonomy before proceeding.

- **`MakeSetSpeciesWorker(taxonomy)`**:
  Annotates each sequence with its inferred species name using the provided taxonomy.

- **`MakeSetGenusWorker(taxonomy)`**:
  Adds genus-level taxonomic assignment to sequences.

- **`MakeSetFamilyWorker(taxonomy)`**:
  Adds family-level taxonomic assignment.

- **`MakeSetPathWorker(taxonomy)`**:
  Populates the full taxonomic path (e.g., `"Eukaryota;Metazoa;Chordata;..."`) for each sequence.

All workers rely on methods of `BioSequence` (e.g., `.SetSpecies()`, `.SetPath()`), which internally use the `obitax.Taxonomy` object to resolve taxonomic IDs or names. Errors are logged via `logrus`; invalid ranks cause a fatal exit.

These utilities support modular, pipeline-friendly taxonomic annotation, which suits high-throughput metabarcoding workflows.
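
The factory pattern used by these workers (validate the rank once, then return a closure applied to every sequence) can be sketched as follows. All names are hypothetical, the taxonomy is reduced to a per-sequence `lineage` map, and the sketch returns an error for an unknown rank instead of exiting.

```go
package main

import "fmt"

// seqRecord is a hypothetical stand-in for BioSequence.
type seqRecord struct {
	lineage     map[string]string // rank -> scientific name (toy taxonomy lookup)
	annotations map[string]string
}

// seqWorker mirrors the one-sequence-in, slice-out shape of SeqWorker.
type seqWorker func(*seqRecord) ([]*seqRecord, error)

// knownRanks plays the role of the taxonomy's rank list.
var knownRanks = map[string]bool{"species": true, "genus": true, "family": true}

// makeSetTaxonAtRankWorker checks the rank when the worker is built, so
// a typo is reported once instead of for every sequence.
func makeSetTaxonAtRankWorker(rank string) (seqWorker, error) {
	if !knownRanks[rank] {
		return nil, fmt.Errorf("rank %q is not defined in the taxonomy", rank)
	}
	return func(s *seqRecord) ([]*seqRecord, error) {
		if name, ok := s.lineage[rank]; ok {
			s.annotations[rank+"_name"] = name
		}
		return []*seqRecord{s}, nil
	}, nil
}

func main() {
	setGenus, err := makeSetTaxonAtRankWorker("genus")
	if err != nil {
		fmt.Println(err)
		return
	}
	s := &seqRecord{
		lineage:     map[string]string{"genus": "Homo", "family": "Hominidae"},
		annotations: map[string]string{},
	}
	out, _ := setGenus(s)
	fmt.Println(len(out), out[0].annotations["genus_name"]) // 1 Homo

	_, err = makeSetTaxonAtRankWorker("tribe")
	fmt.Println(err)
}
```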
@@ -0,0 +1,18 @@
# Semantic Description of `obiseq` Package Functionalities

The `obiseq` package provides composable, higher-order worker functions for processing biological sequence data in Go. It defines three core functional types:

- `SeqAnnotator`: In-place annotation of a single sequence (e.g., adding metadata).
- `SeqWorker`: Processes one sequence and returns zero or more output sequences (1→N transformation).
- `SeqSliceWorker`: Processes a slice of sequences and returns another slice (bulk pipeline stage).

Key utilities include:

- **`NilSeqWorker`**: Identity worker; returns the input sequence unchanged.
- **`AnnotatorToSeqWorker`**: Converts an in-place annotator into a `SeqWorker`, preserving compatibility with pipeline interfaces.
- **`SeqToSliceWorker`**: Lifts a `SeqWorker` to operate on slices, with configurable error handling (`breakOnError`). Supports dynamic slice growth and logging via `obilog`.
- **`SeqToSliceFilterOnWorker`**: Filters sequences in a slice using a `SequencePredicate`, preserving order and avoiding unnecessary allocations.
- **`SeqToSliceConditionalWorker`**: Applies a `SeqWorker` only to sequences satisfying a predicate; others pass through unchanged.
- **`.ChainWorkers()`**: Method on `SeqWorker` to compose two workers sequentially (pipeline chaining), enabling modular, reusable workflows.

All functions emphasize safety: errors are either propagated (`breakOnError = true`) or logged with warnings, ensuring robustness in large-scale sequence processing pipelines.
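
How the three function types fit together can be shown with a short sketch. The types and helpers below (`seqRecord`, `annotatorToWorker`, `chain`, `toSliceWorker`, `upperCase`, `rejectEmpty`) are hypothetical analogues of the utilities listed above, written against a toy record type, with a plain print standing in for `obilog`.

```go
package main

import (
	"fmt"
	"strings"
)

// seqRecord is a hypothetical stand-in for BioSequence.
type seqRecord struct {
	id  string
	seq string
}

// Lower-case analogues of SeqAnnotator, SeqWorker and SeqSliceWorker.
type (
	seqAnnotator   func(*seqRecord)
	seqWorker      func(*seqRecord) ([]*seqRecord, error)
	seqSliceWorker func([]*seqRecord) ([]*seqRecord, error)
)

// annotatorToWorker wraps an in-place annotator as a 1 -> 1 worker.
func annotatorToWorker(a seqAnnotator) seqWorker {
	return func(s *seqRecord) ([]*seqRecord, error) {
		a(s)
		return []*seqRecord{s}, nil
	}
}

// chain feeds every output of w into next and concatenates the results.
func (w seqWorker) chain(next seqWorker) seqWorker {
	return func(s *seqRecord) ([]*seqRecord, error) {
		mid, err := w(s)
		if err != nil {
			return nil, err
		}
		var out []*seqRecord
		for _, m := range mid {
			res, err := next(m)
			if err != nil {
				return nil, err
			}
			out = append(out, res...)
		}
		return out, nil
	}
}

// toSliceWorker lifts a worker to slices. With breakOnError the first
// error aborts the batch; otherwise the failing sequence is skipped.
func toSliceWorker(w seqWorker, breakOnError bool) seqSliceWorker {
	return func(in []*seqRecord) ([]*seqRecord, error) {
		out := make([]*seqRecord, 0, len(in))
		for _, s := range in {
			res, err := w(s)
			if err != nil {
				if breakOnError {
					return nil, err
				}
				fmt.Println("warning:", err) // stand-in for a logged warning
				continue
			}
			out = append(out, res...)
		}
		return out, nil
	}
}

func upperCase(s *seqRecord) { s.seq = strings.ToUpper(s.seq) }

func rejectEmpty(s *seqRecord) ([]*seqRecord, error) {
	if s.seq == "" {
		return nil, fmt.Errorf("sequence %s is empty", s.id)
	}
	return []*seqRecord{s}, nil
}

func main() {
	pipeline := seqWorker(rejectEmpty).chain(annotatorToWorker(upperCase))
	batch := []*seqRecord{{id: "s1", seq: "acgt"}, {id: "s2", seq: ""}}

	lenient, _ := toSliceWorker(pipeline, false)(batch)
	fmt.Println(len(lenient), lenient[0].seq) // 1 ACGT

	_, err := toSliceWorker(pipeline, true)(batch)
	fmt.Println(err) // sequence s2 is empty
}
```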