mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 03:50:39 +00:00
⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5" - Update version.txt from 4.29 → .30 (automated by Makefile)
This commit is contained in:
@@ -0,0 +1,37 @@
|
||||
# `obiseq` Package: BioSequence Collection Management
|
||||
|
||||
The `obiseq` package provides a high-performance, memory-efficient implementation for managing collections of biological sequences (`BioSequence`) in Go. Its core type is `BioSequenceSlice`, a slice of pointers to `BioSequence` objects, optimized for batch processing in metagenomic pipelines.
|
||||
|
||||
### Key Functionalities
|
||||
|
||||
- **Memory Pooling & Allocation Control**:
|
||||
`NewBioSequenceSlice` and `MakeBioSequenceSlice` allow creating slices with optional capacity hints.
|
||||
`EnsureCapacity(capacity)` dynamically grows the underlying slice while logging warnings or panicking on persistent allocation failures.
|
||||
|
||||
- **Efficient Element Management**:
|
||||
- `Push(sequence)`: Appends a sequence to the end.
|
||||
- `Pop()`: Removes and returns the last element (nil-safe).
|
||||
- `Pop0()`: Efficiently removes and returns the first element.
|
||||
|
||||
- **Collection Metadata Queries**:
|
||||
- `Len()`: Returns number of sequences in the slice.
|
||||
- `Size()`: Computes total sequence length (summing all `.Len()`).
|
||||
- `NotEmpty()`: Boolean check for non-empty collections.
|
||||
|
||||
- **Attribute Aggregation**:
|
||||
`AttributeKeys(skip_map, skip_definition)` aggregates all attribute keys across sequences into a set—useful for schema inference or validation.
|
||||
|
||||
- **Sorting Capabilities**:
|
||||
- `SortOnCount(reverse)`: Sorts by read count (descending/ascending).
|
||||
- `SortOnLength(reverse)`: Sorts by sequence length.
|
||||
|
||||
- **Taxonomy Integration**:
|
||||
`ExtractTaxonomy(taxonomy, seqAsTaxa)` builds or extends a taxonomic tree from sequence paths.
|
||||
When `seqAsTaxa=true`, it injects pseudo-taxonomic labels for individual sequences (e.g., `OTU:SEQ0000012345 [seqID]@sequence`), enabling unified taxonomic/rarefaction workflows.
|
||||
|
||||
### Design Highlights
|
||||
|
||||
- Minimal allocations via manual slice management and `slices.Grow`.
|
||||
- Explicit niling of popped elements to aid garbage collection.
|
||||
- Integrated logging (via `logrus`) for allocation issues—critical in large-scale NGS data processing.
|
||||
- Designed to support `BioSequenceBatch`, a higher-level abstraction for streaming or parallelizable sequence batches.
|
||||
Reference in New Issue
Block a user