obitools4/autodoc/docmd/pkg/obiseq/biosequenceslice.md

obiseq Package: BioSequence Collection Management

The obiseq package provides a high-performance, memory-efficient implementation for managing collections of biological sequences (BioSequence) in Go. Its core type is BioSequenceSlice, a slice of pointers to BioSequence objects, optimized for batch processing in metagenomic pipelines.

Key Functionalities

  • Memory Pooling & Allocation Control:
    NewBioSequenceSlice and MakeBioSequenceSlice allow creating slices with optional capacity hints.
    EnsureCapacity(capacity) grows the underlying slice to the requested capacity; it logs a warning when an allocation fails and panics if the failure persists.

  • Efficient Element Management:

    • Push(sequence): Appends a sequence to the end.
    • Pop(): Removes and returns the last element (nil-safe).
    • Pop0(): Efficiently removes and returns the first element.
  • Collection Metadata Queries:

    • Len(): Returns number of sequences in the slice.
    • Size(): Computes total sequence length (summing all .Len()).
    • NotEmpty(): Boolean check for non-empty collections.
  • Attribute Aggregation:
    AttributeKeys(skip_map, skip_definition) aggregates all attribute keys across sequences into a set—useful for schema inference or validation.

  • Sorting Capabilities:

    • SortOnCount(reverse): Sorts by read count; the reverse flag switches between ascending and descending order.
    • SortOnLength(reverse): Sorts by sequence length.
  • Taxonomy Integration:
    ExtractTaxonomy(taxonomy, seqAsTaxa) builds or extends a taxonomic tree from sequence paths.
    When seqAsTaxa=true, it injects pseudo-taxonomic labels for individual sequences (e.g., OTU:SEQ0000012345 [seqID]@sequence), enabling unified taxonomic/rarefaction workflows.
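
The element-management, query, sorting, and key-aggregation operations listed above can be illustrated with a minimal sketch. It is not the obiseq source: `seq` and `seqSlice` are hypothetical stand-ins for BioSequence and BioSequenceSlice, the method signatures are assumptions, "nil-safe" is read here as "returns nil on an empty slice", the direction selected by `reverse` is a guess, and the `skip_map` / `skip_definition` flags of AttributeKeys are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// seq is a hypothetical stand-in for obiseq.BioSequence.
type seq struct {
	id    string
	bases []byte
	count int            // number of reads represented by this sequence
	attrs map[string]any // free-form annotations
}

// Len returns the length of the sequence in bases.
func (s *seq) Len() int { return len(s.bases) }

// seqSlice mirrors the idea of BioSequenceSlice: a slice of pointers.
type seqSlice []*seq

// Push appends a sequence to the end of the slice.
func (s *seqSlice) Push(x *seq) { *s = append(*s, x) }

// Pop removes and returns the last element, or nil if the slice is empty.
func (s *seqSlice) Pop() *seq {
	n := len(*s)
	if n == 0 {
		return nil
	}
	last := (*s)[n-1]
	(*s)[n-1] = nil // release the slot before shrinking
	*s = (*s)[:n-1]
	return last
}

// Len returns the number of sequences.
func (s seqSlice) Len() int { return len(s) }

// NotEmpty reports whether the slice holds at least one sequence.
func (s seqSlice) NotEmpty() bool { return len(s) > 0 }

// Size sums the lengths of all sequences.
func (s seqSlice) Size() int {
	total := 0
	for _, x := range s {
		total += x.Len()
	}
	return total
}

// SortOnCount sorts by count, ascending by default and descending
// when reverse is true (the mapping of reverse is an assumption).
func (s seqSlice) SortOnCount(reverse bool) {
	sort.SliceStable(s, func(i, j int) bool {
		if reverse {
			return s[i].count > s[j].count
		}
		return s[i].count < s[j].count
	})
}

// AttributeKeys collects the union of attribute keys into a set.
func (s seqSlice) AttributeKeys() map[string]struct{} {
	keys := make(map[string]struct{})
	for _, x := range s {
		for k := range x.attrs {
			keys[k] = struct{}{}
		}
	}
	return keys
}

func main() {
	var s seqSlice
	s.Push(&seq{id: "a", bases: []byte("ACGT"), count: 3, attrs: map[string]any{"sample": "s1"}})
	s.Push(&seq{id: "b", bases: []byte("ACGTAC"), count: 7, attrs: map[string]any{"taxid": 9606}})

	fmt.Println(s.Len(), s.Size(), s.NotEmpty()) // 2 10 true
	s.SortOnCount(true)
	fmt.Println(s[0].id)                // b
	fmt.Println(len(s.AttributeKeys())) // 2
	fmt.Println(s.Pop().id)             // a
}
```

Using a set (`map[string]struct{}`) for the keys keeps the aggregation independent of how many sequences share a given attribute, which is what schema inference needs.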

Design Highlights

  • Minimal allocations via manual slice management and slices.Grow.
  • Explicit niling of popped elements to aid garbage collection.
  • Integrated logging (via logrus) for allocation issues—critical in large-scale NGS data processing.
  • Designed to support BioSequenceBatch, a higher-level abstraction for streaming or parallelizable sequence batches.
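
As a rough illustration of the first two design points, the sketch below grows a slice with `slices.Grow` (standard library, Go 1.21+) and clears the slot of a popped element. The names `sequence`, `sequenceSlice`, `ensureCapacity`, and `pop0` are hypothetical and chosen for this example only; the logrus warnings and the retry/panic policy described above are deliberately omitted, so this shows the general technique rather than the package's actual code.

```go
package main

import (
	"fmt"
	"slices"
)

// sequence is a hypothetical stand-in for a BioSequence.
type sequence struct{ id string }

// sequenceSlice is a slice of pointers, like BioSequenceSlice.
type sequenceSlice []*sequence

// ensureCapacity grows the backing array so that cap(s) >= capacity.
// slices.Grow guarantees room for n more elements beyond len(s),
// so the number of missing slots is capacity - len(s).
func (s *sequenceSlice) ensureCapacity(capacity int) {
	if cap(*s) < capacity {
		*s = slices.Grow(*s, capacity-len(*s))
	}
}

// pop0 removes and returns the first element, or nil when empty.
func (s *sequenceSlice) pop0() *sequence {
	if len(*s) == 0 {
		return nil
	}
	first := (*s)[0]
	// Clear the slot: otherwise the backing array would keep the
	// popped sequence reachable and the GC could not reclaim it.
	(*s)[0] = nil
	*s = (*s)[1:]
	return first
}

func main() {
	var batch sequenceSlice
	batch.ensureCapacity(1024)
	fmt.Println(cap(batch) >= 1024) // true

	batch = append(batch, &sequence{id: "seq1"}, &sequence{id: "seq2"})
	fmt.Println(batch.pop0().id, len(batch)) // seq1 1
}
```

Reslicing with `[1:]` makes removing the first element an O(1) operation, at the cost of leaving the vacated slot in the backing array; setting that slot to nil is what keeps it from pinning memory.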