obitools4/autodoc/docmd/pkg_obiseq.md
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00



# BioSequence Attribute & Sequence Management (`obiseq`) — Public API Overview

The `obiseq` package provides a high-performance, thread-safe framework for representing and manipulating biological sequences (DNA/RNA/protein) in Go. It supports rich metadata, annotations, quality scores, taxonomic integration, and efficient batch processing—ideal for NGS pipelines like OBITools4.

## Core Sequence Representation

- `BioSequence`: Immutable-like container for sequence data (`[]byte`), ID, definition, qualities, features, and annotations.
- `NewBioSequence(...)`, `NewEmptyBioSequence(cap)`: Constructors supporting initialization with ID, sequence, definition, and optional qualities.
- `Id()`, `Definition()`: Accessors for core metadata fields (ID normalized to lowercase).
- `Sequence()` / `String()`: Returns the sequence as a copy or human-readable string.
- `Len()`, `HasSequence()` / `Composition()`: Length, presence check, and nucleotide composition (counts for `a`, `c`, `g`, `t`, and `o` for any other symbol).
- `MD5()`, `MemorySize()` / `Recycle()`: Integrity checksum, memory footprint estimation, and safe object pooling reset.
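
The composition and checksum accessors listed above can be illustrated with a small standalone sketch. The `record` type and its methods are hypothetical stand-ins written for this example, not the `obiseq` implementation. The sketch assumes that `o` collects every symbol other than `a`, `c`, `g`, and `t`, and that the checksum is taken over the raw sequence bytes.

```go
package main

import (
	"bytes"
	"crypto/md5"
	"fmt"
)

// record is a hypothetical stand-in for a sequence container; it is not
// the obiseq.BioSequence type.
type record struct {
	id       string
	sequence []byte
}

// composition counts a, c, g and t case-insensitively and files every
// other symbol under 'o' (assumed meaning of the "o" key).
func (r record) composition() map[byte]int {
	counts := map[byte]int{'a': 0, 'c': 0, 'g': 0, 't': 0, 'o': 0}
	for _, b := range bytes.ToLower(r.sequence) {
		switch b {
		case 'a', 'c', 'g', 't':
			counts[b]++
		default:
			counts['o']++
		}
	}
	return counts
}

// checksum returns the hex-encoded MD5 digest of the sequence bytes.
func (r record) checksum() string {
	return fmt.Sprintf("%x", md5.Sum(r.sequence))
}

func main() {
	r := record{id: "seq1", sequence: []byte("ACGTNNac")}
	fmt.Println(r.id, r.composition(), r.checksum())
}
```

A checksum over the sequence bytes alone gives two records with identical sequences but different IDs the same digest, which is convenient for dereplication.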

## Attribute & Annotation System

- `Annotations()`, `HasAnnotation(key)`: Read-only access to generic metadata map.
- Thread-safe via internal mutex (`AnnotationsLock()`).
- `GetAttribute(key)`, `SetAttribute(key, value)` / typed getters (`GetIntAttribute(...)`) with automatic type coercion.
- `Keys()` & `HasAttribute(key)`: Enumerate and check presence of attributes (including `"id"`, `"sequence"`).
- `AttributeKeys(skip_map, skip_definition)`: Aggregates all attribute keys across a collection.
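
As a rough illustration of a lock-protected attribute map with a coercing typed getter, consider the sketch below. The `annotated` type and its `set` and `getInt` methods are hypothetical names chosen for this example. The coercion rules shown (whole-valued floats and numeric strings become ints) are an assumption, not a description of how `GetIntAttribute` behaves.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// annotated is a hypothetical type illustrating a mutex-guarded
// attribute map; the names here are not obiseq's.
type annotated struct {
	mu    sync.Mutex
	attrs map[string]any
}

// set stores value under key, creating the map on first use.
func (a *annotated) set(key string, value any) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.attrs == nil {
		a.attrs = make(map[string]any)
	}
	a.attrs[key] = value
}

// getInt returns the attribute as an int, coercing whole-valued floats
// and numeric strings. ok is false when the key is missing or the value
// cannot be converted.
func (a *annotated) getInt(key string) (value int, ok bool) {
	a.mu.Lock()
	defer a.mu.Unlock()
	switch v := a.attrs[key].(type) {
	case int:
		return v, true
	case float64:
		if v == float64(int(v)) {
			return int(v), true
		}
	case string:
		if n, err := strconv.Atoi(v); err == nil {
			return n, true
		}
	}
	return 0, false
}

func main() {
	var a annotated
	var wg sync.WaitGroup
	// Concurrent writers are safe because every access takes the lock.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			a.set("count_"+strconv.Itoa(n), n)
		}(i)
	}
	wg.Wait()
	a.set("merged", "12")
	n, ok := a.getInt("merged")
	fmt.Println(n, ok)
}
```

Returning an `ok` flag rather than a zero value alone lets callers distinguish a missing attribute from one that is genuinely zero.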

## Quality & Feature Support

- `Qualities()` / `SetQualities(...)`: Per-base quality scores (Phred+40 default).
- `HasQualities()`, `Write(...)`, `Clear()` / quality ASCII conversion.
- `Features()`: Optional raw feature table (e.g., GenBank/EMBL).
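
The quality ASCII conversion mentioned above amounts to adding or subtracting a fixed shift per base. The sketch below is a minimal illustration with hypothetical function names. It takes the shift as a parameter rather than assuming a package default, and the usage passes 33, the Sanger FASTQ convention.

```go
package main

import "fmt"

// qualitiesToASCII encodes Phred scores as printable characters by
// adding shift to each score.
func qualitiesToASCII(scores []byte, shift byte) string {
	out := make([]byte, len(scores))
	for i, q := range scores {
		out[i] = q + shift
	}
	return string(out)
}

// asciiToQualities decodes a quality string back into Phred scores and
// rejects characters that fall below the shift.
func asciiToQualities(encoded string, shift byte) ([]byte, error) {
	out := make([]byte, len(encoded))
	for i := 0; i < len(encoded); i++ {
		if encoded[i] < shift {
			return nil, fmt.Errorf("character %q at position %d is below shift %d",
				encoded[i], i, shift)
		}
		out[i] = encoded[i] - shift
	}
	return out, nil
}

func main() {
	encoded := qualitiesToASCII([]byte{0, 30, 40}, 33)
	decoded, err := asciiToQualities(encoded, 33)
	fmt.Println(encoded, decoded, err)
}
```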

## Pairing & Taxonomy

- `PairTo(p)`, `IsPaired()`, `UnPair()` / batch pairing for read-pairs.
- Taxonomic annotation:  
  - `Taxid()`, `SetTaxid(...)`, `Taxon(taxonomy)`  
  - Rank-specific: `SetSpecies()`, `SetGenus()` / generic via `SetTaxonAtRank(rank)`  
  - Full path & LCA: `Path()`, `SetTaxonomicDistribution(...)`
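
One common way to compute a lowest common ancestor from full taxonomic paths is to take the longest shared prefix of root-first lineages. The sketch below shows that idea only. The `lca` function is hypothetical, the lineages in `main` are abbreviated for illustration, and nothing here describes how `obiseq` or its taxonomy backend computes an LCA.

```go
package main

import "fmt"

// lca returns the deepest taxon shared by all root-first lineages, or
// the empty string when they share none.
func lca(paths ...[]string) string {
	if len(paths) == 0 {
		return ""
	}
	common := paths[0]
	for _, p := range paths[1:] {
		n := 0
		for n < len(common) && n < len(p) && common[n] == p[n] {
			n++
		}
		// Shrink the shared prefix to what this lineage also contains.
		common = common[:n]
	}
	if len(common) == 0 {
		return ""
	}
	return common[len(common)-1]
}

func main() {
	// Abbreviated lineages, listed from the root down.
	human := []string{"root", "Eukaryota", "Mammalia", "Primates", "Homo sapiens"}
	mouse := []string{"root", "Eukaryota", "Mammalia", "Rodentia", "Mus musculus"}
	fmt.Println(lca(human, mouse))
}
```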

## Classification, Filtering & Transformation

- Classifiers:  
  - `AnnotationClassifier`, `DualAnnotationClassifier` / predicate-based (`PredicateClassifier`)  
  - Hashing, rotation & composite strategies (e.g., `CompositeClassifier`)
- Predicates:  
  - Length, abundance (`IsMoreAbundantOrEqualTo`) / regex matching on ID/sequence  
  - Expression-based (`ExpressionPredicat`), paired-end support
- Workers:  
  - `EditIdWorker`, `EditAttributeWorker` (via OBILang expressions)  
  - Taxonomic annotators (`MakeSetSpeciesWorker`, `LCA`) / reverse-complement & subsequence workers
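
Predicate-based filtering of this kind is usually built from small functions that can be combined. The following sketch shows the pattern with hypothetical names (`record`, `predicate`, `longerOrEqual`, `moreAbundantOrEqual`, `idMatches`, `filter`). It is not the `obiseq` predicate API, whose exact signatures are not given here.

```go
package main

import (
	"fmt"
	"regexp"
)

// record is a hypothetical sequence record with an abundance count.
type record struct {
	id       string
	sequence []byte
	count    int
}

// predicate reports whether a record should be kept.
type predicate func(record) bool

// and combines two predicates; q is evaluated only when p holds.
func (p predicate) and(q predicate) predicate {
	return func(r record) bool { return p(r) && q(r) }
}

func longerOrEqual(n int) predicate {
	return func(r record) bool { return len(r.sequence) >= n }
}

func moreAbundantOrEqual(n int) predicate {
	return func(r record) bool { return r.count >= n }
}

// idMatches compiles the pattern once and reuses it for every record.
func idMatches(pattern string) predicate {
	re := regexp.MustCompile(pattern)
	return func(r record) bool { return re.MatchString(r.id) }
}

// filter returns the records accepted by keep, preserving order.
func filter(records []record, keep predicate) []record {
	var out []record
	for _, r := range records {
		if keep(r) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	records := []record{
		{"sample_1", []byte("acgtacgt"), 5},
		{"sample_2", []byte("acg"), 9},
		{"ctrl_1", []byte("acgtacgt"), 7},
	}
	keep := longerOrEqual(5).and(moreAbundantOrEqual(2)).and(idMatches("^sample_"))
	for _, r := range filter(records, keep) {
		fmt.Println(r.id)
	}
}
```

Because each predicate is a plain function value, the same building blocks can drive a filter, a classifier that buckets records, or a worker that only transforms matching records.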

## Collection Management & Efficiency

- `BioSequenceSlice`: Optimized batch container with:
  - Pool-aware allocation (`NewBioSequenceSlice`, `EnsureCapacity`)  
  - Efficient push/pop, sorting (on count/length), and merging
- `Merge(...)`: Sequence & slice-level consensus with stat propagation.
- Slice/annotation pooling:  
  - `GetSlice`, `RecycleSlice` / annotation recycling via pools
- Iterators:  
  - `Kmers(k)`: Lazy k-mer generator using Go's new iterator protocol.

## Utility & Extension

- IUPAC support: `SameIUPACNuc(a, b)` for ambiguity-aware base comparison.
- Reverse complement: `ReverseComplement(inplace)`, mutation coordinate adjustment (`_revcmpMutation`).
- Subsequence extraction: `Subsequence(from, to, circular)` with quality & annotation preservation.
- Expression extensions (via OBILang):  
  - `gc`, `gcskew` / `elementof`, `sprintf`, `ifelse`
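
To make the ambiguity-aware comparison and reverse complement concrete, here is a minimal sketch built on the standard IUPAC nucleotide codes. The helper names are hypothetical. Treating two codes as matching when their base sets overlap is an assumption about what "ambiguity-aware" means, so `SameIUPACNuc` may apply a different rule.

```go
package main

import "fmt"

// iupac maps each lowercase IUPAC code to a bit set over the four
// bases: a=1, c=2, g=4, t=8.
var iupac = map[byte]uint8{
	'a': 1, 'c': 2, 'g': 4, 't': 8, 'u': 8,
	'r': 5, 'y': 10, 's': 6, 'w': 9, 'k': 12, 'm': 3,
	'b': 14, 'd': 13, 'h': 11, 'v': 7, 'n': 15,
}

// lower folds ASCII uppercase letters to lowercase.
func lower(b byte) byte {
	if 'A' <= b && b <= 'Z' {
		return b + 'a' - 'A'
	}
	return b
}

// compatible reports whether two codes share at least one base.
// Unknown symbols never match.
func compatible(x, y byte) bool {
	return iupac[lower(x)]&iupac[lower(y)] != 0
}

// complement returns the complementary lowercase IUPAC code and leaves
// self-complementary or unknown symbols unchanged.
func complement(b byte) byte {
	switch b {
	case 'a':
		return 't'
	case 't', 'u':
		return 'a'
	case 'c':
		return 'g'
	case 'g':
		return 'c'
	case 'r':
		return 'y'
	case 'y':
		return 'r'
	case 'k':
		return 'm'
	case 'm':
		return 'k'
	case 'b':
		return 'v'
	case 'v':
		return 'b'
	case 'd':
		return 'h'
	case 'h':
		return 'd'
	}
	return b
}

// reverseComplement returns a new slice; the input is left untouched.
func reverseComplement(seq []byte) []byte {
	out := make([]byte, len(seq))
	for i, b := range seq {
		out[len(seq)-1-i] = complement(b)
	}
	return out
}

func main() {
	fmt.Println(compatible('r', 'a'), compatible('r', 'c'))
	fmt.Println(string(reverseComplement([]byte("aacgr"))))
}
```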

Across the API, type conversions are checked, shared state is guarded by locks, and failed lookups fall back gracefully, which keeps the package usable in concurrent, large-scale bioinformatics workflows.