mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
71 lines
3.9 KiB
Markdown
71 lines
3.9 KiB
Markdown
|
|
Here's a **structured, semantic description** (≤200 lines) of the public API provided by the `obiseq` Go package, written in English and Markdown format:
|
|||
|
|
|
|||
|
|
```markdown
|
|||
|
|
# BioSequence Attribute & Sequence Management (`obiseq`) — Public API Overview
|
|||
|
|
|
|||
|
|
The `obiseq` package provides a high-performance, thread-safe framework for representing and manipulating biological sequences (DNA/RNA/protein) in Go. It supports rich metadata, annotations, quality scores, taxonomic integration, and efficient batch processing—ideal for NGS pipelines like OBITools4.
|
|||
|
|
|
|||
|
|
## Core Sequence Representation
|
|||
|
|
|
|||
|
|
- `BioSequence`: Immutable-like container for sequence data (`[]byte`), ID, definition, qualities, features, and annotations.
|
|||
|
|
- `NewBioSequence(...)`, `NewEmptyBioSequence(cap)`: Constructors supporting initialization with ID, sequence, definition, and optional qualities.
|
|||
|
|
- `Id()`, `Definition()`: Accessors for core metadata fields (ID normalized to lowercase).
|
|||
|
|
- `Sequence()` / `String()`: Returns the sequence as a copy or human-readable string.
|
|||
|
|
- `Len()`, `HasSequence()` / `Composition()`: Length, presence check, and nucleotide composition (`a,c,g,t,o`).
|
|||
|
|
- `MD5()`, `MemorySize()` / `Recycle()`: Integrity checksum, memory footprint estimation, and safe object pooling reset.
|
|||
|
|
|
|||
|
|
## Attribute & Annotation System
|
|||
|
|
|
|||
|
|
- `Annotations()`, `HasAnnotation(key)`: Read-only access to generic metadata map.
|
|||
|
|
- Thread-safe via internal mutex (`AnnotationsLock()`).
|
|||
|
|
- `GetAttribute(key)`, `SetAttribute(key, value)` / typed getters (`GetIntAttribute(...)`) with automatic type coercion.
|
|||
|
|
- `Keys()` & `HasAttribute(key)`: Enumerate and check presence of attributes (including `"id"`, `"sequence"`).
|
|||
|
|
- `AttributeKeys(skip_map, skip_definition)`: Aggregates all attribute keys across a collection.
|
|||
|
|
|
|||
|
|
## Quality & Feature Support
|
|||
|
|
|
|||
|
|
- `Qualities()` / `SetQualities(...)`: Per-base quality scores (Phred+40 default).
|
|||
|
|
- `HasQualities()`, `Write(...)`, `Clear()` / quality ASCII conversion.
|
|||
|
|
- `Features()`: Optional raw feature table (e.g., GenBank/EMBL).
|
|||
|
|
|
|||
|
|
## Pairing & Taxonomy
|
|||
|
|
|
|||
|
|
- `PairTo(p)`, `IsPaired()`, `UnPair()` / batch pairing for read-pairs.
|
|||
|
|
- Taxonomic annotation:
|
|||
|
|
- `Taxid()`, `SetTaxid(...)`, `Taxon(taxonomy)`
|
|||
|
|
- Rank-specific: `SetSpecies()`, `SetGenus()` / generic via `SetTaxonAtRank(rank)`
|
|||
|
|
- Full path & LCA: `Path()`, `SetTaxonomicDistribution(...)`
|
|||
|
|
|
|||
|
|
## Classification, Filtering & Transformation
|
|||
|
|
|
|||
|
|
- Classifiers:
|
|||
|
|
- `AnnotationClassifier`, `DualAnnotationClassifier` / predicate-based (`PredicateClassifier`)
|
|||
|
|
- Hashing, rotation & composite strategies (e.g., `CompositeClassifier`)
|
|||
|
|
- Predicates:
|
|||
|
|
- Length, abundance (`IsMoreAbundantOrEqualTo`) / regex matching on ID/sequence
|
|||
|
|
- Expression-based (`ExpressionPredicat`), paired-end support
|
|||
|
|
- Workers:
|
|||
|
|
- `EditIdWorker`, `EditAttributeWorker` (via OBILang expressions)
|
|||
|
|
- Taxonomic annotators (`MakeSetSpeciesWorker`, `LCA`) / reverse-complement & subsequence workers
|
|||
|
|
|
|||
|
|
## Collection Management & Efficiency
|
|||
|
|
|
|||
|
|
- `BioSequenceSlice`: Optimized batch container with:
|
|||
|
|
- Pool-aware allocation (`NewBioSequenceSlice`, `EnsureCapacity`)
|
|||
|
|
- Efficient push/pop, sorting (on count/length), and merging
|
|||
|
|
- `Merge(...)`: Sequence & slice-level consensus with stat propagation.
|
|||
|
|
- Slice/annotation pooling:
|
|||
|
|
- `GetSlice`, `RecycleSlice` / annotation recycling via pools
|
|||
|
|
- Iterators:
|
|||
|
|
- `Kmers(k)`: Lazy k-mer generator using Go’s new iterator protocol.
|
|||
|
|
|
|||
|
|
## Utility & Extension
|
|||
|
|
|
|||
|
|
- IUPAC support: `SameIUPACNuc(a, b)` for ambiguity-aware base comparison.
|
|||
|
|
- Reverse complement: `ReverseComplement(inplace)`, mutation coordinate adjustment (`_revcmpMutation`).
|
|||
|
|
- Subsequence extraction: `Subsequence(from, to, circular)` with quality & annotation preservation.
|
|||
|
|
- Expression extensions (via OBILang):
|
|||
|
|
- `gc`, `gcskew` / `elementof`, `sprintf`, `ifelse`
|
|||
|
|
|
|||
|
|
All methods ensure correctness via safe type conversions, locking semantics, and graceful fallbacks—enabling scalable bioinformatics workflows.
|