autodoc/docmd/pkg_obiseq.md

Here's a **structured, semantic description** (≤200 lines) of the public API provided by the `obiseq` Go package, written in English and Markdown format:

```markdown
# BioSequence Attribute & Sequence Management (`obiseq`) — Public API Overview

The `obiseq` package provides a high-performance, thread-safe framework for representing and manipulating biological sequences (DNA/RNA/protein) in Go. It supports rich metadata, annotations, quality scores, taxonomic integration, and efficient batch processing—ideal for NGS pipelines like OBITools4.

## Core Sequence Representation

- `BioSequence`: Immutable-like container for sequence data (`[]byte`), ID, definition, qualities, features, and annotations.
- `NewBioSequence(...)`, `NewEmptyBioSequence(cap)`: Constructors supporting initialization with ID, sequence, definition, and optional qualities.
- `Id()`, `Definition()`: Accessors for core metadata fields (ID normalized to lowercase).
- `Sequence()` / `String()`: Returns the sequence as a copy or human-readable string.
- `Len()`, `HasSequence()` / `Composition()`: Length, presence check, and nucleotide composition (`a,c,g,t,o`).
- `MD5()`, `MemorySize()` / `Recycle()`: Integrity checksum, memory footprint estimation, and safe object pooling reset.

## Attribute & Annotation System

- `Annotations()`, `HasAnnotation(key)`: Read-only access to generic metadata map.
- Thread-safe via internal mutex (`AnnotationsLock()`).
- `GetAttribute(key)`, `SetAttribute(key, value)` / typed getters (`GetIntAttribute(...)`) with automatic type coercion.
- `Keys()` & `HasAttribute(key)`: Enumerate and check presence of attributes (including `"id"`, `"sequence"`).
- `AttributeKeys(skip_map, skip_definition)`: Aggregates all attribute keys across a collection.

## Quality & Feature Support

- `Qualities()` / `SetQualities(...)`: Per-base quality scores (Phred+40 default).
- `HasQualities()`, `Write(...)`, `Clear()` / quality ASCII conversion.
- `Features()`: Optional raw feature table (e.g., GenBank/EMBL).

## Pairing & Taxonomy

- `PairTo(p)`, `IsPaired()`, `UnPair()` / batch pairing for read-pairs.
- Taxonomic annotation:  
  - `Taxid()`, `SetTaxid(...)`, `Taxon(taxonomy)`  
  - Rank-specific: `SetSpecies()`, `SetGenus()` / generic via `SetTaxonAtRank(rank)`  
  - Full path & LCA: `Path()`, `SetTaxonomicDistribution(...)`

## Classification, Filtering & Transformation

- Classifiers:  
  - `AnnotationClassifier`, `DualAnnotationClassifier` / predicate-based (`PredicateClassifier`)  
  - Hashing, rotation & composite strategies (e.g., `CompositeClassifier`)
- Predicates:  
  - Length, abundance (`IsMoreAbundantOrEqualTo`) / regex matching on ID/sequence  
  - Expression-based (`ExpressionPredicat`), paired-end support
- Workers:  
  - `EditIdWorker`, `EditAttributeWorker` (via OBILang expressions)  
  - Taxonomic annotators (`MakeSetSpeciesWorker`, `LCA`) / reverse-complement & subsequence workers

## Collection Management & Efficiency

- `BioSequenceSlice`: Optimized batch container with:
  - Pool-aware allocation (`NewBioSequenceSlice`, `EnsureCapacity`)  
  - Efficient push/pop, sorting (on count/length), and merging
- `Merge(...)`: Sequence & slice-level consensus with stat propagation.
- Slice/annotation pooling:  
  - `GetSlice`, `RecycleSlice` / annotation recycling via pools
- Iterators:  
  - `Kmers(k)`: Lazy k-mer generator using Go’s new iterator protocol.

## Utility & Extension

- IUPAC support: `SameIUPACNuc(a, b)` for ambiguity-aware base comparison.
- Reverse complement: `ReverseComplement(inplace)`, mutation coordinate adjustment (`_revcmpMutation`).
- Subsequence extraction: `Subsequence(from, to, circular)` with quality & annotation preservation.
- Expression extensions (via OBILang):  
  - `gc`, `gcskew` / `elementof`, `sprintf`, `ifelse`

All methods ensure correctness via safe type conversions, locking semantics, and graceful fallbacks—enabling scalable bioinformatics workflows.
-											⬆️ version bump to v4.5
										
										
											2026-04-07 08:36:50 +02:00
+								Here's a **structured, semantic description** (≤200 lines) of the public API provided by the `obiseq` Go package, written in English and Markdown format:
 								```markdown
 								# BioSequence Attribute & Sequence Management (`obiseq`) — Public API Overview
 								The `obiseq` package provides a high-performance, thread-safe framework for representing and manipulating biological sequences (DNA/RNA/protein) in Go. It supports rich metadata, annotations, quality scores, taxonomic integration, and efficient batch processing—ideal for NGS pipelines like OBITools4.
 								## Core Sequence Representation
 								- `BioSequence`: Immutable-like container for sequence data (`[]byte`), ID, definition, qualities, features, and annotations.
 								- `NewBioSequence(...)`, `NewEmptyBioSequence(cap)`: Constructors supporting initialization with ID, sequence, definition, and optional qualities.
 								- `Id()`, `Definition()`: Accessors for core metadata fields (ID normalized to lowercase).
 								- `Sequence()` / `String()`: Returns the sequence as a copy or human-readable string.
 								- `Len()`, `HasSequence()` / `Composition()`: Length, presence check, and nucleotide composition (`a,c,g,t,o`).
 								- `MD5()`, `MemorySize()` / `Recycle()`: Integrity checksum, memory footprint estimation, and safe object pooling reset.
 								## Attribute & Annotation System
 								- `Annotations()`, `HasAnnotation(key)`: Read-only access to generic metadata map.
 								- Thread-safe via internal mutex (`AnnotationsLock()`).
 								- `GetAttribute(key)`, `SetAttribute(key, value)` / typed getters (`GetIntAttribute(...)`) with automatic type coercion.
 								- `Keys()` & `HasAttribute(key)`: Enumerate and check presence of attributes (including `"id"`, `"sequence"`).
 								- `AttributeKeys(skip_map, skip_definition)`: Aggregates all attribute keys across a collection.
 								## Quality & Feature Support
 								- `Qualities()` / `SetQualities(...)`: Per-base quality scores (Phred+40 default).
 								- `HasQualities()`, `Write(...)`, `Clear()` / quality ASCII conversion.
 								- `Features()`: Optional raw feature table (e.g., GenBank/EMBL).
 								## Pairing & Taxonomy
 								- `PairTo(p)`, `IsPaired()`, `UnPair()` / batch pairing for read-pairs.
 								- Taxonomic annotation:
 								  - `Taxid()`, `SetTaxid(...)`, `Taxon(taxonomy)`
 								  - Rank-specific: `SetSpecies()`, `SetGenus()` / generic via `SetTaxonAtRank(rank)`
 								  - Full path & LCA: `Path()`, `SetTaxonomicDistribution(...)`
 								## Classification, Filtering & Transformation
 								- Classifiers:
 								  - `AnnotationClassifier`, `DualAnnotationClassifier` / predicate-based (`PredicateClassifier`)
 								  - Hashing, rotation & composite strategies (e.g., `CompositeClassifier`)
 								- Predicates:
 								  - Length, abundance (`IsMoreAbundantOrEqualTo`) / regex matching on ID/sequence
 								  - Expression-based (`ExpressionPredicat`), paired-end support
 								- Workers:
 								  - `EditIdWorker`, `EditAttributeWorker` (via OBILang expressions)
 								  - Taxonomic annotators (`MakeSetSpeciesWorker`, `LCA`) / reverse-complement & subsequence workers
 								## Collection Management & Efficiency
 								- `BioSequenceSlice`: Optimized batch container with:
 								  - Pool-aware allocation (`NewBioSequenceSlice`, `EnsureCapacity`)
 								  - Efficient push/pop, sorting (on count/length), and merging
 								- `Merge(...)`: Sequence & slice-level consensus with stat propagation.
 								- Slice/annotation pooling:
 								  - `GetSlice`, `RecycleSlice` / annotation recycling via pools
 								- Iterators:
 								  - `Kmers(k)`: Lazy k-mer generator using Go’s new iterator protocol.
 								## Utility & Extension
 								- IUPAC support: `SameIUPACNuc(a, b)` for ambiguity-aware base comparison.
 								- Reverse complement: `ReverseComplement(inplace)`, mutation coordinate adjustment (`_revcmpMutation`).
 								- Subsequence extraction: `Subsequence(from, to, circular)` with quality & annotation preservation.
 								- Expression extensions (via OBILang):
 								  - `gc`, `gcskew` / `elementof`, `sprintf`, `ifelse`
 								All methods ensure correctness via safe type conversions, locking semantics, and graceful fallbacks—enabling scalable bioinformatics workflows.