# `IDistribute`: Semantic Description of Biosequence Distribution Functionality
The `IDistribute` type implements a thread-safe mechanism for distributing biosequences into classified, batched outputs.
- **Core Purpose**: Enables concurrent processing of sequences by routing them to dedicated output channels based on classification keys.
- **Key Fields**:
  - `outputs`: A map from integer class codes to output streams (`IBioSequence`).
  - `news`: An unbuffered channel emitting class codes when new output streams are created.
  - `classifier`: A pointer to a sequence classifier used to assign sequences to keys during distribution.
- **Thread Safety**: All access to shared state (`outputs`, `slices`) is synchronized via a mutex.
- **Batching Strategy**:
  - Sequences are accumulated per class key until either `BatchSizeMax()` sequences or `BatchMem()` bytes (per key) are reached.
  - Batches are flushed automatically and on finalization.
- **Asynchronous Processing**:
  - The `Distribute()` method launches a goroutine that consumes the input iterator, classifies each sequence, and feeds batches to per-key outputs.
  - Outputs are closed only after all sequences have been processed.
- **Notifications**:
  - The `News()` channel allows consumers to be notified of newly created output streams (i.e., when a new class key appears).
- **Error Handling**:
  - `Outputs(key)` returns an error if the requested key has no associated output.
- **Integration**:
  - Leverages `obidefault.BatchSizeMax()` and `BatchMem()` for configurable batch limits.
  - Uses `SortBatches()` on the input iterator to ensure ordered processing.
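
The batching rule described above (flush a class's batch once it reaches a sequence-count limit or a byte limit, then flush whatever remains on finalization) can be sketched in a few lines of Go. This is only an illustration under stated assumptions: `seq`, `batcher`, and `batchSizes` are hypothetical names invented for this example, the limits are passed in as plain integers instead of being read from `obidefault`, and none of it is the actual obitools4 implementation.

```go
package main

import "fmt"

// seq is a hypothetical stand-in for a biosequence: only its class key and
// an approximate memory footprint matter for this sketch.
type seq struct {
	id    string
	key   int
	bytes int
}

// batcher accumulates sequences per class key and records every flushed batch.
// It is an illustrative type, not part of obitools4.
type batcher struct {
	maxCount int
	maxBytes int
	pending  map[int][]seq   // sequences waiting in the current batch, per key
	sizes    map[int]int     // bytes accumulated in the current batch, per key
	flushed  map[int][][]seq // batches already emitted, per key
}

func newBatcher(maxCount, maxBytes int) *batcher {
	return &batcher{
		maxCount: maxCount,
		maxBytes: maxBytes,
		pending:  map[int][]seq{},
		sizes:    map[int]int{},
		flushed:  map[int][][]seq{},
	}
}

// add appends s to its class's pending batch and flushes that batch as soon
// as either the count limit or the byte limit is reached.
func (b *batcher) add(s seq) {
	b.pending[s.key] = append(b.pending[s.key], s)
	b.sizes[s.key] += s.bytes
	if len(b.pending[s.key]) >= b.maxCount || b.sizes[s.key] >= b.maxBytes {
		b.flush(s.key)
	}
}

// flush emits the pending batch for key, if it is not empty.
func (b *batcher) flush(key int) {
	if len(b.pending[key]) == 0 {
		return
	}
	b.flushed[key] = append(b.flushed[key], b.pending[key])
	b.pending[key] = nil
	b.sizes[key] = 0
}

// finalize flushes every partially filled batch.
func (b *batcher) finalize() {
	for key := range b.pending {
		b.flush(key)
	}
}

// batchSizes runs the whole input through a batcher and reports, per key,
// the number of sequences in each emitted batch.
func batchSizes(input []seq, maxCount, maxBytes int) map[int][]int {
	b := newBatcher(maxCount, maxBytes)
	for _, s := range input {
		b.add(s)
	}
	b.finalize()
	sizes := map[int][]int{}
	for key, batches := range b.flushed {
		for _, batch := range batches {
			sizes[key] = append(sizes[key], len(batch))
		}
	}
	return sizes
}

func main() {
	input := []seq{
		{"a", 1, 10}, {"b", 1, 10}, {"c", 1, 10},
		{"d", 2, 100}, {"e", 2, 5},
	}
	fmt.Println(batchSizes(input, 2, 64))
}
```

With a count limit of 2 and a byte limit of 64, class 1 produces a batch of two sequences (count limit) followed by a leftover batch of one, while class 2 produces a single-sequence batch as soon as the 100-byte sequence exceeds the byte limit, then a final batch holding the remaining sequence.
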

In summary, `IDistribute` provides a scalable, concurrent pipeline for classifying and batching biosequences based on user-defined classification logic.
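
To make the overall flow more concrete, here is a minimal, self-contained Go sketch of the same pattern: a producer goroutine classifies items, creates a per-key output under a mutex, announces each new key on an unbuffered channel, and closes all outputs once the input is exhausted. Everything in it is an assumption made for illustration: `distributor`, `distribute`, `output`, and `collect` are hypothetical names, plain strings stand in for biosequences, and the per-key outputs carry single items through buffered channels where the real type works with batched `IBioSequence` streams.

```go
package main

import (
	"fmt"
	"sync"
)

// distributor is a hypothetical, simplified analogue of the type described above.
type distributor struct {
	mu      sync.Mutex
	outputs map[int]chan string // per-key outputs, guarded by mu
	news    chan int            // unbuffered: announces each newly created key
}

// output returns the channel registered for key, or an error if none exists.
func (d *distributor) output(key int) (<-chan string, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	out, ok := d.outputs[key]
	if !ok {
		return nil, fmt.Errorf("no output for class code %d", key)
	}
	return out, nil
}

// distribute starts a goroutine that routes each input item to the output of
// its class and closes every output once the input is exhausted.
func distribute(input []string, classify func(string) int) *distributor {
	d := &distributor{outputs: map[int]chan string{}, news: make(chan int)}
	go func() {
		for _, s := range input {
			key := classify(s)
			d.mu.Lock()
			out, known := d.outputs[key]
			if !known {
				// Buffered to the input size so the producer never blocks
				// on a slow consumer (a simplification for this sketch).
				out = make(chan string, len(input))
				d.outputs[key] = out
			}
			d.mu.Unlock()
			if !known {
				d.news <- key // blocks until a consumer picks up the new key
			}
			out <- s
		}
		d.mu.Lock()
		for _, out := range d.outputs {
			close(out)
		}
		d.mu.Unlock()
		close(d.news)
	}()
	return d
}

// collect listens for new keys and drains each output in its own goroutine.
func collect(d *distributor) map[int][]string {
	var wg sync.WaitGroup
	var mu sync.Mutex
	result := map[int][]string{}
	for key := range d.news {
		out, err := d.output(key)
		if err != nil {
			continue // not expected: a key is announced only after registration
		}
		wg.Add(1)
		go func(key int, out <-chan string) {
			defer wg.Done()
			for s := range out {
				mu.Lock()
				result[key] = append(result[key], s)
				mu.Unlock()
			}
		}(key, out)
	}
	wg.Wait()
	return result
}

func main() {
	byLength := func(s string) int { return len(s) }
	d := distribute([]string{"ACGT", "AC", "GGTT", "TT", "A"}, byLength)
	fmt.Println(collect(d))
	_, err := d.output(99)
	fmt.Println(err)
}
```

Because the notification channel is unbuffered, the producer cannot move past a new key until some consumer has received it, so the consumer in this sketch must keep reading the notification channel for as long as distribution is running.
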