mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
# `IDistribute`: Semantic Description of Biosequence Distribution Functionality

The `IDistribute` type implements a thread-safe mechanism for distributing biosequences into classified, batched outputs.

- **Core Purpose**: Enables concurrent processing of sequences by routing them to dedicated output channels based on classification keys.

- **Key Fields**:
  - `outputs`: A map from integer class codes to output streams (`IBioSequence`).
  - `news`: An unbuffered channel emitting class codes when new output streams are created.
  - `classifier`: A pointer to a sequence classifier used to assign sequences to keys during distribution.

- **Thread Safety**: All access to shared state (`outputs`, `slices`) is synchronized via a mutex.

- **Batching Strategy**:
  - Sequences accumulate per class key until that key's pending batch reaches either `BatchSizeMax()` sequences or `BatchMem()` bytes.
  - A batch is flushed automatically when it reaches either limit, and any remaining partial batches are flushed on finalization.

- **Asynchronous Processing**:
  - The `Distribute()` method launches a goroutine that consumes the input iterator, classifies each sequence, and feeds batches to per-key outputs.
  - Outputs are closed only after all sequences have been processed.

- **Notifications**:
  - The `News()` channel notifies consumers whenever a new output stream is created, that is, when a class key appears for the first time.

- **Error Handling**:
  - `Outputs(key)` returns an error if the requested key has no associated output.

- **Integration**:
  - Takes its configurable batch limits from `obidefault.BatchSizeMax()` and `BatchMem()`.
  - Uses `SortBatches()` on the input iterator to ensure ordered processing.

In summary, `IDistribute` provides a scalable, concurrent pipeline for classifying and batching biosequences based on user-defined classification logic.
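The behaviour described above (per-key batching, a mutex-guarded output map, new-key notifications, and the missing-key error) can be illustrated with a small self-contained Go sketch. This is not the obitools4 implementation: `Distributor`, `NewDistributor`, and `Collect` are hypothetical names, plain strings stand in for biosequences, the input is a slice rather than an iterator, and only a sequence-count limit is modelled. The byte limit and `SortBatches()` are left out.

```go
package main

import (
	"fmt"
	"sync"
)

// Distributor is a hypothetical, simplified stand-in for IDistribute.
type Distributor struct {
	mu       sync.Mutex
	outputs  map[int]chan []string // class code -> stream of batches
	news     chan int              // unbuffered; announces new class codes
	classify func(string) int      // assumed classifier signature
}

func NewDistributor(classify func(string) int) *Distributor {
	return &Distributor{outputs: map[int]chan []string{}, news: make(chan int), classify: classify}
}

// News exposes the channel on which newly created class codes are sent.
func (d *Distributor) News() <-chan int { return d.news }

// Outputs returns the stream for key, or an error if none exists.
func (d *Distributor) Outputs(key int) (<-chan []string, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	out, ok := d.outputs[key]
	if !ok {
		return nil, fmt.Errorf("no output for class code %d", key)
	}
	return out, nil
}

// Distribute starts a goroutine that batches items per key, up to maxSize each.
func (d *Distributor) Distribute(input []string, maxSize int) {
	go func() {
		streams := map[int]chan []string{} // goroutine-local view of outputs
		pending := map[int][]string{}      // per-key partial batches
		for _, seq := range input {
			key := d.classify(seq)
			out, known := streams[key]
			if !known {
				out = make(chan []string)
				streams[key] = out
				d.mu.Lock()
				d.outputs[key] = out // register before announcing
				d.mu.Unlock()
				d.news <- key // blocks until a consumer receives it
			}
			pending[key] = append(pending[key], seq)
			if len(pending[key]) >= maxSize {
				out <- pending[key]
				pending[key] = nil
			}
		}
		// Finalization: flush partial batches, then close every stream.
		for key, out := range streams {
			if len(pending[key]) > 0 {
				out <- pending[key]
			}
			close(out)
		}
		close(d.news)
	}()
}

// Collect runs the sketch end to end and gathers the batches per key.
func Collect(seqs []string, maxSize int) map[int][][]string {
	d := NewDistributor(func(s string) int { return len(s) }) // toy classifier
	d.Distribute(seqs, maxSize)
	var wg sync.WaitGroup
	var mu sync.Mutex
	result := map[int][][]string{}
	for key := range d.News() {
		out, err := d.Outputs(key)
		if err != nil {
			panic(err) // cannot happen: streams are registered before being announced
		}
		wg.Add(1)
		go func(key int, out <-chan []string) {
			defer wg.Done()
			for batch := range out {
				mu.Lock()
				result[key] = append(result[key], batch)
				mu.Unlock()
			}
		}(key, out)
	}
	wg.Wait()
	return result
}

func main() {
	fmt.Println(Collect([]string{"AC", "GT", "ACG", "TT"}, 2))
}
```

In this sketch the stream is stored in the map before its key is sent on the news channel, so a consumer that reacts to a notification always finds the stream through `Outputs(key)`. Because the news channel is unbuffered, the distributing goroutine stalls until some consumer receives each new key, so a caller has to keep draining `News()` for the pipeline to make progress.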