obitools4/autodoc/docmd/pkg/obiiter/distribute.md
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

IDistribute: Semantic Description of Biosequence Distribution Functionality

The IDistribute type implements a thread-safe mechanism for distributing biosequences into classified, batched outputs.

  • Core Purpose: Enables concurrent processing of sequences by routing them to dedicated output channels based on classification keys.

  • Key Fields:

    • outputs: A map from integer class codes to output streams (IBioSequence).
    • news: An unbuffered channel emitting class codes when new output streams are created.
    • classifier: A pointer to a sequence classifier used to assign sequences to keys during distribution.
  • Thread Safety: All access to shared state (the outputs map and the per-key sequence slices) is synchronized via a mutex.

  • Batching Strategy:

    • Sequences are accumulated per class key until the pending batch for that key reaches either BatchSizeMax() sequences or BatchMem() bytes.
    • A batch is flushed as soon as it reaches one of these limits; any remaining partial batches are flushed on finalization.
  • Asynchronous Processing:

    • The Distribute() method launches a goroutine that consumes the input iterator, classifies each sequence, and feeds batches to per-key outputs.
    • Outputs are closed only after all sequences have been processed.
  • Notifications:

    • The News() channel allows consumers to be notified of newly created output streams (i.e., when a new class key appears).
  • Error Handling:

    • Outputs(key) returns an error if the requested key has no associated output.
  • Integration:

    • Leverages obidefault.BatchSizeMax() and obidefault.BatchMem() for configurable batch limits.
    • Uses SortBatches() on the input iterator to ensure ordered processing.
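
The batching rule described above (flush when either a sequence-count limit or a byte limit is reached, then flush whatever remains at the end) can be illustrated with a minimal sketch. This is not the obitools4 code: the `batcher` type, `batchAll`, and the `maxCount` / `maxBytes` parameters are hypothetical stand-ins for the values returned by BatchSizeMax() and BatchMem(), plain strings replace biosequences, and memory use is approximated by string length.

```go
package main

import "fmt"

// batcher accumulates the pending items of ONE class key.
// maxCount and maxBytes are hypothetical stand-ins for the limits
// returned by BatchSizeMax() and BatchMem().
type batcher struct {
	maxCount int
	maxBytes int
	pending  []string
	bytes    int
}

// add appends one item and returns a full batch when either limit is
// reached, or nil while the batch is still filling.
func (b *batcher) add(seq string) []string {
	b.pending = append(b.pending, seq)
	b.bytes += len(seq) // assumption: size is approximated by string length
	if len(b.pending) >= b.maxCount || b.bytes >= b.maxBytes {
		return b.flush()
	}
	return nil
}

// flush returns whatever is pending and resets the accumulator.
func (b *batcher) flush() []string {
	out := b.pending
	b.pending = nil
	b.bytes = 0
	return out
}

// batchAll feeds seqs through a batcher and collects every emitted batch,
// including the final partial one flushed on finalization.
func batchAll(seqs []string, maxCount, maxBytes int) [][]string {
	b := &batcher{maxCount: maxCount, maxBytes: maxBytes}
	var batches [][]string
	for _, s := range seqs {
		if full := b.add(s); full != nil {
			batches = append(batches, full)
		}
	}
	if rest := b.flush(); len(rest) > 0 {
		batches = append(batches, rest)
	}
	return batches
}

func main() {
	seqs := []string{"ACGTACGTACGT", "AC", "GT", "A", "C"}
	for i, batch := range batchAll(seqs, 3, 10) {
		fmt.Println(i, len(batch), batch)
	}
}
```

With limits of 3 items and 10 bytes, the first item alone exceeds the byte limit and is emitted immediately, the next three items fill a batch by count, and the last item is emitted only by the final flush.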

In summary, IDistribute provides a scalable, concurrent pipeline for classifying and batching biosequences based on user-defined classification logic.
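
As a rough illustration of the overall pattern, the sketch below combines a mutex-protected map of per-key outputs, an unbuffered news channel, a background goroutine that classifies and batches, and a lookup that returns an error for unknown keys. It is a simplified model under stated assumptions, not the actual IDistribute implementation: the names `distributor`, `distribute`, `output`, and `collect` are hypothetical, strings replace biosequences, a plain function replaces the classifier, and only a count limit is applied.

```go
package main

import (
	"fmt"
	"sync"
)

// distributor is a hypothetical, simplified stand-in for IDistribute:
// strings replace biosequences and a plain function replaces the classifier.
type distributor struct {
	mu      sync.Mutex
	outputs map[int]chan []string // one stream of batches per class key
	news    chan int              // unbuffered: announces newly created keys
}

// output returns the stream for key, or an error if the key is unknown.
func (d *distributor) output(key int) (<-chan []string, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	out, ok := d.outputs[key]
	if !ok {
		return nil, fmt.Errorf("no output for class key %d", key)
	}
	return out, nil
}

// distribute consumes input in a goroutine, batches items per class key,
// and closes every output only after the input is exhausted.
func distribute(input <-chan string, classify func(string) int, batchSize int) *distributor {
	d := &distributor{outputs: make(map[int]chan []string), news: make(chan int)}
	go func() {
		outs := make(map[int]chan []string) // goroutine-local view of d.outputs
		pending := make(map[int][]string)
		for seq := range input {
			key := classify(seq)
			out, known := outs[key]
			if !known {
				out = make(chan []string)
				outs[key] = out
				d.mu.Lock()
				d.outputs[key] = out
				d.mu.Unlock()
				d.news <- key // blocks until a consumer receives the key
			}
			pending[key] = append(pending[key], seq)
			if len(pending[key]) >= batchSize {
				out <- pending[key]
				pending[key] = nil
			}
		}
		for key, out := range outs {
			if len(pending[key]) > 0 {
				out <- pending[key] // flush the final partial batch
			}
			close(out)
		}
		close(d.news)
	}()
	return d
}

// collect runs the pipeline over seqs and gathers the items of each class.
func collect(seqs []string, classify func(string) int, batchSize int) map[int][]string {
	input := make(chan string)
	go func() {
		defer close(input)
		for _, s := range seqs {
			input <- s
		}
	}()
	d := distribute(input, classify, batchSize)
	result := make(map[int][]string)
	var resMu sync.Mutex
	var wg sync.WaitGroup
	for key := range d.news {
		out, err := d.output(key)
		if err != nil {
			continue // not expected: the key was just announced
		}
		wg.Add(1)
		go func(key int, out <-chan []string) {
			defer wg.Done()
			for batch := range out {
				resMu.Lock()
				result[key] = append(result[key], batch...)
				resMu.Unlock()
			}
		}(key, out)
	}
	wg.Wait()
	return result
}

func main() {
	byLength := func(s string) int { return len(s) }
	res := collect([]string{"A", "CC", "G", "TTT", "AA", "C"}, byLength, 2)
	fmt.Println(res[1], res[2], res[3])
}
```

Because the news channel and the output channels are unbuffered in this sketch, the consumer has to keep reading the news channel and drain every announced output; otherwise the distributing goroutine blocks. The sketch also never holds the mutex while sending on a channel, so a consumer calling `output` cannot deadlock against a pending send.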