Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00


ISequenceChunkOnMemory Function — Semantic Description

The function ISequenceChunkOnMemory, from the Go package obichunk, implements asynchronous in-memory chunking of biological sequence data.

It consumes an iterator over BioSequence objects and uses a provided classifier to distribute them into batches, one per class. Its core purpose is to group sequences by classification (e.g., sample, taxon, or feature), hold each group in memory as a slice (BioSequenceSlice), and emit the groups one after another through an output iterator.
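Setting concurrency aside, the grouping described above amounts to bucketing sequences by class code and emitting one batch per non-empty bucket. The following is a minimal sequential sketch of those semantics only. The Sequence and Batch types and the classify and name functions are hypothetical stand-ins for BioSequence, BioSequenceBatch and the classifier, not the obichunk API. The sketch also sorts class codes so that its output is reproducible; that is a choice of the sketch, not a claim about the library.

```go
package main

import (
	"fmt"
	"sort"
)

// Sequence is a hypothetical stand-in for obiseq.BioSequence.
type Sequence struct {
	ID     string
	Sample string
}

// Batch is a hypothetical stand-in for a BioSequenceBatch:
// one classification group, its source identifier and an order index.
type Batch struct {
	Source    string
	Order     int
	Sequences []Sequence
}

// chunkByClass buckets sequences by class code, then emits one batch per
// group with consecutive order indices. A group only exists once a sequence
// has been appended to it, so every emitted batch is non-empty.
// Codes are sorted only to make this sketch reproducible.
func chunkByClass(seqs []Sequence, classify func(Sequence) int, name func(int) string) []Batch {
	groups := make(map[int][]Sequence)
	for _, s := range seqs {
		code := classify(s)
		groups[code] = append(groups[code], s)
	}

	codes := make([]int, 0, len(groups))
	for c := range groups {
		codes = append(codes, c)
	}
	sort.Ints(codes)

	batches := make([]Batch, 0, len(codes))
	for i, c := range codes {
		batches = append(batches, Batch{Source: name(c), Order: i, Sequences: groups[c]})
	}
	return batches
}

func main() {
	samples := []string{"A", "B"}
	classify := func(s Sequence) int {
		if s.Sample == "A" {
			return 0
		}
		return 1
	}
	name := func(code int) string { return samples[code] }

	seqs := []Sequence{{"s1", "A"}, {"s2", "B"}, {"s3", "A"}}
	for _, b := range chunkByClass(seqs, classify, name) {
		fmt.Println(b.Order, b.Source, len(b.Sequences)) // prints "0 A 2" then "1 B 1"
	}
}
```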

Key features:

  • Parallel processing: Each classification group (referred to as a flux) is processed in its own goroutine.
  • Thread-safe aggregation: A mutex ensures safe concurrent updates to shared chunks and sources maps.
  • Deferred emission: No batch is emitted until all classification groups have been fully processed (jobDone.Wait()).
  • Ordered output: Emitted batches carry consecutive order indices (0, 1, …), so downstream consumers receive a well-ordered stream even though the groups are filled in parallel.
  • Error handling: Critical failures (e.g., channel retrieval errors) terminate the program with log.Fatalf.
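The concurrency pattern listed above (one goroutine per flux, a mutex around the shared chunks and sources maps, a sync.WaitGroup barrier before emission, then consecutive order indices) can be sketched with plain channels. This is a minimal sketch under stated assumptions: the per-class receive channels stand in for the classifier's per-flux outputs, and Batch, chunkOnMemory and the use of strings as sequences are hypothetical, not the obichunk API. Because the sketch ranges over a Go map when emitting, which group receives index 0 is not fixed; the indices themselves are always consecutive.

```go
package main

import (
	"fmt"
	"sync"
)

// Batch is a hypothetical stand-in for a BioSequenceBatch.
type Batch struct {
	Source    string
	Order     int
	Sequences []string
}

// chunkOnMemory drains every flux in its own goroutine, waits for all of
// them, then emits one batch per non-empty group on the returned channel.
// fluxes maps a class code to the channel carrying that class's sequences;
// names maps a class code to its source identifier (both hypothetical).
func chunkOnMemory(fluxes map[int]<-chan string, names map[int]string) <-chan Batch {
	out := make(chan Batch)

	go func() {
		defer close(out)

		var lock sync.Mutex
		var jobDone sync.WaitGroup
		chunks := make(map[int][]string)
		sources := make(map[int]string)

		for code, flux := range fluxes {
			jobDone.Add(1)
			go func(code int, flux <-chan string) {
				defer jobDone.Done()
				var chunk []string
				for seq := range flux { // accumulate this whole group in memory
					chunk = append(chunk, seq)
				}
				lock.Lock() // chunks and sources are shared by all goroutines
				chunks[code] = chunk
				sources[code] = names[code]
				lock.Unlock()
			}(code, flux)
		}

		jobDone.Wait() // barrier: nothing is emitted before every flux is drained

		order := 0
		for code, chunk := range chunks { // map order: group-to-index mapping is not fixed
			if len(chunk) > 0 {
				out <- Batch{Source: sources[code], Order: order, Sequences: chunk}
				order++
			}
		}
	}()

	return out
}

func main() {
	a, b := make(chan string, 2), make(chan string, 1)
	a <- "s1"
	a <- "s3"
	b <- "s2"
	close(a)
	close(b)

	fluxes := map[int]<-chan string{0: a, 1: b}
	names := map[int]string{0: "sampleA", 1: "sampleB"}
	for batch := range chunkOnMemory(fluxes, names) {
		fmt.Println(batch.Order, batch.Source, batch.Sequences)
	}
}
```

Holding the lock only for the map writes keeps contention low, since each goroutine fills its own local slice independently.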

Input:

  • An iterator (obiiter.IBioSequence) of raw sequences.
  • A *obiseq.BioSequenceClassifier, used to route each sequence into a classification bucket.
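The classifier's role can be pictured as a pair of lookups: one mapping a sequence to a small integer class code, the other mapping a code back to a name that can serve as a batch's source identifier. The sampleClassifier type below is a hypothetical illustration of that contract, assuming classification by sample name and keyed directly on the name string for brevity; it is not the actual obiseq.BioSequenceClassifier definition. A mutex guards the code table in case the classifier is called from several goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// sampleClassifier is a hypothetical classifier: it assigns a stable integer
// code to each distinct sample name, in order of first appearance.
type sampleClassifier struct {
	mu    sync.Mutex
	codes map[string]int
	names []string
}

func newSampleClassifier() *sampleClassifier {
	return &sampleClassifier{codes: make(map[string]int)}
}

// Code returns the class code for a sample name, allocating a new one if needed.
func (c *sampleClassifier) Code(sample string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	if code, ok := c.codes[sample]; ok {
		return code
	}
	code := len(c.names)
	c.codes[sample] = code
	c.names = append(c.names, sample)
	return code
}

// Value maps a class code back to its sample name (usable as a batch source).
func (c *sampleClassifier) Value(code int) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.names[code]
}

func main() {
	c := newSampleClassifier()
	for _, s := range []string{"gut", "soil", "gut"} {
		fmt.Println(s, c.Code(s)) // gut 0, soil 1, gut 0
	}
	fmt.Println(c.Value(1)) // soil
}
```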

Output:

  • A new iterator yielding BioSequenceBatch objects; each batch contains all sequences belonging to one classification group, together with that group's source identifier.

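Use case: Parallel preprocessing of high-throughput sequencing data into sample- or taxon-specific batches for downstream analysis. Because every group is accumulated in memory before the first batch is emitted, peak memory use grows with the size of the whole input, so the approach suits datasets that fit in RAM.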