Files
obitools4/autodoc/docmd/pkg/obichunk/subchunks.md
T
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

30 lines
2.0 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Semantic Description of `obichunk.ISequenceSubChunk`
The function `ISequenceSubChunk` in the `obichunk` package implements **parallel, class-based sorting and batching of biological sequences**, preserving input order within each batch while reordering across batches by classification code.
## Core Functionality
- **Input**:
- An iterator over `BioSequence` batches (`obiiter.IBioSequence`)
- A sequence classifier (`obiseq.BioSequenceClassifier`) assigning each sequence a numeric class code
- A number of worker goroutines (`nworkers`), defaulting to system-configured parallelism
- **Processing**:
- Each worker consumes its own iterator split and classifier clone, enabling concurrent batch processing.
- For each incoming `BioSequenceBatch`:
- If the batch has >1 sequence: sequences are extracted, classified into `code`, and sorted *in-place* by class code.
- Consecutive sequences with the same `code` are grouped into new batches; a new batch is emitted upon code change.
- If the batch has ≤1 sequence, its passed through unchanged (but reordered with a new order ID).
- **Ordering Mechanism**:
- Uses `atomic.AddInt32` to assign strictly increasing order IDs (`nextOrder`) across workers, preserving deterministic inter-batch ordering.
- Sorting within batches is performed via a custom `sort.Interface` implementation using closures for flexible comparison logic (here, by ascending class code).
- **Output**:
- Returns a new iterator (`obiiter.IBioSequence`) emitting batches grouped by classification code, with globally ordered batch IDs.
- Workers are coordinated via `newIter.Done()`/`Wait()/Close()`, ensuring clean termination.
## Semantic Purpose
Enables efficient, parallel **grouping of sequences by taxonomic or functional class** (e.g., OTU assignment), optimizing downstream processing that requires sorted/class-ordered input — e.g., consensus building, alignment, or read merging per group.