mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
NumberSequences Function — Semantic Description
The NumberSequences method assigns a unique sequential identifier (seq_number) to each biological sequence in an IBioSequence iterator, preserving consistency for paired-end reads.
Core Functionality
- Sequential numbering: Assigns integers (starting from start, defaulting to 0 or user-defined) incrementally across sequences.
- Thread-safe: Uses sync.Mutex and atomic.Int64 to safely manage the global counter during concurrent processing.
- Paired-read support: When input is paired (IsPaired()), both reads in a pair receive the same seq_number, ensuring alignment between mates.
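The co-numbering of mates can be illustrated with a minimal sketch. The Seq type, its Mate field, and the NumberFrom and newSeq helpers are hypothetical stand-ins invented for this example; they are not the obitools4 types or API, and the sketch ignores concurrency entirely.

```go
package main

import "fmt"

// Seq is a hypothetical stand-in for a biological sequence.
// It is NOT the obitools4 BioSequence type.
type Seq struct {
	ID    string
	Attrs map[string]int
	Mate  *Seq // nil for single-end data
}

// newSeq builds a sequence with an empty attribute map.
func newSeq(id string) *Seq {
	return &Seq{ID: id, Attrs: map[string]int{}}
}

// NumberFrom assigns consecutive seq_number values beginning at start.
// When a sequence has a mate, the mate receives the same number.
// It returns the next unused number.
func NumberFrom(seqs []*Seq, start int) int {
	next := start
	for _, s := range seqs {
		s.Attrs["seq_number"] = next
		if s.Mate != nil {
			s.Mate.Attrs["seq_number"] = next
		}
		next++
	}
	return next
}

func main() {
	forward := []*Seq{newSeq("r1/1"), newSeq("r2/1")}
	reverse := []*Seq{newSeq("r1/2"), newSeq("r2/2")}
	for i := range forward {
		forward[i].Mate = reverse[i]
	}

	next := NumberFrom(forward, 10)
	for i := range forward {
		fmt.Println(forward[i].ID, forward[i].Attrs["seq_number"],
			reverse[i].ID, reverse[i].Attrs["seq_number"])
	}
	fmt.Println("next free number:", next)
}
```

Numbering the pair in a single step, rather than numbering forward and reverse reads in separate passes, is what keeps mates aligned in this sketch.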
Parallelization Strategy
- Default mode: Uses multiple workers (ParallelWorkers()) for performance; batches are processed concurrently.
- Reordering mode: If forceReordering is true:
  - Input iterator is batch-sorted (SortBatches()).
  - Parallelism disabled (1 worker) to ensure deterministic numbering order.
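The difference between the two modes can be sketched as follows. This is an illustration under stated assumptions, not the obitools4 implementation: the Batch type, its Order field, and the PlanNumbering and FirstNumbers functions are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// Batch is a hypothetical stand-in for a batch of sequences.
type Batch struct {
	Order int // position of the batch in the original input stream
	Size  int // number of sequences in the batch
}

// PlanNumbering returns the batches in the order they would be numbered,
// together with the number of workers to use. In reordering mode the
// batches are sorted by their original position and a single worker is
// used; otherwise the arrival order and the requested worker count are kept.
func PlanNumbering(batches []Batch, forceReordering bool, parallelWorkers int) ([]Batch, int) {
	planned := append([]Batch(nil), batches...) // copy, leave the input untouched
	if !forceReordering {
		return planned, parallelWorkers
	}
	sort.Slice(planned, func(i, j int) bool {
		return planned[i].Order < planned[j].Order
	})
	return planned, 1
}

// FirstNumbers walks the batches with one worker and records, for each
// batch (keyed by Order), the first sequence number it receives.
func FirstNumbers(batches []Batch, start int) map[int]int {
	first := make(map[int]int, len(batches))
	next := start
	for _, b := range batches {
		first[b.Order] = next
		next += b.Size
	}
	return first
}

func main() {
	// Batches as they might arrive from concurrent upstream stages.
	arrived := []Batch{{Order: 2, Size: 4}, {Order: 0, Size: 3}, {Order: 1, Size: 2}}

	planned, workers := PlanNumbering(arrived, true, 8)
	fmt.Println("workers:", workers)
	fmt.Println("first numbers:", FirstNumbers(planned, 0))
}
```

With sorting plus a single worker, the first number of each batch depends only on the sizes of the batches that precede it in the original stream, which is why the result does not change from run to run.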
Implementation Details
- Each goroutine processes its own split of the input iterator.
- A shared next_first counter tracks the next available sequence number globally.
- Locking ensures atomic increment and assignment, preventing race conditions.
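One way to picture the shared counter is a small type that hands out contiguous blocks of numbers under a mutex. The sketch below is an assumption about the general pattern, not a copy of the obitools4 code: Counter, Reserve, and NumberBatches are hypothetical names, and batches are reduced to their sizes.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter hands out contiguous blocks of sequence numbers.
// nextFirst mirrors the role of the shared next_first counter described above.
type Counter struct {
	mu        sync.Mutex
	nextFirst int
}

// Reserve claims n consecutive numbers and returns the first one.
// The lock makes the read-and-advance step indivisible.
func (c *Counter) Reserve(n int) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	first := c.nextFirst
	c.nextFirst += n
	return first
}

// NumberBatches numbers every batch using the given number of goroutines.
// numbers[i][j] is the number assigned to item j of batch i.
func NumberBatches(batchSizes []int, start, workers int) [][]int {
	counter := &Counter{nextFirst: start}
	numbers := make([][]int, len(batchSizes))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				first := counter.Reserve(batchSizes[i])
				ids := make([]int, batchSizes[i])
				for j := range ids {
					ids[j] = first + j
				}
				numbers[i] = ids // each index is written by exactly one goroutine
			}
		}()
	}
	for i := range batchSizes {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return numbers
}

func main() {
	// Which batch receives which block may differ between runs,
	// but no number is ever handed out twice.
	fmt.Println(NumberBatches([]int{3, 2, 4}, 0, 2))
}
```

Reserving a whole block per batch keeps the critical section short: the lock is held only while the counter advances, and the per-sequence assignment happens outside it.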
Output
Returns a new IBioSequence iterator:
- Contains the same sequence batches (possibly reordered if sorted).
- Each BioSequence object now carries a "seq_number" attribute.
- Paired sequences are co-numbered and marked accordingly.
Use Cases
- Preparing data for downstream tools requiring unique sequence IDs.
- Maintaining cross-read identity in paired-end workflows (e.g., assembly, mapping).
- Reproducible numbering across pipeline stages or restarts; this relies on reordering mode, which makes the numbering order deterministic.