obitools4/autodoc/docmd/pkg/obiiter/numbering.md


# `NumberSequences` Function — Semantic Description
The `NumberSequences` method assigns a unique sequential identifier (`seq_number`) to each biological sequence in an `IBioSequence` iterator, preserving consistency for paired-end reads.
## Core Functionality
- **Sequential numbering**: Assigns consecutive integers to sequences, starting from `start` (0 unless the caller supplies another value).
- **Thread-safe**: Uses `sync.Mutex` and `atomic.Int64` to safely manage the global counter during concurrent processing.
- **Paired-read support**: When input is paired (`IsPaired()`), both reads in a pair receive the *same* `seq_number`, ensuring alignment between mates.
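
The numbering rule above can be sketched in a few lines of Go. This is only an illustration: `Seq`, `newSeq`, and `numberBatch` are hypothetical stand-ins, not the obitools4 types or API, and the attribute store is simplified to a `map[string]int`.

```go
package main

import "fmt"

// Seq is a hypothetical stand-in for a biological sequence; it is not the
// obitools4 BioSequence type.
type Seq struct {
	ID    string
	Attrs map[string]int
	Mate  *Seq // nil for single-end data
}

func newSeq(id string) *Seq { return &Seq{ID: id, Attrs: map[string]int{}} }

// numberBatch assigns start, start+1, ... to seqs and copies each number to
// the mate when one exists. It returns the next free number.
func numberBatch(seqs []*Seq, start int) int {
	for i, s := range seqs {
		n := start + i
		s.Attrs["seq_number"] = n
		if s.Mate != nil {
			s.Mate.Attrs["seq_number"] = n
		}
	}
	return start + len(seqs)
}

func main() {
	fwd := []*Seq{newSeq("r1/1"), newSeq("r2/1")}
	for _, s := range fwd {
		// Build a mate whose ID ends in "2" instead of "1".
		s.Mate = newSeq(s.ID[:len(s.ID)-1] + "2")
	}
	next := numberBatch(fwd, 10)
	for _, s := range fwd {
		fmt.Println(s.ID, s.Attrs["seq_number"], s.Mate.ID, s.Mate.Attrs["seq_number"])
	}
	fmt.Println("next free number:", next)
}
```

Returning the next free number lets a caller chain batches without recomputing offsets.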
## Parallelization Strategy
- **Default mode**: Uses multiple workers (`ParallelWorkers()`) for performance; batches are processed concurrently.
- **Reordering mode**: If `forceReordering` is true:
  - Input iterator is batch-sorted (`SortBatches()`).
  - Parallelism disabled (1 worker) to ensure deterministic numbering order.
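
To see why sorting plus a single worker gives a deterministic result, consider the following sketch. `Batch`, `workerCount`, and `numberInOrder` are hypothetical names introduced for illustration, and the `Order` field is an assumption about how batch position might be tracked; none of this is the library's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// Batch is a hypothetical stand-in: Order is the batch's position in the
// original input, IDs are the sequences it holds.
type Batch struct {
	Order int
	IDs   []string
}

// workerCount mirrors the rule described above: one worker when reordering
// is forced, otherwise the configured parallelism.
func workerCount(forceReordering bool, parallel int) int {
	if forceReordering {
		return 1
	}
	return parallel
}

// numberInOrder sorts a copy of the batches by Order and numbers them in a
// single pass, so each sequence's number depends only on its input position.
func numberInOrder(batches []Batch, start int) map[string]int {
	sorted := append([]Batch(nil), batches...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Order < sorted[j].Order })
	numbers := make(map[string]int)
	next := start
	for _, b := range sorted {
		for _, id := range b.IDs {
			numbers[id] = next
			next++
		}
	}
	return numbers
}

func main() {
	// Batches arriving out of order, as they might after a parallel stage.
	arrived := []Batch{
		{Order: 1, IDs: []string{"c", "d"}},
		{Order: 0, IDs: []string{"a", "b"}},
	}
	fmt.Println("workers:", workerCount(true, 8))
	fmt.Println(numberInOrder(arrived, 0))
}
```

Without the sort, the numbers would follow arrival order, which can differ from run to run when upstream stages are parallel.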
## Implementation Details
- Each goroutine processes its own split of the input iterator.
- A shared `next_first` counter tracks the next available sequence number globally.
- Locking ensures atomic increment and assignment, preventing race conditions.
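
A minimal sketch of this pattern follows, assuming that each batch reserves a contiguous block of numbers while holding the lock and fills it in afterwards. That block-reservation detail, the channel-based work distribution, and the name `numberConcurrently` are assumptions made for the example; only the combination of `sync.Mutex`, `atomic.Int64`, and a shared `next_first` counter comes from the description above.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// numberConcurrently numbers batches (represented only by their sizes) from
// several goroutines. Each batch reserves a contiguous block of numbers under
// the lock, then fills it in without holding the lock.
func numberConcurrently(sizes []int, start int, workers int) [][]int {
	var nextFirst atomic.Int64 // next free sequence number
	nextFirst.Store(int64(start))
	var lock sync.Mutex

	out := make([][]int, len(sizes))
	jobs := make(chan int)
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range jobs {
				// Load and Store must happen as one step, hence the mutex.
				lock.Lock()
				first := int(nextFirst.Load())
				nextFirst.Store(int64(first + sizes[b]))
				lock.Unlock()

				nums := make([]int, sizes[b])
				for i := range nums {
					nums[i] = first + i
				}
				out[b] = nums // each batch index is written by one goroutine only
			}
		}()
	}
	for b := range sizes {
		jobs <- b
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	for b, nums := range numberConcurrently([]int{3, 2, 4}, 0, 3) {
		fmt.Println("batch", b, "->", nums)
	}
}
```

Which batch receives which block varies between runs, but the blocks never overlap and together cover the range exactly once.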
## Output
Returns a new `IBioSequence` iterator:
- Contains the same sequence batches (possibly reordered if sorted).
- Each `BioSequence` object now carries a `"seq_number"` attribute.
- Paired sequences share the same number, and the paired status of the input is carried over to the output.
## Use Cases
- Preparing data for downstream tools requiring unique sequence IDs.
- Maintaining cross-read identity in paired-end workflows (e.g., assembly, mapping).
- Reproducible numbering across pipeline stages or restarts (with `forceReordering` enabled, since only that mode guarantees a deterministic numbering order).