Files
obitools4/autodoc/docmd/pkg/obiiter/numbering.md
T
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

1.7 KiB

NumberSequences Function — Semantic Description

The NumberSequences method assigns a unique sequential identifier (seq_number) to each biological sequence in an IBioSequence iterator, preserving consistency for paired-end reads.

Core Functionality

  • Sequential numbering: Assigns integers (starting from start, defaulting to 0 or user-defined) incrementally across sequences.
  • Thread-safe: Uses sync.Mutex and atomic.Int64 to safely manage the global counter during concurrent processing.
  • Paired-read support: When input is paired (IsPaired()), both reads in a pair receive the same seq_number, ensuring alignment between mates.

Parallelization Strategy

  • Default mode: Uses multiple workers (ParallelWorkers()) for performance; batches are processed concurrently.
  • Reordering mode: If forceReordering is true:
    • Input iterator is batch-sorted (SortBatches()).
    • Parallelism disabled (1 worker) to ensure deterministic numbering order.

Implementation Details

  • Each goroutine processes its own split of the input iterator.
  • A shared next_first counter tracks the next available sequence number globally.
  • Locking ensures atomic increment and assignment, preventing race conditions.

Output

Returns a new IBioSequence iterator:

  • Contains the same sequence batches (possibly reordered if sorted).
  • Each BioSequence object now carries a "seq_number" attribute.
  • Paired sequences are co-numbered and marked accordingly.

Use Cases

  • Preparing data for downstream tools requiring unique sequence IDs.
  • Maintaining cross-read identity in paired-end workflows (e.g., assembly, mapping).
  • Reproducible numbering across pipeline stages or restarts.