Mirror of https://github.com/metabarcoding/obitools4.git (synced 2026-04-30 12:00:39 +00:00)
Semantic Description of ReadSequencesBatchFromFiles
This function implements concurrent, batched streaming of biological sequences from multiple input files.
Core Functionality
- Input: A slice of file paths (`[]string`), an optional batch reader interface, and a concurrency level.
- Default behavior: Uses `ReadSequencesFromFile` if no custom reader is provided.
Concurrency Model
- Launches `concurrent_readers` goroutines to process files in parallel.
- Files are distributed via a shared channel (`filenameChan`), ensuring fair load balancing.
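The distribution scheme described above can be sketched as a small worker pool. This is a minimal illustration, not the obitools4 implementation: `processFiles` and its string "processing" step are hypothetical placeholders, and only the channel name `filenameChan` is taken from the description.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// processFiles (hypothetical) hands filenames to nWorkers goroutines through
// a shared channel and returns one record per handled file, sorted so the
// result does not depend on goroutine scheduling.
func processFiles(filenames []string, nWorkers int) []string {
	filenameChan := make(chan string)
	var (
		mu      sync.Mutex
		handled []string
		wg      sync.WaitGroup
	)

	for i := 0; i < nWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker takes the next available filename, so a worker
			// busy with a large file does not hold up the others.
			for name := range filenameChan {
				record := fmt.Sprintf("read %s", name) // placeholder for real parsing
				mu.Lock()
				handled = append(handled, record)
				mu.Unlock()
			}
		}()
	}

	for _, name := range filenames {
		filenameChan <- name
	}
	close(filenameChan) // ends the workers' range loops

	wg.Wait()
	sort.Strings(handled)
	return handled
}

func main() {
	fmt.Println(processFiles([]string{"b.fastq", "a.fasta", "c.fastq"}, 2))
}
```

Because workers pull from the channel rather than being assigned a fixed share of the files, the split adapts to uneven file sizes without any explicit scheduling.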
Streaming Interface
- Returns an `obiiter.IBioSequence`, a streaming iterator over batches of biological sequences.
- Internally uses an atomic counter (`nextCounter`) to assign unique, ordered IDs to sequence batches (via `Reorder`), preserving global order despite parallelism.
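The ordering idea can be illustrated with a shared atomic counter and a small reorder buffer. This is a sketch under assumptions: the `batch` type and the `reorder` and `produce` functions are hypothetical, and the actual `Reorder` in obitools4 may work differently.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// batch is a hypothetical stand-in for a batch of sequences.
type batch struct {
	order int64
	label string
}

// reorder buffers batches that arrive early and releases them in
// increasing order, starting at 0.
func reorder(in <-chan batch) []batch {
	pending := make(map[int64]batch)
	var out []batch
	next := int64(0)
	for b := range in {
		pending[b.order] = b
		for {
			nb, ok := pending[next]
			if !ok {
				break
			}
			out = append(out, nb)
			delete(pending, next)
			next++
		}
	}
	return out
}

// produce starts nWorkers goroutines that each emit perWorker batches.
// Every batch gets its ID from one shared atomic counter, so IDs are
// unique and gap-free even though workers run concurrently.
func produce(nWorkers, perWorker int) []batch {
	var counter int64
	out := make(chan batch)
	var wg sync.WaitGroup
	for w := 0; w < nWorkers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				id := atomic.AddInt64(&counter, 1) - 1
				out <- batch{order: id, label: fmt.Sprintf("worker %d", w)}
			}
		}(w)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return reorder(out)
}

func main() {
	for _, b := range produce(3, 2) {
		fmt.Println(b.order, b.label)
	}
}
```

Which worker receives which ID varies between runs, but the consumer always sees IDs 0, 1, 2, and so on without gaps.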
Error Handling & Logging
- Panics on file-open failure (via `log.Panicf`).
- Logs start/end of reading per file using structured logging (`log.Printf`, `log.Println`).
Resource Management
- Uses a barrier pattern: each reader goroutine calls `batchiter.Done()` upon completion.
- A finalizer goroutine waits for all readers (`WaitAndClose`) and logs termination.
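A barrier of this kind is commonly built on `sync.WaitGroup`. The sketch below assumes a hypothetical `batchStream` type whose `Add`, `Done`, and `WaitAndClose` methods mirror the names used above; it is not the `obiiter` implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// batchStream is a hypothetical stand-in for the batch iterator: a channel
// of items plus a WaitGroup that tracks the active reader goroutines.
type batchStream struct {
	wg sync.WaitGroup
	ch chan string
}

func (s *batchStream) Add(n int) { s.wg.Add(n) }
func (s *batchStream) Done()     { s.wg.Done() }

// WaitAndClose blocks until every reader has called Done, then closes the
// channel so the consumer's range loop can finish.
func (s *batchStream) WaitAndClose() {
	s.wg.Wait()
	close(s.ch)
}

// collect starts nReaders goroutines that each push one item, and returns
// the number of items the consumer received.
func collect(nReaders int) int {
	s := &batchStream{ch: make(chan string)}
	s.Add(nReaders)
	for r := 0; r < nReaders; r++ {
		go func(r int) {
			defer s.Done()
			s.ch <- fmt.Sprintf("batch from reader %d", r)
		}(r)
	}
	go s.WaitAndClose() // finalizer goroutine

	received := 0
	for range s.ch {
		received++
	}
	return received
}

func main() {
	fmt.Println(collect(4)) // prints 4
}
```

Closing the channel from a single finalizer goroutine, rather than from the readers themselves, avoids a send on a closed channel when readers finish at different times.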
Design Intent
- Enables scalable, memory-efficient ingestion of large NGS datasets.
- Decouples reading logic (via `IBatchReader`) from orchestration, supporting pluggable formats.
- Prioritizes throughput and deterministic ordering over strict FIFO per-file semantics.
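The decoupling of reader and orchestration can be sketched with a function type and a default fallback. All names here (`batchReader`, `defaultReader`, `upperCaseReader`, `readAll`) are hypothetical simplifications; the real `IBatchReader` takes options and returns an iterator rather than a slice.

```go
package main

import (
	"fmt"
	"strings"
)

// batchReader is a hypothetical, simplified reader factory: it maps a
// filename to the records read from that file.
type batchReader func(filename string) ([]string, error)

// defaultReader plays the role of the fallback used when the caller
// supplies no reader.
func defaultReader(filename string) ([]string, error) {
	return []string{"default:" + filename}, nil
}

// upperCaseReader is an example of a custom, pluggable reader.
func upperCaseReader(filename string) ([]string, error) {
	return []string{strings.ToUpper(filename)}, nil
}

// readAll is the orchestration step: it knows nothing about file formats
// and only calls whichever reader it was given.
func readAll(filenames []string, reader batchReader) ([]string, error) {
	if reader == nil {
		reader = defaultReader
	}
	var all []string
	for _, name := range filenames {
		records, err := reader(name)
		if err != nil {
			return nil, fmt.Errorf("reading %s: %w", name, err)
		}
		all = append(all, records...)
	}
	return all, nil
}

func main() {
	a, _ := readAll([]string{"x.fasta"}, nil)
	b, _ := readAll([]string{"x.fasta"}, upperCaseReader)
	fmt.Println(a, b) // prints [default:x.fasta] [X.FASTA]
}
```

Supporting a new format then only requires another function with the same signature; the orchestration code stays unchanged.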
Key Abstractions
| Type/Interface | Role |
|---|---|
| `IBatchReader` | Reader factory: (filename, options...) → SequenceIterator |
| `obiiter.IBioSequence` | Thread-safe batch iterator (push model) |
| `AtomicCounter` | Ensures globally unique, sequential batch IDs across goroutines |