mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 03:50:39 +00:00
⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30 (automated by Makefile)
@@ -0,0 +1,26 @@
# FastSeq Reader Module — Semantic Description
This Go package (`obiformats`) provides high-performance parsing of FASTA/FASTQ files using a C-backed library (`fastseq_read.h`). It enables streaming, batched reading of biological sequences with optional quality scores.
## Core Features
- **C-based FASTX parsing**: Leverages `kseq.h` via Go's cgo for efficient, low-level file/stream parsing.
- **Batched iteration**: Sequences are grouped into configurable batches (`batch_size`) for memory-efficient processing.
- **Quality score handling**: Supports FASTQ; decodes Phred quality scores using a configurable shift offset (`obidefault.ReadQualitiesShift()`).
- **Source tracking**: Each sequence carries its origin (filename or `"stdin"`), aiding provenance.
- **Header parsing hook**: Optional custom header parser (`ParseFastSeqHeader`) allows metadata extraction or transformation.
- **Full-file batching mode**: When enabled, yields a single batch containing the entire file (useful for small files or global operations).
- **Stdin & File I/O**: Two entry points:
  - `ReadFastSeqFromFile(filename, ...)` for regular files.
  - `ReadFastSeqFromStdin(...)` to process piped input (e.g., from upstream tools).
- **Error resilience**: Gracefully handles missing files, with logging (via `logrus`) for debugging.
- **Async streaming**: Uses goroutines to decouple reading from consumption, enabling concurrent pipelines.
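The batching, full-file, source-tracking, and async-streaming features listed above can be illustrated with a small pure-Go sketch. It does not use the package's C-backed parser or its real types: `Record`, `ReadBatches`, and `collectBatches` are hypothetical names introduced only for this example, and the parser handles plain FASTA text only.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Record is a hypothetical stand-in for a parsed sequence (not obiseq.BioSequence).
type Record struct {
	Header   string // header line without the leading '>'
	Sequence string
	Source   string // origin of the record: a filename or "stdin"
}

// ReadBatches parses FASTA text from r in its own goroutine and sends slices
// of at most batchSize records on the returned channel. When fullFile is
// true, every record is delivered in one single batch. The channel is closed
// at end of input. Scanner errors are ignored to keep the sketch short.
func ReadBatches(r io.Reader, source string, batchSize int, fullFile bool) <-chan []Record {
	if batchSize < 1 {
		batchSize = 1
	}
	out := make(chan []Record)
	go func() {
		defer close(out)
		var batch []Record
		// emit stores a finished record and sends the batch once it is full.
		emit := func(rec Record) {
			batch = append(batch, rec)
			if !fullFile && len(batch) >= batchSize {
				out <- batch
				batch = nil
			}
		}
		var cur *Record
		scanner := bufio.NewScanner(r)
		for scanner.Scan() {
			line := strings.TrimSpace(scanner.Text())
			switch {
			case line == "":
				continue
			case strings.HasPrefix(line, ">"):
				if cur != nil {
					emit(*cur)
				}
				cur = &Record{Header: strings.TrimPrefix(line, ">"), Source: source}
			case cur != nil:
				cur.Sequence += line // multi-line sequences are concatenated
			}
		}
		if cur != nil {
			emit(*cur)
		}
		if len(batch) > 0 {
			out <- batch // last, possibly partial, batch
		}
	}()
	return out
}

// collectBatches is a demo helper that drains the channel into a slice.
func collectBatches(input string, batchSize int, fullFile bool) [][]Record {
	var all [][]Record
	for batch := range ReadBatches(strings.NewReader(input), "stdin", batchSize, fullFile) {
		all = append(all, batch)
	}
	return all
}

func main() {
	input := ">s1\nACGT\n>s2\nGG\nCC\n>s3\nTTTT\n"
	for i, batch := range collectBatches(input, 2, false) {
		fmt.Printf("batch %d: %d record(s) from %s\n", i, len(batch), batch[0].Source)
	}
}
```

Because the channel is unbuffered, the reader goroutine blocks until the consumer takes each batch, which keeps memory bounded to roughly one batch at a time. A consumer that stops reading early would leave the goroutine blocked, so a real pipeline would need a cancellation mechanism.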
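As an illustration of the quality score handling described in the feature list, the sketch below decodes an ASCII-encoded FASTQ quality string by subtracting a shift from each byte. `DecodeQualities` is a hypothetical helper, not the package's function, and the shift is passed explicitly here rather than read from `obidefault.ReadQualitiesShift()`. The value 33 is the usual offset for Sanger-style FASTQ.

```go
package main

import "fmt"

// DecodeQualities converts an ASCII-encoded quality string into numeric Phred
// scores by subtracting shift from every byte. A character below the shift
// cannot be a valid score, so it is reported as an error.
func DecodeQualities(encoded string, shift byte) ([]byte, error) {
	scores := make([]byte, len(encoded))
	for i := 0; i < len(encoded); i++ {
		c := encoded[i]
		if c < shift {
			return nil, fmt.Errorf("quality character %q at position %d is below shift %d", c, i, shift)
		}
		scores[i] = c - shift
	}
	return scores, nil
}

func main() {
	scores, err := DecodeQualities("II5!", 33)
	if err != nil {
		panic(err)
	}
	fmt.Println(scores) // prints [40 40 20 0]
}
```

A Phred score Q corresponds to an estimated base-calling error probability of 10^(-Q/10), so the score 40 above means about one error in 10,000 calls.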
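The header parsing hook can be pictured as a function value applied to each record as it is built. The sketch below assumes a `key=value` annotation style in the header. The `Sequence` type, the `HeaderParser` signature, and both functions are hypothetical and are not taken from the package.

```go
package main

import (
	"fmt"
	"strings"
)

// Sequence is a hypothetical record type used only for this illustration.
type Sequence struct {
	ID          string
	Definition  string
	Annotations map[string]string
}

// HeaderParser is the hook signature assumed here: it receives the record and
// its raw header line and fills in whatever metadata it can extract.
type HeaderParser func(seq *Sequence, header string)

// ParseKeyValueHeader treats the first token as the identifier, collects
// key=value tokens as annotations, and keeps the remaining tokens as free text.
func ParseKeyValueHeader(seq *Sequence, header string) {
	fields := strings.Fields(header)
	if len(fields) == 0 {
		return
	}
	seq.ID = fields[0]
	seq.Annotations = make(map[string]string)
	var rest []string
	for _, tok := range fields[1:] {
		if key, value, ok := strings.Cut(tok, "="); ok && key != "" {
			seq.Annotations[key] = strings.TrimSuffix(value, ";")
		} else {
			rest = append(rest, tok)
		}
	}
	seq.Definition = strings.Join(rest, " ")
}

// NewSequence builds a record and applies the optional hook. A nil parser
// leaves the raw header in ID, untouched.
func NewSequence(header string, parse HeaderParser) *Sequence {
	seq := &Sequence{ID: header}
	if parse != nil {
		parse(seq, header)
	}
	return seq
}

func main() {
	seq := NewSequence("seq1 count=12; sample=A1; a test read", ParseKeyValueHeader)
	fmt.Println(seq.ID, seq.Annotations["count"], seq.Annotations["sample"], seq.Definition)
}
```

Keeping the parser optional lets callers that only need raw headers skip the parsing cost, while others can plug in a format-specific function.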
## Integration
Built on top of `obitools4`’s core abstractions:
- `obiiter.IBioSequence`: Iterator interface for biological sequences.
- `obiseq.BioSequence`: Data model holding name, sequence bytes, comment, and quality.
- `obiutils`, `obidefault`: Utilities for path handling and defaults.

Designed for scalability in high-throughput metabarcoding pipelines.