mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
8c7017a99d
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5" - Update version.txt from 4.29 → .30 (automated by Makefile)
27 lines
1.8 KiB
Markdown
27 lines
1.8 KiB
Markdown
# FastSeq Reader Module — Semantic Description
|
||
|
||
This Go package (`obiformats`) provides high-performance parsing of FASTA/FASTQ files using a C-backed library (`fastseq_read.h`). It enables streaming, batched reading of biological sequences with optional quality scores.
|
||
|
||
## Core Features
|
||
|
||
- **C-based FASTX parsing**: Leverages `kseq.h` via Go's cgo for efficient, low-level file/stream parsing.
|
||
- **Batched iteration**: Sequences are grouped into configurable batches (`batch_size`) for memory-efficient processing.
|
||
- **Quality score handling**: Supports FASTQ; decodes Phred quality scores using a configurable shift offset (`obidefault.ReadQualitiesShift()`).
|
||
- **Source tracking**: Each sequence carries its origin (filename or `"stdin"`), aiding provenance.
|
||
- **Header parsing hook**: Optional custom header parser (`ParseFastSeqHeader`) allows metadata extraction or transformation.
|
||
- **Full-file batching mode**: When enabled, yields a single batch containing the entire file (useful for small files or global operations).
|
||
- **Stdin & File I/O**: Two entry points:
|
||
- `ReadFastSeqFromFile(filename, ...)` for regular files.
|
||
- `ReadFastSeqFromStdin(...)` to process piped input (e.g., from upstream tools).
|
||
- **Error resilience**: Gracefully handles missing files, with logging (via `logrus`) for debugging.
|
||
- **Async streaming**: Uses goroutines to decouple reading from consumption, enabling concurrent pipelines.
|
||
|
||
## Integration
|
||
|
||
Built on top of `obitools4`’s core abstractions:
|
||
- `obiiter.IBioSequence`: Iterator interface for biological sequences.
|
||
- `obiseq.BioSequence`: Data model holding name, sequence bytes, comment, and quality.
|
||
- `obiutils`, `obidefault`: Utilities for path handling and defaults.
|
||
|
||
Designed for scalability in high-throughput metabarcoding pipelines.
|