Mirror of https://github.com/metabarcoding/obitools4.git (synced 2026-04-30 12:00:39 +00:00)
# FastSeq Reader Module — Semantic Description
This Go package (`obiformats`) provides high-performance parsing of FASTA/FASTQ files using a C-backed library (`fastseq_read.h`). It enables streaming, batched reading of biological sequences with optional quality scores.
## Core Features
- **C-based FASTX parsing**: Uses `kseq.h` through Go's cgo for efficient, low-level parsing of files and streams.
- **Batched iteration**: Sequences are grouped into batches of configurable size (`batch_size`) for memory-efficient processing.
- **Quality score handling**: Supports FASTQ; Phred quality scores are decoded using a configurable shift offset (`obidefault.ReadQualitiesShift()`).
- **Source tracking**: Each sequence records its origin (the filename or `"stdin"`), which helps with provenance.
- **Header parsing hook**: An optional custom header parser (`ParseFastSeqHeader`) can extract or transform metadata.
- **Full-file batching mode**: When enabled, the reader yields a single batch containing the entire file (useful for small files or global operations).
- **Stdin & file I/O**: Two entry points:
  - `ReadFastSeqFromFile(filename, ...)` for regular files.
  - `ReadFastSeqFromStdin(...)` for piped input (e.g., from upstream tools).
- **Error resilience**: Missing files are handled gracefully and logged (via `logrus`) for debugging.
- **Async streaming**: Goroutines decouple reading from consumption, enabling concurrent pipelines.
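The batching, quality decoding, source tracking, and asynchronous streaming described above can be illustrated with a small standard-library sketch. It is not the package's implementation: it parses strict four-line FASTQ in pure Go instead of calling C code, and `Record`, `decodeQualities`, `readBatches`, and `collectBatches` are hypothetical names introduced only for this example. The shift value of 33 is an assumption matching the common Sanger encoding.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Record is a hypothetical stand-in for a parsed read (not obiseq.BioSequence).
type Record struct {
	Name      string
	Sequence  []byte
	Qualities []byte // Phred scores, i.e. ASCII codes minus the shift
	Source    string // filename or "stdin"
}

// decodeQualities subtracts the ASCII shift from every quality character.
func decodeQualities(ascii string, shift byte) []byte {
	out := make([]byte, len(ascii))
	for i := 0; i < len(ascii); i++ {
		out[i] = ascii[i] - shift
	}
	return out
}

// readBatches parses strict four-line FASTQ records in a goroutine and
// sends them downstream in slices of at most batchSize records.
func readBatches(r io.Reader, source string, batchSize int, shift byte) <-chan []Record {
	if batchSize < 1 {
		batchSize = 1
	}
	out := make(chan []Record)
	go func() {
		defer close(out)
		sc := bufio.NewScanner(r) // default 64 KiB line limit, fine for short reads
		batch := make([]Record, 0, batchSize)
		var lines [4]string
		for {
			n := 0
			for n < 4 && sc.Scan() {
				lines[n] = sc.Text()
				n++
			}
			if n < 4 || !strings.HasPrefix(lines[0], "@") {
				break // end of input, truncated record, or malformed header
			}
			batch = append(batch, Record{
				Name:      strings.TrimPrefix(lines[0], "@"),
				Sequence:  []byte(lines[1]),
				Qualities: decodeQualities(lines[3], shift),
				Source:    source,
			})
			if len(batch) == batchSize {
				out <- batch
				batch = make([]Record, 0, batchSize)
			}
		}
		if len(batch) > 0 {
			out <- batch // final, possibly short, batch
		}
	}()
	return out
}

// collectBatches drains the channel; convenient for small inputs.
func collectBatches(input, source string, batchSize int, shift byte) [][]Record {
	var all [][]Record
	for b := range readBatches(strings.NewReader(input), source, batchSize, shift) {
		all = append(all, b)
	}
	return all
}

func main() {
	input := "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\n!!!!\n@r3\nTTAA\n+\n5555\n"
	for i, batch := range collectBatches(input, "stdin", 2, 33) {
		fmt.Printf("batch %d: %d records, first=%s, q[0]=%d\n",
			i, len(batch), batch[0].Name, batch[0].Qualities[0])
	}
}
```

Because the channel is unbuffered, the parsing goroutine blocks until the consumer takes each batch, so at most one pending batch is held in memory at a time. In this sketch, a batch size at least as large as the number of records in the input gives the same result as a full-file mode: a single batch.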
## Integration
Built on top of `obitools4`’s core abstractions:

- `obiiter.IBioSequence`: Iterator interface for biological sequences.
- `obiseq.BioSequence`: Data model holding name, sequence bytes, comment, and quality.
- `obiutils`, `obidefault`: Utilities for path handling and defaults.

Designed for scalability in high-throughput metabarcoding pipelines.
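To make the relationship between the data model and the iterator abstraction listed above more concrete, here is a minimal sketch of how a consumer might be written against a batch iterator. The types `Sequence`, `SequenceIterator`, and `sliceIterator` and the function `countBases` are hypothetical and defined only for this example; they are not the definitions of `obiseq.BioSequence` or `obiiter.IBioSequence`, whose actual methods should be checked in the `obitools4` source.

```go
package main

import "fmt"

// Sequence is a hypothetical record with the four fields named in the text.
type Sequence struct {
	Name    string
	Bases   []byte
	Comment string
	Quality []byte // nil when the input carries no qualities (FASTA)
}

// SequenceIterator is a hypothetical pull-style iterator over batches.
type SequenceIterator interface {
	Next() bool      // advances to the next batch; false when exhausted
	Get() []Sequence // returns the current batch; call only after Next returned true
}

// sliceIterator serves pre-built batches from memory.
type sliceIterator struct {
	batches [][]Sequence
	pos     int // index of the next batch to serve
}

func (it *sliceIterator) Next() bool {
	if it.pos >= len(it.batches) {
		return false
	}
	it.pos++
	return true
}

func (it *sliceIterator) Get() []Sequence { return it.batches[it.pos-1] }

// countBases consumes any iterator and reports how many sequences and
// bases it delivered, without knowing where the batches came from.
func countBases(it SequenceIterator) (sequences, bases int) {
	for it.Next() {
		for _, s := range it.Get() {
			sequences++
			bases += len(s.Bases)
		}
	}
	return sequences, bases
}

func main() {
	it := &sliceIterator{batches: [][]Sequence{
		{{Name: "s1", Bases: []byte("ACGT")}, {Name: "s2", Bases: []byte("GG")}},
		{{Name: "s3", Bases: []byte("TTA"), Comment: "sample=A"}},
	}}
	n, total := countBases(it)
	fmt.Println(n, total)
}
```

Writing consumers against an interface rather than a concrete reader is what lets the same processing step run on batches from a file, from stdin, or from an in-memory fixture in a test.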