Files
obitools4/autodoc/docmd/pkg/obiformats/fastseq_read.md
T
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

27 lines
1.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# FastSeq Reader Module — Semantic Description
This Go package (`obiformats`) provides high-performance parsing of FASTA/FASTQ files using a C-backed library (`fastseq_read.h`). It enables streaming, batched reading of biological sequences with optional quality scores.
## Core Features
- **C-based FASTX parsing**: Leverages `kseq.h` via Go's cgo for efficient, low-level file/stream parsing.
- **Batched iteration**: Sequences are grouped into configurable batches (`batch_size`) for memory-efficient processing.
- **Quality score handling**: Supports FASTQ; decodes Phred quality scores using a configurable shift offset (`obidefault.ReadQualitiesShift()`).
- **Source tracking**: Each sequence carries its origin (filename or `"stdin"`), aiding provenance.
- **Header parsing hook**: Optional custom header parser (`ParseFastSeqHeader`) allows metadata extraction or transformation.
- **Full-file batching mode**: When enabled, yields a single batch containing the entire file (useful for small files or global operations).
- **Stdin & File I/O**: Two entry points:
- `ReadFastSeqFromFile(filename, ...)` for regular files.
- `ReadFastSeqFromStdin(...)` to process piped input (e.g., from upstream tools).
- **Error resilience**: Gracefully handles missing files, with logging (via `logrus`) for debugging.
- **Async streaming**: Uses goroutines to decouple reading from consumption, enabling concurrent pipelines.
## Integration
Built on top of `obitools4`s core abstractions:
- `obiiter.IBioSequence`: Iterator interface for biological sequences.
- `obiseq.BioSequence`: Data model holding name, sequence bytes, comment, and quality.
- `obiutils`, `obidefault`: Utilities for path handling and defaults.
Designed for scalability in high-throughput metabarcoding pipelines.