mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
FastSeq Reader Module — Semantic Description
This Go package (obiformats) provides high-performance parsing of FASTA/FASTQ files using a C-backed library (fastseq_read.h). It enables streaming, batched reading of biological sequences with optional quality scores.
Core Features
- C-based FASTX parsing: Leverages kseq.h via Go's cgo for efficient, low-level file/stream parsing.
- Batched iteration: Sequences are grouped into configurable batches (batch_size) for memory-efficient processing.
- Quality score handling: Supports FASTQ; decodes Phred quality scores using a configurable shift offset (obidefault.ReadQualitiesShift()).
- Source tracking: Each sequence carries its origin (filename or "stdin"), aiding provenance.
- Header parsing hook: Optional custom header parser (ParseFastSeqHeader) allows metadata extraction or transformation.
- Full-file batching mode: When enabled, yields a single batch containing the entire file (useful for small files or global operations).
- Stdin & File I/O: Two entry points:
  - ReadFastSeqFromFile(filename, ...) for regular files.
  - ReadFastSeqFromStdin(...) to process piped input (e.g., from upstream tools).
- Error resilience: Gracefully handles missing files, with logging (via logrus) for debugging.
- Async streaming: Uses goroutines to decouple reading from consumption, enabling concurrent pipelines.
Integration
Built on top of obitools4’s core abstractions:
- obiiter.IBioSequence: Iterator interface for biological sequences.
- obiseq.BioSequence: Data model holding name, sequence bytes, comment, and quality.
- obiutils, obidefault: Utilities for path handling and defaults.
Designed for scalability in high-throughput metabarcoding pipelines.