Mirror of https://github.com/metabarcoding/obitools4.git
FASTQ Parsing Module (obiformats)
This Go package provides robust, streaming-capable parsing of FASTQ files, a standard format for storing nucleotide sequences together with their per-base quality scores.
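In its common layout, a FASTQ record occupies four lines: an `@` header carrying the identifier and an optional definition, the sequence, a `+` separator line, and one quality character per base. The sketch below parses a single record and applies the kind of strict checks this package performs. It is an illustration only: `fastqRecord` and `parseFastqRecord` are hypothetical names, the sketch assumes single-line sequence and quality strings and splits the header at the first space, and the real parser works on buffered chunks and returns `obiseq` objects rather than this struct.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// fastqRecord is a hypothetical type used only for this illustration.
type fastqRecord struct {
	ID         string
	Definition string
	Sequence   string
	Quality    string
}

// parseFastqRecord parses exactly four lines: "@id [definition]",
// the sequence, a "+" separator and the quality string.
func parseFastqRecord(lines [4]string) (fastqRecord, error) {
	header, seq, sep, qual := lines[0], lines[1], lines[2], lines[3]
	if !strings.HasPrefix(header, "@") {
		return fastqRecord{}, errors.New("header line must start with '@'")
	}
	if !strings.HasPrefix(sep, "+") {
		return fastqRecord{}, errors.New("third line must start with '+'")
	}
	if len(seq) != len(qual) {
		return fastqRecord{}, fmt.Errorf("sequence length %d != quality length %d", len(seq), len(qual))
	}
	// The identifier is the text after '@' up to the first space;
	// anything after that space is kept as a free-text definition.
	id, def, _ := strings.Cut(header[1:], " ")
	return fastqRecord{ID: id, Definition: def, Sequence: seq, Quality: qual}, nil
}

func main() {
	rec, err := parseFastqRecord([4]string{"@read1 sample=A", "ACGT", "+", "IIII"})
	fmt.Println(rec.ID, rec.Definition, rec.Sequence, err) // read1 sample=A ACGT <nil>

	_, err = parseFastqRecord([4]string{"@read2", "ACGT", "+", "III"})
	fmt.Println(err) // sequence length 4 != quality length 3
}
```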
Core Functionalities

- `EndOfLastFastqEntry(buffer []byte) int`
  Locates the start position (the `@` character) of the last complete FASTQ entry in a byte buffer by scanning it from end to beginning with a state machine. Returns `-1` if no valid entry is found.
- `FastqChunkParser(...)`
  Returns a parser function that processes FASTQ data from an `io.Reader`. It handles:
  - Header parsing (`@id [definition]`)
  - Sequence normalization (uppercase → lowercase, plus `U`→`T` conversion if enabled)
  - Quality score shifting (`quality_shift`)
  - Strict validation (e.g., presence of the `+` line, matching sequence and quality lengths)
- `FastqChunkParserRope(...)`
  An optimized parser for rope-based input (`PieceOfChunk`) that avoids unnecessary memory copies by scanning the input directly, line by line.
- Batched file parsing (`_ParseFastqFile`, `ReadFastq`, etc.)
  Enables concurrent, chunked parsing of large files:
  - Splits the input into chunks using `ReadFileChunk`
  - Uses a configurable number of parallel workers (`nworker`)
  - Pushes parsed batches to an iterator interface
- Convenience I/O wrappers
  - `ReadFastqFromFile(filename, ...)`: parses a file by name.
  - `ReadFastqFromStdin(...)`: reads FASTQ from standard input.
Key Options & Features

- Quality handling: optional quality extraction (`with_quality`) and a configurable offset (`quality_shift`)
- Uracil-to-thymine conversion: the `UtoT` flag enables RNA→DNA normalization
- Header annotation parsing: optional post-parsing interpretation of headers via `ParseFastSeqHeader`
- Batch sorting and full-file mode: supports both streaming and complete-file aggregation
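Sequence normalization and quality shifting are simple per-byte transformations. The sketch below assumes the common Phred+33 encoding, in which the quality character `I` (ASCII 73) decodes to score 40; `normalizeSequence` and `decodeQualities` are hypothetical helpers that mirror the behavior described for `UtoT` and `quality_shift`, not the package's actual code path.

```go
package main

import (
	"fmt"
)

// normalizeSequence lowercases ASCII letters and, when utot is true,
// rewrites uracil as thymine (RNA -> DNA). Hypothetical helper.
func normalizeSequence(seq []byte, utot bool) []byte {
	out := make([]byte, len(seq))
	for i, c := range seq {
		if c >= 'A' && c <= 'Z' {
			c += 'a' - 'A' // ASCII uppercase -> lowercase
		}
		if utot && c == 'u' {
			c = 't'
		}
		out[i] = c
	}
	return out
}

// decodeQualities subtracts the ASCII offset (e.g. 33 for Phred+33) from
// each quality character and fails on characters below the offset.
// Hypothetical helper.
func decodeQualities(qual []byte, shift byte) ([]byte, error) {
	out := make([]byte, len(qual))
	for i, c := range qual {
		if c < shift {
			return nil, fmt.Errorf("quality char %q at position %d is below offset %d", c, i, shift)
		}
		out[i] = c - shift
	}
	return out, nil
}

func main() {
	fmt.Println(string(normalizeSequence([]byte("ACGUN"), true)))  // acgtn
	fmt.Println(string(normalizeSequence([]byte("ACGUN"), false))) // acgun

	q, err := decodeQualities([]byte("!+5I"), 33)
	fmt.Println(q, err) // [0 10 20 40] <nil>
}
```

Rejecting characters below the offset is one way to fail fast when a file encoded with one offset is read with another, for example Phred+33 data decoded with an offset of 64.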
Design Highlights

- Memory-efficient chunking with overlap-aware boundary detection (`EndOfLastFastqEntry`)
- Strict error reporting: fails fast on malformed FASTQ (e.g., invalid characters, sequence/quality length mismatch)
- Integration with `obiseq` and `obiiter`: returns typed biological sequence slices and iterator streams compatible with the broader OBITools4 ecosystem