obitools4/autodoc/docmd/pkg/obiformats/fastseq_read.md


FastSeq Reader Module — Semantic Description

This Go package (obiformats) provides high-performance parsing of FASTA/FASTQ files using a C-backed library (fastseq_read.h). It enables streaming, batched reading of biological sequences with optional quality scores.

Core Features

  • C-based FASTX parsing: Leverages kseq.h via Go's cgo for efficient, low-level file/stream parsing.
  • Batched iteration: Sequences are grouped into configurable batches (batch_size) for memory-efficient processing.
  • Quality score handling: Supports FASTQ; decodes Phred quality scores using a configurable shift offset (obidefault.ReadQualitiesShift()).
  • Source tracking: Each sequence carries its origin (filename or "stdin"), aiding provenance.
  • Header parsing hook: Optional custom header parser (ParseFastSeqHeader) allows metadata extraction or transformation.
  • Full-file batching mode: When enabled, yields a single batch containing the entire file (useful for small files or global operations).
  • Stdin & File I/O: Two entry points:
    • ReadFastSeqFromFile(filename, ...) for regular files.
    • ReadFastSeqFromStdin(...) to process piped input (e.g., from upstream tools).
  • Error resilience: Gracefully handles missing files, with logging (via logrus) for debugging.
  • Async streaming: Uses goroutines to decouple reading from consumption, enabling concurrent pipelines.
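The batching and async streaming behaviour listed above can be sketched in plain Go. This is a minimal illustration, not the package's implementation: it replaces the cgo/kseq.h parser with a simple line-based FASTA scanner, and `Record`, `readBatches`, and `batchSizes` are hypothetical names introduced only for this example. A goroutine parses the input and sends batches over a channel, so the consumer can process one batch while the next is being read.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Record is a hypothetical stand-in for a parsed sequence (not obiseq.BioSequence).
type Record struct {
	Name     string // full header line without the leading '>'
	Sequence []byte
	Source   string // file name or "stdin", kept for provenance
}

// readBatches parses FASTA from r in a goroutine and sends batches of at most
// batchSize records on the returned channel. With fullFile set, all records
// are delivered as one batch. Scanner errors are ignored in this sketch, and
// bufio.Scanner's default limit of 64 KiB per line applies.
func readBatches(r io.Reader, source string, batchSize int, fullFile bool) <-chan []Record {
	if batchSize < 1 {
		batchSize = 1
	}
	out := make(chan []Record)
	go func() {
		defer close(out)
		var batch []Record
		emit := func(rec Record) {
			batch = append(batch, rec)
			if !fullFile && len(batch) >= batchSize {
				out <- batch
				batch = nil // start a fresh slice; the consumer owns the old one
			}
		}
		var cur *Record
		scanner := bufio.NewScanner(r)
		for scanner.Scan() {
			line := strings.TrimSpace(scanner.Text())
			switch {
			case line == "":
				continue
			case strings.HasPrefix(line, ">"):
				if cur != nil {
					emit(*cur)
				}
				cur = &Record{Name: strings.TrimPrefix(line, ">"), Source: source}
			case cur != nil:
				cur.Sequence = append(cur.Sequence, line...)
			}
		}
		if cur != nil {
			emit(*cur)
		}
		if len(batch) > 0 {
			out <- batch // final, possibly partial, batch
		}
	}()
	return out
}

// batchSizes drains the channel and reports how many records each batch held.
func batchSizes(input string, batchSize int, fullFile bool) []int {
	var sizes []int
	for batch := range readBatches(strings.NewReader(input), "stdin", batchSize, fullFile) {
		sizes = append(sizes, len(batch))
	}
	return sizes
}

func main() {
	input := ">seq1\nACGT\nTT\n>seq2\nGGCC\n>seq3\nA\n"
	fmt.Println(batchSizes(input, 2, false)) // regular batching
	fmt.Println(batchSizes(input, 2, true))  // full-file mode
}
```

The unbuffered channel makes the reader wait until the consumer takes each batch, which bounds memory use to roughly one batch in flight. A buffered channel would let the reader run further ahead at the cost of more memory.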

Integration

Built on top of obitools4's core abstractions:

  • obiiter.IBioSequence: Iterator interface for biological sequences.
  • obiseq.BioSequence: Data model holding name, sequence bytes, comment, and quality.
  • obiutils, obidefault: Utilities for path handling and defaults.
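To make the data model and the Phred shift concrete, here is a small sketch under stated assumptions. `SeqRecord` and `decodeQualities` are hypothetical names, not the obiseq API. The shift of 33 used in the example is the common Sanger/Illumina 1.8+ offset; the value actually returned by `obidefault.ReadQualitiesShift()` is not stated in this document.

```go
package main

import "fmt"

// SeqRecord is a hypothetical, simplified record with the four fields the
// description attributes to obiseq.BioSequence.
type SeqRecord struct {
	Name      string
	Comment   string
	Sequence  []byte
	Qualities []byte // decoded Phred scores, one per base
}

// decodeQualities converts an ASCII-encoded FASTQ quality string into numeric
// Phred scores by subtracting shift from every character.
func decodeQualities(ascii string, shift byte) ([]byte, error) {
	scores := make([]byte, len(ascii))
	for i := 0; i < len(ascii); i++ {
		if ascii[i] < shift {
			return nil, fmt.Errorf("quality character %q at position %d is below shift %d",
				ascii[i], i, shift)
		}
		scores[i] = ascii[i] - shift
	}
	return scores, nil
}

func main() {
	// Assumed shift of 33 (Phred+33); older Illumina files used 64.
	q, err := decodeQualities("II5!", 33)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	rec := SeqRecord{Name: "read1", Sequence: []byte("ACGT"), Qualities: q}
	fmt.Println(rec.Name, string(rec.Sequence), rec.Qualities)
}
```

Rejecting characters below the shift surfaces a wrong offset early (for example, a Phred+33 file decoded with a shift of 64) instead of silently wrapping around in unsigned arithmetic.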

Designed for scalability in high-throughput metabarcoding pipelines.