Mirror of https://github.com/metabarcoding/obitools4.git
Semantic Description of obiformats Package
The obiformats package provides utilities for parsing sequence headers in the OBItools4 framework, supporting two distinct formats:
- JSON-based format (e.g., `{"id":"seq1", ...}`): detected by a leading `{` character.
- Legacy OBI format (plain text, e.g., `>seq1 description`): used when no JSON prefix is present.
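The detection rule above is simple enough to sketch directly. The Go program below is a minimal illustration under stated assumptions, not the package's code. `BioSequence` here is a hypothetical stand-in for `obiseq.BioSequence` that carries only a definition string and an annotation map. The JSON branch uses the standard library's `encoding/json`. The legacy branch is a placeholder that only records the raw definition, because the real OBI parsing rules are not described in this document.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// BioSequence is a hypothetical, reduced stand-in for obiseq.BioSequence:
// only the fields this sketch needs.
type BioSequence struct {
	Definition  string
	Annotations map[string]interface{}
}

// parseJSONHeader decodes a JSON object definition into the annotation map.
// json.Unmarshal reuses the existing non-nil map and adds the decoded keys.
func parseJSONHeader(seq *BioSequence) error {
	return json.Unmarshal([]byte(seq.Definition), &seq.Annotations)
}

// parseOBIHeader is a hypothetical placeholder for the legacy OBI parser:
// it only stores the raw definition and does NOT reproduce the real rules.
func parseOBIHeader(seq *BioSequence) error {
	seq.Annotations["definition"] = seq.Definition
	return nil
}

// parseGuessedHeader routes on the first character of the definition,
// mirroring the behaviour described for ParseGuessedFastSeqHeader.
func parseGuessedHeader(seq *BioSequence) error {
	if seq.Annotations == nil {
		seq.Annotations = make(map[string]interface{})
	}
	if strings.HasPrefix(seq.Definition, "{") {
		return parseJSONHeader(seq)
	}
	return parseOBIHeader(seq)
}

func main() {
	seqs := []*BioSequence{
		{Definition: `{"id":"seq1","count":3}`},
		{Definition: "seq2 description"},
	}
	for _, s := range seqs {
		if err := parseGuessedHeader(s); err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(s.Annotations)
	}
}
```

Because the check is a single prefix test, routing costs the same for every sequence, and any definition that does not start with `{` falls through to the legacy parser.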
Core Functions
- `ParseGuessedFastSeqHeader(sequence *obiseq.BioSequence)`
  Dynamically routes header parsing based on the first character of the sequence definition:
  - calls `ParseFastSeqJsonHeader` if the definition is JSON-prefixed;
  - otherwise invokes `ParseFastSeqOBIHeader`.
- `IParseFastSeqHeaderBatch(iterator, options...) obiiter.IBioSequence`
  Applies header parsing to a batch of sequences:
  - takes an iterator over `BioSequence`s;
  - uses optional configuration (e.g., parallelism, parsing behavior);
  - wraps the parser in a worker pipeline via `MakeIWorker`, preserving sequence flow.
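The streaming behaviour of `IParseFastSeqHeaderBatch` can be approximated with plain goroutines and channels. This is a hedged sketch, not `MakeIWorker` itself. `Sequence`, `Batch`, and `parseBatches` are hypothetical names. The sketch also interprets "preserving sequence flow" as meaning that every batch passes through the pipeline exactly once, carrying an `Order` index that a consumer can use to restore the original order.

```go
package main

import (
	"fmt"
	"sync"
)

// Sequence is a hypothetical, minimal sequence record.
type Sequence struct {
	Definition string
	Parsed     bool
}

// Batch is a hypothetical group of sequences tagged with its stream position.
type Batch struct {
	Order     int
	Sequences []*Sequence
}

// parseBatches fans batches out to `workers` goroutines, applies `parse` to
// every sequence in place, and forwards each batch downstream. Batches may
// leave in a different order than they arrived; Order lets a consumer fix that.
func parseBatches(in <-chan Batch, workers int, parse func(*Sequence)) <-chan Batch {
	if workers < 1 {
		workers = 1 // at least one worker, or the input would never drain
	}
	out := make(chan Batch)
	var wg sync.WaitGroup
	wg.Add(workers)
	for w := 0; w < workers; w++ {
		go func() {
			defer wg.Done()
			for b := range in {
				for _, s := range b.Sequences {
					parse(s)
				}
				out <- b
			}
		}()
	}
	// Close the output once every worker has drained the input.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	in := make(chan Batch)
	go func() {
		defer close(in)
		for i := 0; i < 3; i++ {
			in <- Batch{Order: i, Sequences: []*Sequence{{Definition: fmt.Sprintf("seq%d", i)}}}
		}
	}()
	for b := range parseBatches(in, 2, func(s *Sequence) { s.Parsed = true }) {
		fmt.Println(b.Order, b.Sequences[0].Definition, b.Sequences[0].Parsed)
	}
}
```

Because workers run concurrently, the printed batch order can vary between runs. A downstream stage that needs the input order would buffer batches and emit them by `Order`.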
Design Principles
- Format agnosticism: Automatically detects header type.
- Iterator-based streaming: Enables memory-efficient batch processing of large datasets (e.g., FASTQ/FASTA).
- Extensibility: the options pattern (`WithOption`) supports runtime customization.
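In Go, an options pattern like this is commonly implemented as functional options. The sketch below assumes that shape. `options`, `withParallelWorkers`, `withBatchSize`, `makeOptions`, and the default values are all hypothetical, and the actual `WithOption` type in obiformats may be defined differently.

```go
package main

import "fmt"

// options holds hypothetical runtime settings for header parsing.
type options struct {
	parallelWorkers int
	batchSize       int
}

// WithOption follows the functional-options idiom: each option mutates the
// settings struct. The real obiformats definition may differ.
type WithOption func(*options)

// withParallelWorkers is a hypothetical option constructor.
func withParallelWorkers(n int) WithOption {
	return func(o *options) { o.parallelWorkers = n }
}

// withBatchSize is a hypothetical option constructor.
func withBatchSize(n int) WithOption {
	return func(o *options) { o.batchSize = n }
}

// makeOptions starts from defaults and applies options in order, so later
// options override earlier ones.
func makeOptions(opts ...WithOption) options {
	o := options{parallelWorkers: 4, batchSize: 2000} // hypothetical defaults
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	o := makeOptions(withParallelWorkers(8), withBatchSize(500))
	fmt.Println(o.parallelWorkers, o.batchSize)
}
```

With this design, callers pass only the settings they care about, and new options can be added later without changing the signature of functions that accept `...WithOption`.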
This package serves as a header-decoding layer for downstream analysis in metagenomic or metabarcoding workflows.