mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
8c7017a99d
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5" - Update version.txt from 4.29 → .30 (automated by Makefile)
# Semantic Description of `obiformats` Package

The `obiformats` package provides utilities for parsing sequence headers in the OBItools4 framework, supporting two distinct formats:

- **JSON-based format** (e.g., `{"id":"seq1", ...}`): Detected by a leading `{` character.
- **Legacy OBI format** (plain text, e.g., `>seq1 description`): Used when no JSON prefix is present.

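The detection rule above can be sketched in a few lines of Go. This is a minimal illustration, not the package's code: `headerKind` and `guessHeaderKind` are hypothetical names, and the sketch assumes that only the first character is inspected, with no whitespace trimming.

```go
package main

import (
	"fmt"
	"strings"
)

// headerKind is a hypothetical label for the two header styles.
type headerKind string

const (
	jsonHeader headerKind = "json"
	obiHeader  headerKind = "obi"
)

// guessHeaderKind reports which style a definition string appears to use.
// Assumption: a leading '{' means JSON; anything else, including an empty
// string, falls back to the legacy OBI style.
func guessHeaderKind(definition string) headerKind {
	if strings.HasPrefix(definition, "{") {
		return jsonHeader
	}
	return obiHeader
}

func main() {
	for _, def := range []string{`{"id":"seq1"}`, "seq1 description"} {
		fmt.Printf("%q -> %s\n", def, guessHeaderKind(def))
	}
}
```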
## Core Functions

- **`ParseGuessedFastSeqHeader(sequence *obiseq.BioSequence)`**
  Dynamically routes header parsing based on the first character of the sequence definition:
  - Calls `ParseFastSeqJsonHeader` if JSON-prefixed.
  - Otherwise invokes `ParseFastSeqOBIHeader`.

- **`IParseFastSeqHeaderBatch(iterator, options...) obiiter.IBioSequence`**
  Applies header parsing to a *batch* of sequences:
  - Takes an iterator over `BioSequence`s.
  - Uses optional configuration (e.g., parallelism, parsing behavior).
  - Wraps the parser in a worker pipeline via `MakeIWorker`, preserving sequence flow.

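The way a per-sequence parser can be wrapped in a parallel worker pipeline is sketched below with plain channels. It does not use the real `obiiter` or `obiseq` types: `record`, `worker`, `parseGuessed`, and `applyWorker` are hypothetical stand-ins, and the two parsers are reduced to setting a format label.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// record is a hypothetical stand-in for obiseq.BioSequence.
type record struct {
	ID         string
	Definition string
	Format     string // set by the parser
}

// worker is a hypothetical per-record function type.
type worker func(*record)

// parseGuessed routes on the first character of the definition.
// A real parser would decode annotations; here we only record the route taken.
func parseGuessed(r *record) {
	if strings.HasPrefix(r.Definition, "{") {
		r.Format = "json"
	} else {
		r.Format = "obi"
	}
}

// applyWorker runs w on every record read from in, using n goroutines,
// and streams the processed records on the returned channel.
// Output order is not guaranteed when n > 1.
func applyWorker(in <-chan *record, w worker, n int) <-chan *record {
	out := make(chan *record)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range in {
				w(r)
				out <- r
			}
		}()
	}
	// Close the output once every worker has drained the input.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	in := make(chan *record)
	go func() {
		defer close(in)
		in <- &record{ID: "seq1", Definition: `{"count":3}`}
		in <- &record{ID: "seq2", Definition: "plain description"}
	}()
	for r := range applyWorker(in, parseGuessed, 2) {
		fmt.Println(r.ID, r.Format)
	}
}
```

Because records are streamed one at a time, memory use stays bounded by the number of records in flight rather than by the size of the input.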
## Design Principles

- **Format agnosticism**: Automatically detects header type.
- **Iterator-based streaming**: Enables memory-efficient batch processing of large datasets (e.g., FASTQ/FASTA).
- **Extensibility**: Options pattern (`WithOption`) supports runtime customization.

This package serves as a header-decoding layer for downstream analysis in metagenomic or metabarcoding workflows.
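The options pattern listed under Design Principles is commonly written in Go as functional options. The sketch below shows that general idiom only. The names `config`, `Option`, `withWorkers`, `withHeaderParsing`, and `newConfig` are hypothetical, and the package's actual option functions and defaults may differ.

```go
package main

import "fmt"

// config holds hypothetical settings for a batch parser.
type config struct {
	parallelWorkers int
	parseHeaders    bool
}

// Option mutates a config; callers pass any number of them.
type Option func(*config)

// withWorkers sets the (hypothetical) number of parallel workers.
func withWorkers(n int) Option {
	return func(c *config) { c.parallelWorkers = n }
}

// withHeaderParsing toggles (hypothetical) header parsing.
func withHeaderParsing(enabled bool) Option {
	return func(c *config) { c.parseHeaders = enabled }
}

// newConfig starts from assumed defaults and applies the options in order.
func newConfig(opts ...Option) config {
	c := config{parallelWorkers: 1, parseHeaders: true}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", newConfig())
	fmt.Printf("%+v\n", newConfig(withWorkers(4), withHeaderParsing(false)))
}
```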