mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
JSON Output Module for Biological Sequences (obiformats)
This Go package provides utilities to serialize biological sequence data (from obiseq) into structured JSON format, supporting batch processing and parallel I/O.
- `JSONRecord(sequence)`: Converts a single `BioSequence` into an indented JSON object containing:
  - `"id"`: Sequence identifier.
  - `"sequence"` (optional): Nucleotide/protein sequence string if present.
  - `"qualities"` (optional): Quality scores as a string if available.
  - `"annotations"` (optional): Metadata annotations map.
- `FormatJSONBatch(batch)`: Formats a batch of sequences as JSON array elements, returning a `*bytes.Buffer`. Handles comma separation and indentation.
- `WriteJSON(iterator, file)`: Writes a stream of sequences to an `io.Writer`, supporting:
  - Parallel workers (configurable via options).
  - Automatic compression (`gzip`/`bgzip`) if enabled.
  - Proper JSON array wrapping: `[`, chunked batches, and final `]`.
  - Atomic ordering to preserve sequence integrity across parallel writes.
- `WriteJSONToStdout()` / `WriteJSONToFile()`: Convenience wrappers:
  - Outputs to stdout or a file (with append/truncate control).
  - Supports paired-end data: writes both forward and reverse reads to separate files when configured.
- Internal helpers:
  - `_UnescapeUnicodeCharactersInJSON()`: Fixes double-escaped Unicode in JSON output (e.g., `\\u00E9` → `\u00E9`).
  - Uses chunked concurrency with `FileChunk`, ordered by batch number to ensure valid JSON structure.
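The per-record object described for `JSONRecord` can be illustrated with a minimal, standalone sketch. The `record` struct and `toJSON` function below are hypothetical stand-ins written for this example; they are not the `obiseq` or `obiformats` API, and the package itself may build the object differently. The sketch only shows how the optional keys drop out when a field is empty.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// record is a hypothetical stand-in for the fields listed above;
// it is NOT the obiseq BioSequence type.
type record struct {
	ID          string         `json:"id"`
	Sequence    string         `json:"sequence,omitempty"`
	Qualities   string         `json:"qualities,omitempty"`
	Annotations map[string]any `json:"annotations,omitempty"`
}

// toJSON renders one record as an indented JSON object.
// Empty optional fields are omitted thanks to the omitempty tags.
func toJSON(r record) (string, error) {
	b, err := json.MarshalIndent(r, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// A record with a sequence and one annotation, but no qualities.
	full, err := toJSON(record{
		ID:          "seq1",
		Sequence:    "acgt",
		Annotations: map[string]any{"count": 3},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(full)

	// A record with only an identifier: just the "id" key is emitted.
	bare, err := toJSON(record{ID: "seq2"})
	if err != nil {
		panic(err)
	}
	fmt.Println(bare)
}
```

Using struct tags keeps the key order fixed (`id`, `sequence`, `qualities`, `annotations`), which makes the output easy to diff; a map-based encoder would sort keys alphabetically instead.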
Designed for high-throughput NGS data pipelines, it ensures correctness and performance while integrating with obitools4's iterator-based processing model.
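The ordering guarantee is what keeps parallel formatting compatible with a single valid JSON array. The following sketch shows one way to get it, assuming each batch carries a sequential number: workers format batches concurrently, and a single writer buffers out-of-order chunks until the next expected one arrives. The `chunk` type and the `formatBatch` and `writeArray` functions are hypothetical names for this illustration and do not reproduce the package's `FileChunk` or `WriteJSON` code.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// chunk is a hypothetical stand-in for the FileChunk idea:
// formatted bytes tagged with the number of the batch they came from.
type chunk struct {
	order int
	data  []byte
}

// formatBatch joins already-serialized records with commas.
// It does not emit the surrounding array brackets.
func formatBatch(records []string) []byte {
	var buf bytes.Buffer
	for i, r := range records {
		if i > 0 {
			buf.WriteString(",\n")
		}
		buf.WriteString(r)
	}
	return buf.Bytes()
}

// writeArray formats batches in parallel, then emits the chunks strictly
// in batch order between "[" and "]".
func writeArray(batches [][]string) string {
	results := make(chan chunk)
	var wg sync.WaitGroup
	for i, b := range batches {
		wg.Add(1)
		go func(order int, recs []string) {
			defer wg.Done()
			results <- chunk{order: order, data: formatBatch(recs)}
		}(i, b)
	}
	// Close the channel once every worker has delivered its chunk.
	go func() {
		wg.Wait()
		close(results)
	}()

	var out bytes.Buffer
	out.WriteString("[\n")
	pending := map[int][]byte{} // chunks that arrived ahead of their turn
	next := 0                   // batch number the writer is waiting for
	wrote := false              // whether any record has been emitted yet
	for c := range results {
		pending[c.order] = c.data
		// Flush every chunk that is now contiguous with what was written.
		for {
			data, ok := pending[next]
			if !ok {
				break
			}
			if len(data) > 0 {
				if wrote {
					out.WriteString(",\n")
				}
				out.Write(data)
				wrote = true
			}
			delete(pending, next)
			next++
		}
	}
	out.WriteString("\n]\n")
	return out.String()
}

func main() {
	out := writeArray([][]string{
		{`{"id":"a"}`, `{"id":"b"}`},
		{`{"id":"c"}`},
	})
	fmt.Print(out)
	fmt.Println("valid JSON:", json.Valid([]byte(out)))
}
```

Buffering early chunks in a map keeps the workers free to finish in any order, at the cost of holding out-of-order output in memory until its predecessors arrive.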