JSON Output Module for Biological Sequences (obiformats)

This Go package provides utilities to serialize biological sequence data (from obiseq) into structured JSON format, supporting batch processing and parallel I/O.

  • JSONRecord(sequence): Converts a single BioSequence into an indented JSON object containing:

    • "id": Sequence identifier.
    • "sequence" (optional): Nucleotide/protein sequence string if present.
    • "qualities" (optional): Quality scores as a string if available.
    • "annotations" (optional): Metadata annotations map.
  • FormatJSONBatch(batch): Formats a batch of sequences as JSON array elements, returning a *bytes.Buffer. Handles comma separation and indentation.

  • WriteJSON(iterator, file): Writes a stream of sequences to an io.Writer, supporting:

    • Parallel workers (configurable via options).
    • Automatic compression (gzip/bgzip) if enabled.
    • Proper JSON array wrapping: [, chunked batches, and final ].
    • Ordered output: chunks are written in batch order, so parallel workers cannot interleave or reorder records.
  • WriteJSONToStdout() / WriteJSONToFile(): Convenience wrappers:

    • Outputs to stdout or a file (with append/truncate control).
    • Supports paired-end data: writes both forward and reverse reads to separate files when configured.
  • Internal helpers:

    • _UnescapeUnicodeCharactersInJSON(): Fixes double-escaped Unicode in JSON output (e.g., \\u00E9 → \u00E9).
    • Uses chunked concurrency with FileChunk, ordered by batch number to ensure valid JSON structure.
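
The record layout and the comma handling between batches can be illustrated with a small sketch. It is not the obiformats implementation: `jsonRecord`, `formatBatch`, `wrapArray`, and `decodeIDs` are hypothetical names, the struct is a stand-in for a BioSequence, and `omitempty` is only one possible way to drop the optional keys.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// jsonRecord is a hypothetical stand-in for a BioSequence.
// Optional fields are dropped from the output when empty.
type jsonRecord struct {
	ID          string                 `json:"id"`
	Sequence    string                 `json:"sequence,omitempty"`
	Qualities   string                 `json:"qualities,omitempty"`
	Annotations map[string]interface{} `json:"annotations,omitempty"`
}

// formatBatch renders a batch as indented JSON array elements.
// Every element except the very first one of the whole stream is
// preceded by a comma; the caller passes first=true only for the
// batch that opens the array (assumed non-empty in this sketch).
func formatBatch(batch []jsonRecord, first bool) (*bytes.Buffer, error) {
	buf := &bytes.Buffer{}
	for i, rec := range batch {
		if !(first && i == 0) {
			buf.WriteString(",\n")
		}
		b, err := json.MarshalIndent(rec, "  ", "  ")
		if err != nil {
			return nil, err
		}
		buf.WriteString("  ")
		buf.Write(b)
	}
	return buf, nil
}

// wrapArray concatenates formatted batches between "[" and "]".
func wrapArray(chunks ...*bytes.Buffer) []byte {
	var doc bytes.Buffer
	doc.WriteString("[\n")
	for _, c := range chunks {
		doc.Write(c.Bytes())
	}
	doc.WriteString("\n]\n")
	return doc.Bytes()
}

// decodeIDs parses the assembled document back and returns the ids,
// which doubles as a validity check on the array structure.
func decodeIDs(doc []byte) ([]string, error) {
	var recs []jsonRecord
	if err := json.Unmarshal(doc, &recs); err != nil {
		return nil, err
	}
	ids := make([]string, len(recs))
	for i, r := range recs {
		ids[i] = r.ID
	}
	return ids, nil
}

func main() {
	b1, err := formatBatch([]jsonRecord{
		{ID: "s1", Sequence: "ACGT", Annotations: map[string]interface{}{"count": 3}},
		{ID: "s2"},
	}, true)
	if err != nil {
		panic(err)
	}
	b2, err := formatBatch([]jsonRecord{{ID: "s3", Sequence: "TTGA"}}, false)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(wrapArray(b1, b2)))
}
```

Emitting the separator before each element, rather than after it, means a batch never needs to know whether another batch follows it.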

Designed for high-throughput NGS data pipelines, the module keeps the JSON output valid and ordered under parallel writes while integrating with obitools4's iterator-based processing model.
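
The ordered, chunked write described for WriteJSON can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's code: `chunk` is a hypothetical analogue of FileChunk, `writeOrdered` and `renderOrdered` are invented names, and records are reduced to bare ids. Workers format batches concurrently, and a single writer holds back any chunk that arrives early until all lower batch numbers have been written.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"sync"
)

// chunk pairs formatted bytes with the batch number they came from
// (hypothetical analogue of the FileChunk mentioned above).
type chunk struct {
	order int
	data  []byte
}

// writeOrdered formats batches in parallel and writes them to w as
// one JSON array, strictly in batch order.
func writeOrdered(w io.Writer, batches [][]string, workers int) error {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan int)
	chunks := make(chan chunk)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for order := range jobs {
				var buf bytes.Buffer
				for j, id := range batches[order] {
					if j > 0 {
						buf.WriteString(",\n  ")
					}
					// Marshal cannot fail for a map of plain strings.
					b, _ := json.Marshal(map[string]string{"id": id})
					buf.Write(b)
				}
				chunks <- chunk{order: order, data: buf.Bytes()}
			}
		}()
	}
	go func() {
		for i := range batches {
			jobs <- i
		}
		close(jobs)
		wg.Wait()
		close(chunks)
	}()

	var werr error
	write := func(s []byte) {
		if werr == nil {
			_, werr = w.Write(s)
		}
	}
	write([]byte("["))
	pending := map[int][]byte{} // chunks that arrived ahead of their turn
	next, wrote := 0, false
	for c := range chunks { // keep draining even after a write error
		pending[c.order] = c.data
		for data, ok := pending[next]; ok; data, ok = pending[next] {
			if len(data) > 0 {
				if wrote {
					write([]byte(","))
				}
				write([]byte("\n  "))
				write(data)
				wrote = true
			}
			delete(pending, next)
			next++
		}
	}
	write([]byte("\n]\n"))
	return werr
}

// renderOrdered is a convenience wrapper that returns the output as a string.
func renderOrdered(batches [][]string, workers int) (string, error) {
	var out bytes.Buffer
	err := writeOrdered(&out, batches, workers)
	return out.String(), err
}

func main() {
	doc, err := renderOrdered([][]string{{"s1", "s2"}, {"s3"}, {}, {"s4"}}, 3)
	if err != nil {
		panic(err)
	}
	fmt.Print(doc)
}
```

Letting the writer, not the workers, insert the comma between chunks keeps the array valid even when a batch turns out to be empty.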