obitools4/autodoc/docmd/pkg_obitools_obicsv.md
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
2026-04-13 13:34:53 +02:00


Functional Overview of the obicsv Package

The obicsv package exports biological sequence data (e.g., records read from FASTA/FASTQ files) to CSV. It supports selective column inclusion, parallel batch processing, optional gzip compression, and command-line integration, which makes it suitable for high-throughput NGS pipelines.

Core Capabilities

| Domain | Features |
|---|---|
| Column Selection & Formatting | Toggle output fields (CSVId, CSVSequence, CSVTaxon, etc.); add columns for arbitrary sequence attributes via CSVKey/CSVKeys; set the field separator (CSVSeparator) and the placeholder for missing values (CSVNAValue). |
| I/O & File Handling | Write to stdout or to a file (append or truncate); optional gzip compression (OptionsCompressed); configurable batch size and full-file batching. |
| Processing Strategy | Parallel workers (default: obidefault.ParallelWorkers()); unordered iteration (NoOrder); progress tracking; skipping of empty sequences. |
| Metadata Enrichment | Automatic column detection (CSVAutoColumn); output of obipairing statistics, taxonomic data, and abundance counts; quality scores written as Phred values with a configurable ASCII shift. |
| CLI Integration | Command-line flags (--ids, --sequence, --taxon, etc.), whose values are exposed through helper functions such as CLIPrintId() and CLIHasToBeKeptAttributes(). |
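A small standard-library sketch shows how the column options fit together. It is not the obicsv implementation: `record`, `autoColumns` and `toCSV` are hypothetical names, the sorted union of attribute keys is only an assumed auto-detection rule, and the `sep` and `naValue` parameters merely play the roles of CSVSeparator and CSVNAValue.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"sort"
	"strings"
)

// record is a hypothetical stand-in for an annotated sequence:
// an identifier plus free-form key/value attributes.
type record struct {
	ID    string
	Attrs map[string]string
}

// autoColumns returns the sorted union of attribute keys found in recs,
// an assumed rule for what an auto-column mode could do.
func autoColumns(recs []record) []string {
	seen := map[string]bool{}
	for _, r := range recs {
		for k := range r.Attrs {
			seen[k] = true
		}
	}
	keys := make([]string, 0, len(seen))
	for k := range seen {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// toCSV writes a header plus one row per record, using sep as the field
// separator and naValue for attributes a record does not carry.
func toCSV(recs []record, keys []string, sep rune, naValue string) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	w.Comma = sep
	if err := w.Write(append([]string{"id"}, keys...)); err != nil {
		return "", err
	}
	for _, r := range recs {
		row := []string{r.ID}
		for _, k := range keys {
			v, ok := r.Attrs[k]
			if !ok {
				v = naValue // missing attribute: emit the NA placeholder
			}
			row = append(row, v)
		}
		if err := w.Write(row); err != nil {
			return "", err
		}
	}
	w.Flush()
	return sb.String(), w.Error()
}

func main() {
	recs := []record{
		{ID: "seq1", Attrs: map[string]string{"count": "12", "taxid": "9606"}},
		{ID: "seq2", Attrs: map[string]string{"count": "3"}},
	}
	out, err := toCSV(recs, autoColumns(recs), ',', "NA")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

Running it prints `id,count,taxid`, then `seq1,12,9606`, then `seq2,3,NA`: the second record lacks `taxid`, so the placeholder fills that cell instead of leaving the row short.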

Public API Summary

  • MakeOptions([]WithOption)
    Builds an Options value from a list of functional options that configure export behavior, e.g. CSVId, CSVTaxon, OptionsFileName, OptionsAppendFile.

  • NewCSVSequenceIterator(IBioSequence, ...WithOption)
    Wraps a sequence iterator in an asynchronous CSV record stream. Launches parallel workers, handles batching, and auto-detects attribute columns when enabled.

  • CSVSequenceHeader(Options)
    Generates a CSV header row based on enabled columns and custom keys.

  • CSVBatchFromSequences(BioSequenceBatch, Options)
    Converts a batch of sequences into CSVRecord entries according to the configured options.

  • WriteCSV(ICSVRecord, io.WriteCloser)
    Writes a CSV record stream to any io.WriteCloser, with optional compression and parallel formatting.

  • WriteCSVToStdout(), WriteCSVToFile()
    Convenience wrappers for common I/O targets.

  • FormatCVSBatch(CSVRecordBatch, string)
    Renders a batch of records as an in-memory CSV buffer; the header is prepended only to the first batch.
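The header rule described for FormatCVSBatch can be illustrated with `encoding/csv`. This sketch assumes each batch carries a 0-based position in the stream; `batch` and `formatBatch` are hypothetical names, and the real function's signature differs.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// batch is a hypothetical group of CSV rows tagged with its position
// in the output stream (0 for the first batch).
type batch struct {
	Order int
	Rows  [][]string
}

// formatBatch renders one batch into an in-memory buffer. Only the batch
// with Order == 0 receives the header, so concatenating the buffers of all
// batches yields a single well-formed CSV document.
func formatBatch(b batch, header []string) ([]byte, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if b.Order == 0 {
		if err := w.Write(header); err != nil {
			return nil, err
		}
	}
	// WriteAll writes the rows, flushes, and reports any write error.
	if err := w.WriteAll(b.Rows); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	header := []string{"id", "sequence"}
	batches := []batch{
		{Order: 0, Rows: [][]string{{"seq1", "acgt"}}},
		{Order: 1, Rows: [][]string{{"seq2", "ggcc"}}},
	}
	var out bytes.Buffer
	for _, b := range batches {
		chunk, err := formatBatch(b, header)
		if err != nil {
			panic(err)
		}
		out.Write(chunk)
	}
	fmt.Print(out.String())
}
```

Because only the first buffer carries the header, buffers can be written to the output one after another as soon as each is ready.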

Design Principles

  • Streaming & Laziness: Uses iterators so the full dataset never has to be loaded into memory.
  • Parallelism: Producer-consumer model with configurable concurrency (at least 2 workers).
  • Resilience: Graceful handling of missing fields via configurable NA values.
  • Extensibility: Supports dynamic attributes (e.g., the obipairing option expands into 8 columns).

Usage Example

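```go
// Fragment written from inside package obicsv (add "os" to the imports);
// sourceIter is an existing IBioSequence iterator.
opts := []WithOption{
    OptionsFileName("results.csv"), // relevant when writing to a file
    OptionsAppendFile(true),
    CSVId(true),
    CSVTaxon(false),
}
// NewCSVSequenceIterator takes variadic WithOption values, not an Options.
iter := NewCSVSequenceIterator(sourceIter, opts...)
WriteCSV(iter, os.Stdout) // any io.WriteCloser works, e.g. an opened *os.File
```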