Functional Overview of the obicsv Package
The obicsv package enables efficient, configurable export of biological sequence data (e.g., FASTA/FASTQ) to CSV format. It supports selective column inclusion, parallel batch processing, compression, and seamless CLI integration—ideal for high-throughput NGS pipelines.
Core Capabilities
| Domain | Features |
|---|---|
| Column Selection & Formatting | Toggle output fields (CSVId, CSVSequence, CSVTaxon, etc.); define custom attributes via CSVKey/CSVKeys; set separator (CSVSeparator) and NA placeholder (CSVNAValue). |
| I/O & File Handling | Write to stdout or file (append/truncate); support gzip compression (OptionsCompressed); configure batch size and full-file batching. |
| Processing Strategy | Parallel workers (default: obidefault.ParallelWorkers()); unordered iteration (NoOrder); progress tracking; skip empty sequences. |
| Metadata Enrichment | Auto-detect columns (CSVAutoColumn); integrate obipairing, taxonomic data, and abundance counts; support Phred+shifted quality scores. |
| CLI Integration | Command-line flags (--ids, --sequence, --taxon, etc.); extendable via helper functions (CLIPrintId(), CLIHasToBeKeptAttributes()). |
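The column-selection, separator, and NA-placeholder behavior in the table can be illustrated with a small standalone sketch. It does not use the obicsv API: the record type and the rowFor and render helpers are hypothetical names introduced here, and the standard encoding/csv package stands in for the package's own writer.

```go
package main

import (
	"encoding/csv"
	"os"
	"strings"
)

// record is a hypothetical stand-in for a sequence with annotations.
type record struct {
	ID         string
	Sequence   string
	Attributes map[string]string
}

// rowFor builds one CSV row for the selected attribute keys, substituting
// naValue for any attribute the record does not carry.
func rowFor(r record, keys []string, naValue string) []string {
	row := []string{r.ID}
	for _, k := range keys {
		if v, ok := r.Attributes[k]; ok {
			row = append(row, v)
		} else {
			row = append(row, naValue)
		}
	}
	return append(row, r.Sequence)
}

// render writes a header and one row per record using the given separator.
func render(recs []record, keys []string, sep rune, naValue string) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	w.Comma = sep

	header := append([]string{"id"}, keys...)
	header = append(header, "sequence")
	if err := w.Write(header); err != nil {
		return "", err
	}
	for _, r := range recs {
		if err := w.Write(rowFor(r, keys, naValue)); err != nil {
			return "", err
		}
	}
	w.Flush()
	return sb.String(), w.Error()
}

func main() {
	recs := []record{
		{ID: "seq1", Sequence: "ACGT", Attributes: map[string]string{"count": "3"}},
		{ID: "seq2", Sequence: "TTGA", Attributes: map[string]string{}},
	}
	out, err := render(recs, []string{"count", "taxid"}, ';', "NA")
	if err != nil {
		panic(err)
	}
	os.Stdout.WriteString(out)
}
```

In this sketch a missing attribute never changes the row width; every row has one cell per selected column, which is what keeps the output loadable by downstream tools.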
Public API Summary
- MakeOptions([]WithOption): Builder-style configuration of export behavior. Supported options include CSVId, CSVTaxon, OptionFileName, and OptionsAppendFile.
- NewCSVSequenceIterator(IBioSequence, ...WithOption): Wraps a sequence iterator into an asynchronous CSV record stream. Launches parallel workers, handles batching, and auto-detects attributes when enabled.
- CSVSequenceHeader(Options): Generates a CSV header row from the enabled columns and custom keys.
- CSVBatchFromSequences(BioSequenceBatch, Options): Converts a batch of sequences into CSVRecord entries according to the configured options.
- WriteCSV(ICSVRecord, io.WriteCloser): Writes CSV data to any writer, with compression and parallelization support.
- WriteCSVToStdout(), WriteCSVToFile(): Convenience wrappers for common I/O targets.
- FormatCVSBatch(CSVRecordBatch, string): Renders a batch of records as an in-memory CSV buffer; the header is prepended only for the first chunk.
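The "header only for the first chunk" rule described for FormatCVSBatch can be sketched independently of the package. The formatBatch function and its withHeader parameter below are hypothetical and are not the real signature; the sketch also skips CSV quoting to stay short.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// formatBatch renders rows into an in-memory buffer. The header line is
// written only when withHeader is true, that is, for the first chunk of
// the stream. Fields are joined as-is; no quoting is applied here.
func formatBatch(header []string, rows [][]string, sep string, withHeader bool) *bytes.Buffer {
	buf := &bytes.Buffer{}
	if withHeader {
		buf.WriteString(strings.Join(header, sep))
		buf.WriteByte('\n')
	}
	for _, row := range rows {
		buf.WriteString(strings.Join(row, sep))
		buf.WriteByte('\n')
	}
	return buf
}

func main() {
	header := []string{"id", "sequence"}
	chunks := [][][]string{
		{{"seq1", "ACGT"}, {"seq2", "TTGA"}},
		{{"seq3", "GGCC"}},
	}
	// Concatenating the chunk buffers yields one CSV with a single header.
	for i, rows := range chunks {
		fmt.Print(formatBatch(header, rows, ",", i == 0).String())
	}
}
```

Rendering each chunk to its own buffer lets workers format batches independently while the writer only has to concatenate bytes in the chosen order.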
Design Principles
- Streaming & Laziness: Uses iterator patterns to avoid full data loading.
- Parallelism: Producer-consumer model with configurable concurrency (min 2 workers).
- Resilience: Graceful handling of missing fields via configurable NA values.
- Extensibility: Supports dynamic attributes (e.g., obipairing expands to 8 fields).
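The producer-consumer model with a two-worker floor can be sketched with plain goroutines and channels. This is an illustration under assumptions, not the package's implementation: convertAll is a hypothetical function, and joining fields with a comma stands in for the real sequence-to-record conversion.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// convertAll fans batches out to nWorkers goroutines and collects the
// converted lines. Results arrive in completion order, not input order.
func convertAll(batches [][]string, nWorkers int) []string {
	if nWorkers < 2 {
		nWorkers = 2 // mirror the stated minimum of two workers
	}
	in := make(chan []string)
	out := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < nWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range in {
				out <- strings.Join(b, ",") // stand-in for record conversion
			}
		}()
	}

	// Producer: feed batches, then signal that no more input is coming.
	go func() {
		for _, b := range batches {
			in <- b
		}
		close(in)
	}()

	// Close the output channel once every worker has finished.
	go func() {
		wg.Wait()
		close(out)
	}()

	var lines []string
	for l := range out {
		lines = append(lines, l)
	}
	return lines
}

func main() {
	lines := convertAll([][]string{{"seq1", "ACGT"}, {"seq2", "TTGA"}, {"seq3", "GGCC"}}, 1)
	sort.Strings(lines) // sort only to make the printed order deterministic
	fmt.Println(strings.Join(lines, "\n"))
}
```

Because workers finish in arbitrary order, a consumer that needs the input order has to reorder results itself; skipping that step is the trade-off an unordered mode makes for throughput.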
Usage Example
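// Collect the export options; NewCSVSequenceIterator takes them
// variadically, as listed in the API summary.
opts := []WithOption{
    OptionFileName("results.csv"),
    CSVId(true),
    CSVTaxon(false),
    OptionsAppendFile(true),
}
// sourceIter is an existing IBioSequence iterator.
iter := NewCSVSequenceIterator(sourceIter, opts...)
WriteCSV(iter, os.Stdout) // or pass any other io.WriteCloser, such as a file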