Mirror of https://github.com/metabarcoding/obitools4.git
CSV Import Module for Biological Sequences (obiformats)
This Go package parses biological sequence data from CSV files into structured objects compatible with the OBItools4 framework.
Core Features
- CSV Parsing: Reads CSV data via `io.Reader`, supporting comments (`#`), flexible field counts, and leading-space trimming.
- Sequence Extraction: Identifies the `sequence`, `id`, and `qualities` columns by header name and maps them to the corresponding biological sequence fields.
- Quality Score Adjustment: Applies a configurable Phred score shift (default: 33) to quality strings.
- Metadata Handling:
  - Special handling for taxonomic IDs (`taxid`, `*_taxid`).
  - Generic attributes are parsed as JSON when possible, falling back to the raw string otherwise.
- Batched Output: Streams sequences in configurable batches (`batchSize`) via an iterator interface (`obiiter.IBioSequence`).
- Multiple Entry Points:
  - `ReadCSV`: Reads from any `io.Reader`.
  - `ReadCSVFromFile`: Loads from a file, deriving the source name from the filename.
  - `ReadCSVFromStdin`: Reads from standard input.
- Error & Edge Handling:
  - Gracefully handles empty files/streams via `ReadEmptyFile`.
  - Uses structured logging (Logrus) for fatal and informational messages.
Integration
Designed to integrate with OBItools4’s core types:
- `obiseq.BioSequence`: Holds the sequence, ID, qualities, taxid, and arbitrary attributes.
- `obiiter.IBioSequence`: Streaming interface for batched sequence iteration.
Use Case
Efficient, flexible ingestion of tabular biological data (e.g., alignment outputs or FASTQ/FASTA data converted to CSV) into downstream analysis pipelines.