# EcoPCR File Parser for Biological Sequences

This Go package (`obiformats`) parses EcoPCR output files: `|`-delimited, CSV-like files containing the amplified sequence data produced by the *EcoPCR* tool, which is used in metabarcoding pipelines. The parser supports two versions of the format (`v1` and `v2`) and extracts biological metadata alongside the sequences.

## Key Features

- **Version Detection**: Automatically detects the EcoPCR file version via the `#@ecopcr-v2` header.
- **Primer Extraction**: Reads the forward and reverse primer sequences from comment lines in the file header.
- **Mode Inference**: Identifies the amplification mode (e.g., `direct`, `inverted`) from header metadata.
- **Sequence Parsing**: Reads each record as a biological sequence (`obiseq.BioSequence`) with:
  - Name (with deduplication support)
  - Nucleotide/protein sequence
  - Comment field
- **Structured Annotation**: Populates annotations including:
  - Taxonomic hierarchy (taxid, rank, species/genus/family names)
  - Primer matching info (`forward_match`, `reverse_mismatch`)
  - Melting temperatures (if present in v2)
  - Amplicon length and strand orientation
- **Streaming & Batching**: Returns an iterator (`obiiter.IBioSequence`) for memory-efficient, batched processing of large files.
- **File Handling**: Provides both `ReadEcoPCR` (reads from any `io.Reader`) and the `ReadEcoPCRFromFile` convenience function.

## Implementation Highlights

- Custom line reader (`__readline__`) for robust header parsing.
- CSV parser configured with the `|` delimiter and comment support (`#`).
- Deduplication of sequence names using a running-count suffix.
- Concurrent, goroutine-based streaming to decouple I/O from processing.

This module integrates with the broader *OBItools4* ecosystem for high-throughput sequence analysis in environmental DNA studies.