# EcoPCR File Parser for Biological Sequences

This Go package (`obiformats`) provides functionality to parse EcoPCR output files: pipe-delimited (`|`), CSV-like files containing amplified sequence data generated by the *EcoPCR* tool, which is used in metabarcoding pipelines. The parser supports two versions of the format (`v1` and `v2`) and extracts biological metadata (taxonomy, primer matches, amplicon properties) alongside each sequence.

## Key Features

- **Version Detection**: Automatically detects EcoPCR file version via the `#@ecopcr-v2` header.
- **Primer Extraction**: Reads forward and reverse primer sequences from comment lines in the file header.
- **Mode Inference**: Identifies amplification mode (e.g., `direct`, `inverted`) from header metadata.
- **Sequence Parsing**: Reads each record as a biological sequence (`obiseq.BioSequence`) with:
  - Name (with deduplication support)
  - Nucleotide sequence (the amplicon)
  - Comment field
- **Structured Annotation**: Populates rich annotations including:
  - Taxonomic hierarchy (taxid, rank, species/genus/family names)
  - Primer matching info (`forward_match`, `reverse_mismatch`)
  - Melting temperatures (if present in v2)
  - Amplicon length and strand orientation
- **Streaming & Batching**: Returns an iterator (`obiiter.IBioSequence`) for memory-efficient, batched processing of large files.
- **File Handling**: Provides both `ReadEcoPCR` (from any `io.Reader`) and `ReadEcoPCRFromFile` convenience functions.

## Implementation Highlights

- Custom line reader (`__readline__`) for robust header parsing.
- CSV parser configured with `|` delimiter and comment support (`#`).
- Deduplication of sequence names using a running count suffix.
- Concurrent goroutine-based streaming to decouple I/O and processing.

This module integrates with the broader *OBItools4* ecosystem for high-throughput sequence analysis in environmental DNA studies.