mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 03:50:39 +00:00
⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30 (automated by Makefile)
@@ -0,0 +1,29 @@
# EcoPCR File Parser for Biological Sequences
This Go package (`obiformats`) parses EcoPCR output files: `|`-delimited, CSV-like text files containing amplified sequence data generated by the *EcoPCR* tool (used in metabarcoding pipelines). The parser supports two versions of the format (`v1` and `v2`) and extracts rich biological metadata alongside the sequences.
## Key Features
- **Version Detection**: Automatically detects EcoPCR file version via the `#@ecopcr-v2` header.
- **Primer Extraction**: Reads forward and reverse primer sequences from comment lines in the file header.
- **Mode Inference**: Identifies amplification mode (e.g., `direct`, `inverted`) from header metadata.
- **Sequence Parsing**: Reads each record as a biological sequence (`obiseq.BioSequence`) with:
  - Name (with deduplication support)
  - Nucleotide/protein sequence
  - Comment field
- **Structured Annotation**: Populates rich annotations including:
  - Taxonomic hierarchy (taxid, rank, species/genus/family names)
  - Primer matching info (`forward_match`, `reverse_mismatch`)
  - Melting temperatures (if present in v2)
  - Amplicon length and strand orientation
- **Streaming & Batching**: Returns an iterator (`obiiter.IBioSequence`) for memory-efficient, batched processing of large files.
- **File Handling**: Provides both `ReadEcoPCR` (from any `io.Reader`) and `ReadEcoPCRFromFile` convenience functions.
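
The header-driven features listed above (version detection and primer extraction) can be illustrated with a small, self-contained sketch. It is not the package's actual code: the `ecoPCRHeader` type, the `parseHeader` and `fieldAfter` helpers, and the assumed shape of the primer comment line (`# direct strand oligo1 : ... ; oligo2 : ...`) are all hypothetical and chosen only for illustration. Mode inference is left out because the header field it relies on is not described here.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// ecoPCRHeader holds what this sketch extracts from the header.
// The type and its field names are hypothetical, not part of obiformats.
type ecoPCRHeader struct {
	Version int    // 2 if the "#@ecopcr-v2" marker is seen, otherwise 1
	Forward string // forward primer, empty if not found
	Reverse string // reverse primer, empty if not found
}

// fieldAfter returns the first token following key, skipping spaces and ':'.
func fieldAfter(line, key string) string {
	i := strings.Index(line, key)
	if i < 0 {
		return ""
	}
	rest := strings.TrimLeft(line[i+len(key):], " :\t")
	if j := strings.IndexAny(rest, " ;\t"); j >= 0 {
		rest = rest[:j]
	}
	return rest
}

// parseHeader scans the leading '#' lines of text and stops at the first
// line that is not a comment.
// ASSUMPTION: primers sit on one comment line shaped like
// "# direct strand oligo1 : AAA ; oligo2 : CCC"; the real layout may differ.
func parseHeader(text string) ecoPCRHeader {
	h := ecoPCRHeader{Version: 1}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "#") {
			break // first data record: the header is over
		}
		if strings.HasPrefix(line, "#@ecopcr-v2") {
			h.Version = 2
		}
		// Keep only the first primer pair encountered.
		if h.Forward == "" && strings.Contains(line, "oligo1") {
			h.Forward = fieldAfter(line, "oligo1")
			h.Reverse = fieldAfter(line, "oligo2")
		}
	}
	return h
}

func main() {
	sample := "#@ecopcr-v2\n" +
		"#\n" +
		"# direct  strand oligo1 : GGGCAATCCTGAGCCAA ; oligo2 : CCATTGAGTCTCTGCACCTATC\n" +
		"#\n" +
		"AB000001 | 120 | 4530\n"
	h := parseHeader(sample)
	fmt.Printf("version=%d forward=%s reverse=%s\n", h.Version, h.Forward, h.Reverse)
}
```

A real parser would also need to hand the first non-comment line on to the record reader instead of discarding it, which is one reason to read the header with a dedicated line reader before starting the CSV stage.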
## Implementation Highlights
- Custom line reader (`__readline__`) for robust header parsing.
- CSV parser configured with `|` delimiter and comment support (`#`).
- Deduplication of sequence names using a running count suffix.
- Concurrent goroutine-based streaming to decouple I/O and processing.
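
As a rough illustration of two of the points above, the sketch below configures Go's standard `encoding/csv` reader with a `|` delimiter and `#` comments, then de-duplicates record names with a running count. The `readRecords` and `uniqueName` helpers, the `_N` suffix format, and the three-column sample layout are assumptions made for this example and are not taken from the package.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// uniqueName returns name unchanged the first time it is seen and appends
// "_N" on later occurrences. The suffix format is an assumption.
func uniqueName(name string, seen map[string]int) string {
	n := seen[name]
	seen[name] = n + 1
	if n == 0 {
		return name
	}
	return fmt.Sprintf("%s_%d", name, n)
}

// readRecords parses '|'-delimited text, skipping lines that start with '#'.
// Fields are trimmed because the separator is usually padded with spaces.
func readRecords(text string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(text))
	r.Comma = '|'
	r.Comment = '#'
	r.FieldsPerRecord = -1 // do not enforce a fixed column count
	r.LazyQuotes = true    // tolerate stray quotes inside fields

	seen := map[string]int{}
	var out [][]string
	for {
		rec, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		for i := range rec {
			rec[i] = strings.TrimSpace(rec[i])
		}
		// ASSUMPTION: the first column holds the sequence name.
		rec[0] = uniqueName(rec[0], seen)
		out = append(out, rec)
	}
	return out, nil
}

func main() {
	// Hypothetical three-column layout: name | length | sequence.
	sample := "#@ecopcr-v2\n" +
		"AB1 | 100 | ACGT\n" +
		"AB1 | 98 | ACGA\n" +
		"AB2 | 101 | ACGG\n"
	recs, err := readRecords(sample)
	if err != nil {
		panic(err)
	}
	for _, rec := range recs {
		fmt.Println(strings.Join(rec, "\t"))
	}
}
```

Keeping the counter in a map keyed by the original name means the check costs one lookup per record, which suits a streaming reader that never holds the whole file in memory.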

This module integrates with the broader *OBItools4* ecosystem for high-throughput sequence analysis in environmental DNA studies.