obingslibrary: High-Throughput Sequencing Demultiplexing Library
obingslibrary is a Go package for sample assignment in amplicon-based NGS workflows that use dual-indexed barcodes flanked by PCR primers. It demultiplexes sequencing reads by matching primer–tag patterns and assigning samples via tag lookup, with configurable tolerance for mismatches and indels.
Core Functionalities
1. Primer & Tag Configuration
- Marker: Defines a primer pair (forward/reverse), including:
  - Primer sequences (Forward, Reverse) and reverse-complement variants.
  - Tag specifications: lengths, spacers (e.g., N or fixed nucleotides), delimiters.
  - Mismatch/indel tolerance per direction (SetAllowedMismatch, SetTagIndels).
- Compilation:
  - Compile()/Compile2(): Builds internal pattern indexes (via obiapat.ApatPattern) for fast, error-tolerant matching.
  - Supports "strict", "hamming" (substitutions only), or "indel" (Levenshtein) matching modes.
2. Sequence Matching & Demultiplexing
- Match(sequence): Scans a BioSequence for valid primer bindings:
  - Prioritizes forward-primer detection; falls back to reverse orientation.
  - Returns DemultiplexMatch with:
    - Primer positions, mismatches, orientation (IsDirect).
    - Barcode coordinates (BarcodeStart, BarcodeEnd) and validity flag.
- Primer dimer detection: If BarcodeStart > BarcodeEnd, the read is flagged as invalid.
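The primer dimer rule above can be illustrated with a minimal sketch. The primerHit type and the barcodeSpan function are hypothetical names introduced for this example; they are not the library's DemultiplexMatch API, and the coordinates are assumed to form a half-open interval.

```go
package main

import "fmt"

// primerHit is a hypothetical stand-in for the coordinates carried by a
// match result. It is NOT the library's DemultiplexMatch type.
type primerHit struct {
	ForwardEnd   int // first position after the forward primer match
	ReverseStart int // first position of the reverse primer match
}

// barcodeSpan returns the assumed half-open interval [start, end) lying
// between the two primers. The span is invalid when start > end, which is
// what happens when the primers overlap or sit back to back in the wrong
// order (a primer dimer).
func barcodeSpan(h primerHit) (start, end int, valid bool) {
	start, end = h.ForwardEnd, h.ReverseStart
	return start, end, start <= end
}

func main() {
	hits := []primerHit{
		{ForwardEnd: 24, ReverseStart: 180}, // ordinary amplicon
		{ForwardEnd: 40, ReverseStart: 31},  // primers overlap: dimer
	}
	for _, h := range hits {
		start, end, valid := barcodeSpan(h)
		fmt.Printf("barcode [%d,%d) valid=%t\n", start, end, valid)
	}
}
```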
3. Tag Extraction & Annotation
- ExtractBarcode(sequence, inplace):
  - Extracts the barcode region between forward/reverse primers.
  - Reverse-complements if read is in reverse orientation (IsDirect == false).
  - Annotates the sequence with:
    - Primer names, positions, mismatches.
    - Sample/experiment info (if tag assignment succeeds).
    - Error messages (Unassigned, NoMatch, etc.).
- Tag extraction strategies:
  - Fixed: Fixed-length tags.
  - Delimited: Tags flanked by exact delimiters (e.g., "NN").
  - Rescue: Tolerates indels in delimiter or tag boundaries.
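As a rough illustration of the extract-then-reorient step, the sketch below slices the region between two primer coordinates and reverse-complements it when the read is not in direct orientation. It works on plain lowercase strings rather than the library's BioSequence type, and the function names, the half-open coordinates, and the handling of unknown symbols (mapped to n) are all assumptions made for the example.

```go
package main

import "fmt"

// complement maps each lowercase nucleotide to its complement.
// Assumption: any other symbol is treated as 'n'.
var complement = map[byte]byte{'a': 't', 'c': 'g', 'g': 'c', 't': 'a', 'n': 'n'}

// reverseComplement returns the reverse complement of seq.
func reverseComplement(seq string) string {
	out := make([]byte, len(seq))
	for i := 0; i < len(seq); i++ {
		c, ok := complement[seq[i]]
		if !ok {
			c = 'n'
		}
		out[len(seq)-1-i] = c
	}
	return string(out)
}

// extractBarcode is a hypothetical helper: it returns read[start:end] and
// reverse-complements the result when direct is false, so that barcodes
// from both read orientations end up on the same strand.
func extractBarcode(read string, start, end int, direct bool) (string, error) {
	if start < 0 || end > len(read) || start > end {
		return "", fmt.Errorf("invalid barcode span [%d,%d) for read of length %d",
			start, end, len(read))
	}
	barcode := read[start:end]
	if !direct {
		barcode = reverseComplement(barcode)
	}
	return barcode, nil
}

func main() {
	barcode, err := extractBarcode("ttaacggg", 2, 5, false)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(barcode)
}
```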
4. Sample Registration & Lookup
- GetPCR(tagPair): Retrieves or registers a new PCR reaction indexed by tag pair (case-insensitive).
- NGSLibrary.Markers: Map of primer pairs → Marker objects.
  - Lazy initialization via GetMarker() for new primers.
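The retrieve-or-register behaviour described above is a common map idiom in Go. The following sketch shows one way to implement it with case-insensitive keys; the tagPair, pcr, and registry types are hypothetical and only loosely modelled on the description, not copied from the package.

```go
package main

import (
	"fmt"
	"strings"
)

// tagPair, pcr and registry are hypothetical types for this sketch.
type tagPair struct{ Forward, Reverse string }

type pcr struct{ Sample, Experiment string }

type registry struct{ pcrs map[tagPair]*pcr }

func newRegistry() *registry { return &registry{pcrs: make(map[tagPair]*pcr)} }

// getPCR returns the entry registered for the tag pair, creating an empty
// one on first use. Tags are lowercased before lookup, so "ACGT" and "acgt"
// address the same entry. The boolean reports whether the entry existed.
func (r *registry) getPCR(p tagPair) (*pcr, bool) {
	key := tagPair{
		Forward: strings.ToLower(p.Forward),
		Reverse: strings.ToLower(p.Reverse),
	}
	if entry, ok := r.pcrs[key]; ok {
		return entry, true
	}
	entry := &pcr{}
	r.pcrs[key] = entry
	return entry, false
}

func main() {
	r := newRegistry()
	first, existed := r.getPCR(tagPair{Forward: "ACGT", Reverse: "ttga"})
	first.Sample = "sample-1"
	fmt.Println("existed on first call:", existed)

	second, existed := r.getPCR(tagPair{Forward: "acgt", Reverse: "TTGA"})
	fmt.Println("existed on second call:", existed, "sample:", second.Sample)
}
```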
5. Validation & Consistency Checks
- CheckTagLength(): Ensures all registered tags have uniform length per direction.
- CheckPrimerUnicity(): Validates no primer is reused across markers; prevents self-complementary pairs.
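A uniform-length check of the kind described for CheckTagLength() might look like the sketch below, applied to the tags of a single direction. The function name, signature, and error wording are assumptions; the library's own method may report the problem differently.

```go
package main

import "fmt"

// checkTagLength is a hypothetical helper: it returns the common length of
// all tags, or an error naming the first tag whose length differs from the
// first one. An empty list is accepted and yields length 0.
func checkTagLength(tags []string) (int, error) {
	if len(tags) == 0 {
		return 0, nil
	}
	want := len(tags[0])
	for _, tag := range tags[1:] {
		if len(tag) != want {
			return 0, fmt.Errorf("tag %q has length %d, expected %d", tag, len(tag), want)
		}
	}
	return want, nil
}

func main() {
	forwardTags := []string{"acgtacgt", "ttgacctg", "ccatggta"}
	if n, err := checkTagLength(forwardTags); err != nil {
		fmt.Println("inconsistent forward tags:", err)
	} else {
		fmt.Println("forward tag length:", n)
	}
}
```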
6. Batch Processing & Parallelism
- ExtractBarcodeSlice(sequences, options): Processes a slice of reads.
  - Configurable via Options (fluent API):
    - Mismatch/indel budgets.
    - Error handling (discardErrors, OptionUnidentified).
    - Parallel workers, batch size.
- ExtractBarcodeSliceWorker(): Returns a reusable worker for concurrent pipelines.
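The combination of batch size and parallel workers can be sketched with a plain goroutine pool. This is a generic Go pattern shown under assumptions, not the package's implementation: processBatches and tagRead are hypothetical, and reads are represented as strings instead of BioSequence values.

```go
package main

import (
	"fmt"
	"sync"
)

// tagRead is a placeholder for the per-read work (for example, barcode
// extraction). It is hypothetical and only marks the read as processed.
func tagRead(read string) string { return "ok:" + read }

// processBatches splits reads into batches of batchSize and lets `workers`
// goroutines process them. Each goroutine writes only to the indices of its
// own batch, so the output keeps the input order without extra locking.
func processBatches(reads []string, batchSize, workers int, fn func(string) string) []string {
	if batchSize < 1 {
		batchSize = 1
	}
	if workers < 1 {
		workers = 1
	}
	type span struct{ lo, hi int }
	out := make([]string, len(reads))
	jobs := make(chan span)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for s := range jobs {
				for i := s.lo; i < s.hi; i++ {
					out[i] = fn(reads[i])
				}
			}
		}()
	}
	for lo := 0; lo < len(reads); lo += batchSize {
		hi := lo + batchSize
		if hi > len(reads) {
			hi = len(reads)
		}
		jobs <- span{lo: lo, hi: hi}
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	reads := []string{"r1", "r2", "r3", "r4", "r5"}
	fmt.Println(processBatches(reads, 2, 3, tagRead))
}
```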
7. Distance Metrics
- Hamming(s1, s2): Counts mismatches between equal-length strings.
- Levenshtein(s1, s2): Computes edit distance (supports indels).
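For reference, the two metrics can be written in a few lines of Go. These are textbook versions operating on bytes, given as a sketch: the lowercase names, the -1 return for unequal lengths, and the integer return types are assumptions and may not match the signatures of the package's Hamming and Levenshtein functions.

```go
package main

import "fmt"

// hamming counts positions at which a and b differ.
// Assumption: strings of different lengths yield -1.
func hamming(a, b string) int {
	if len(a) != len(b) {
		return -1
	}
	d := 0
	for i := 0; i < len(a); i++ {
		if a[i] != b[i] {
			d++
		}
	}
	return d
}

// min3 returns the smallest of three integers.
func min3(x, y, z int) int {
	if y < x {
		x = y
	}
	if z < x {
		x = z
	}
	return x
}

// levenshtein computes the edit distance (substitutions, insertions and
// deletions, each with cost 1) using two rows of the dynamic-programming
// matrix.
func levenshtein(a, b string) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

func main() {
	fmt.Println(hamming("acgt", "acga"))    // one substitution
	fmt.Println(levenshtein("acgt", "agt")) // one deletion
}
```

A tag that lost one base is no longer comparable by Hamming distance (the lengths differ), while its Levenshtein distance to the expected tag is 1. This is the practical difference between the "hamming" and "indel" matching modes.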
8. Sample Identification
- TagExtractor: Extracts forward/reverse tags from primer-flanked regions.
- SampleIdentifier:
  - Matches extracted tags to known samples using configured strategy (strict, hamming, or indel).
  - Returns best-matching sample, distance, and proposed tags.
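One plausible reading of best-match identification is a nearest-neighbour search over the known tags. The sketch below uses Hamming distance on a single tag and rejects ties; the identify function, the tie-rejection rule, and the maxDist budget are assumptions for illustration and are not taken from the SampleIdentifier implementation.

```go
package main

import "fmt"

// hamming counts differing positions; -1 signals unequal lengths.
func hamming(a, b string) int {
	if len(a) != len(b) {
		return -1
	}
	d := 0
	for i := 0; i < len(a); i++ {
		if a[i] != b[i] {
			d++
		}
	}
	return d
}

// identify is a hypothetical helper. It returns the sample whose tag is
// closest to the observed tag, provided the distance is at most maxDist.
// Assumption: if two tags are equally close, the read stays unassigned.
func identify(observed string, samples map[string]string, maxDist int) (string, int, bool) {
	best, bestDist, ties := "", maxDist+1, 0
	for tag, name := range samples {
		d := hamming(observed, tag)
		if d < 0 {
			continue // lengths differ: not comparable in this mode
		}
		switch {
		case d < bestDist:
			best, bestDist, ties = name, d, 1
		case d == bestDist:
			ties++
		}
	}
	if ties != 1 || bestDist > maxDist {
		return "", -1, false
	}
	return best, bestDist, true
}

func main() {
	samples := map[string]string{"acgt": "sample-1", "ttga": "sample-2"}
	name, dist, ok := identify("acga", samples, 1)
	fmt.Println(name, dist, ok)
}
```

With a budget of 0 the same function behaves like a strict lookup. An indel-tolerant mode would swap the distance function for an edit distance.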
Design Highlights
- Memory-efficient: Uses reference-counted sequences (Recycle()).
- Error-aware: Rich error propagation (stored in DemultiplexMatch.Error or annotations).
- Flexible tag design: Supports fixed, delimited (exact), and indel-resilient tags.
- Extensible via options: Functional setters for clean, testable configuration.
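The functional-setter style mentioned above is the usual Go functional options pattern. The sketch below shows its general shape; the option names, the fields, and the default values are invented for the example and do not reproduce the package's Options type.

```go
package main

import "fmt"

// options holds hypothetical demultiplexing settings.
type options struct {
	allowedMismatches int
	allowIndels       bool
	workers           int
	batchSize         int
}

// option is a setter applied to an options value.
type option func(*options)

func withAllowedMismatches(n int) option {
	return func(o *options) { o.allowedMismatches = n }
}

func withIndels(allow bool) option {
	return func(o *options) { o.allowIndels = allow }
}

func withWorkers(n int) option {
	return func(o *options) { o.workers = n }
}

func withBatchSize(n int) option {
	return func(o *options) { o.batchSize = n }
}

// newOptions starts from defaults (arbitrary values chosen for this sketch)
// and applies the setters in order, so later setters override earlier ones.
func newOptions(setters ...option) options {
	o := options{allowedMismatches: 2, workers: 1, batchSize: 1000}
	for _, set := range setters {
		set(&o)
	}
	return o
}

func main() {
	opts := newOptions(withAllowedMismatches(1), withIndels(true), withWorkers(4))
	fmt.Printf("%+v\n", opts)
}
```

Because each setter is an ordinary function value, a test can build a configuration with exactly the settings it cares about and rely on the defaults for the rest.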
Use Case
Suited to metabarcoding or targeted amplicon sequencing, where samples are multiplexed using unique dual barcodes. Unique tag pairs provide specificity; error-tolerant matching provides sensitivity.