Semantic Description of obingslibrary Package
The obingslibrary package provides core functionality for processing multiplexed high-throughput sequencing (HTS) data. It extracts and validates the dual tags (indexes) found next to the primers that flank each amplified barcode in NGS reads, and uses them to assign every read to its biological sample.
Key Functionalities
Primer & Tag Matching Structures
- `PrimerMatch`: Encodes the location, orientation (`Forward`), mismatch count, and marker identity of a primer hit.
- `TagMatcher`: Functional interface for extracting sample-specific tags from sequence regions.
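To make these two abstractions concrete, here is a minimal Go sketch. The `PrimerMatch` field names, the `TagMatcher` signature, and the helper `fixedLengthTag` are hypothetical illustrations, not the package's actual definitions. The sketch also assumes that a fixed-length tag sits immediately upstream of the primer hit.

```go
package main

import "fmt"

// PrimerMatch records one primer hit on a read.
// Hypothetical field names; the real struct may differ.
type PrimerMatch struct {
	Begin      int    // 0-based start of the hit on the read
	End        int    // exclusive end of the hit
	Forward    bool   // true if the primer matched in forward orientation
	Mismatches int    // number of mismatches in the hit
	Marker     string // identity of the marker the primer belongs to
}

// TagMatcher extracts a sample tag from a read, given a primer hit.
// It reports false when no tag can be extracted.
type TagMatcher func(read string, hit PrimerMatch) (string, bool)

// fixedLengthTag (hypothetical) returns a TagMatcher that takes the
// `length` bases immediately upstream of the primer hit.
func fixedLengthTag(length int) TagMatcher {
	return func(read string, hit PrimerMatch) (string, bool) {
		if hit.Begin < length || hit.Begin > len(read) {
			return "", false // not enough sequence before the primer
		}
		return read[hit.Begin-length : hit.Begin], true
	}
}

func main() {
	read := "acgtacgtggactacannnn"
	hit := PrimerMatch{Begin: 8, End: 16, Forward: true, Marker: "demo"}
	var match TagMatcher = fixedLengthTag(8)
	tag, ok := match(read, hit)
	fmt.Println(tag, ok) // acgtacgt true
}
```

Because the matcher is an ordinary function value, each marker can carry whichever extraction strategy its tag format requires.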
Distance Metrics
- `Hamming`: Counts character mismatches between equal-length strings (for strict mismatch tolerance).
- `Levenshtein`: Computes the edit distance, allowing insertions and deletions (for indel-tolerant matching).
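Both metrics are standard. The Go sketch below gives textbook implementations. The lowercase names `hamming` and `levenshtein` are hypothetical stand-ins for the package's functions, and returning -1 for unequal lengths is a convention chosen for this sketch, not necessarily the package's behavior.

```go
package main

import "fmt"

// hamming counts positions at which two equal-length strings differ.
// Returning -1 for unequal lengths is a convention of this sketch.
func hamming(a, b string) int {
	if len(a) != len(b) {
		return -1
	}
	d := 0
	for i := 0; i < len(a); i++ {
		if a[i] != b[i] {
			d++
		}
	}
	return d
}

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein computes the edit distance (substitutions, insertions,
// deletions) using two rows of the dynamic-programming table.
func levenshtein(a, b string) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j // distance from "" to b[:j]
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i // distance from a[:i] to ""
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

func main() {
	fmt.Println(hamming("acgtacgt", "acgaacgt"))    // 1
	fmt.Println(levenshtein("acgtacgt", "acgacgt")) // 1 (one deletion)
	fmt.Println(hamming("acgtacgt", "acgacgt"))     // -1 (lengths differ)
}
```

Hamming runs in linear time but cannot absorb an inserted or deleted base. Levenshtein handles indels at a cost of O(len(a) * len(b)) time, which is why it fits the indel-tolerant matching mode.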
Tag Extraction Strategies
- `lookForTag`: Extracts delimited tags (e.g., a tag enclosed between two identical delimiters).
- `lookForRescueTag`: Extracts tags robustly despite indels or variable delimiter lengths.
- `*TagExtractor` methods (fixed, delimited, and rescue variants): Support three tag formats per primer direction: fixed-length, delimited by exact delimiters, or rescue-tolerant.
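The exact-delimiter case can be sketched in Go as follows. `lookForDelimitedTag` and its `expectedLen` check are hypothetical. The package's `lookForTag` may work differently, and this sketch does not model the indel-tolerant rescue strategy.

```go
package main

import (
	"fmt"
	"strings"
)

// lookForDelimitedTag (hypothetical) returns the text between the first
// two occurrences of delim in region. Delimiters must match exactly.
// If expectedLen > 0, a tag of any other length is rejected.
func lookForDelimitedTag(region, delim string, expectedLen int) (string, bool) {
	if delim == "" {
		return "", false
	}
	start := strings.Index(region, delim)
	if start < 0 {
		return "", false // opening delimiter not found
	}
	start += len(delim)
	end := strings.Index(region[start:], delim)
	if end < 0 {
		return "", false // closing delimiter not found
	}
	if expectedLen > 0 && end != expectedLen {
		return "", false // tag length does not match the specification
	}
	return region[start : start+end], true
}

func main() {
	fmt.Println(lookForDelimitedTag("nnCCacgtacgtCCnn", "CC", 8)) // acgtacgt true
}
```

An exact search like this fails as soon as a sequencing error touches a delimiter. That failure mode is what a rescue strategy is meant to recover from.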
Marker & Library Abstraction
- `NGSLibrary`: Holds a map of primer pairs (`PrimerPair`) to `Marker` objects.
- Each `Marker` defines its forward/reverse primer sequences, its tag specifications (length, spacer, delimiter, indels), and its sample-to-tag mappings.
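The relationship between library, primer pairs, markers, and samples can be sketched as plain Go types. All field names, the `TagPair` type, the `sampleFor` method, and the primer and tag sequences are hypothetical placeholders chosen for illustration.

```go
package main

import "fmt"

// PrimerPair identifies a marker by its two primer sequences.
type PrimerPair struct {
	Forward string
	Reverse string
}

// TagPair is the forward/reverse tag combination labeling one sample.
type TagPair struct {
	Forward string
	Reverse string
}

// Marker bundles primers, tag specification, and the sample table.
type Marker struct {
	ForwardPrimer string
	ReversePrimer string
	TagLength     int                // expected tag length
	Spacer        int                // bases between tag and primer
	Delimiter     string             // tag delimiter, "" if none
	AllowIndels   bool               // whether indel-tolerant matching is allowed
	Samples       map[TagPair]string // tag pair -> sample name
}

// NGSLibrary maps each primer pair to its Marker.
type NGSLibrary struct {
	Markers map[PrimerPair]*Marker
}

// sampleFor (hypothetical) looks up the sample labeled by a tag pair
// within the marker amplified by a given primer pair.
func (lib NGSLibrary) sampleFor(pp PrimerPair, tags TagPair) (string, bool) {
	m, ok := lib.Markers[pp]
	if !ok {
		return "", false
	}
	s, ok := m.Samples[tags]
	return s, ok
}

func main() {
	pp := PrimerPair{Forward: "acgtacgtac", Reverse: "ttgcattgca"} // placeholder primers
	lib := NGSLibrary{Markers: map[PrimerPair]*Marker{
		pp: {
			ForwardPrimer: pp.Forward,
			ReversePrimer: pp.Reverse,
			TagLength:     8,
			Samples: map[TagPair]string{
				{Forward: "acacacac", Reverse: "gtgtgtgt"}: "sample_01",
			},
		},
	}}
	fmt.Println(lib.sampleFor(pp, TagPair{Forward: "acacacac", Reverse: "gtgtgtgt"})) // sample_01 true
}
```

Keying markers by primer pair lets a single library hold several markers. Each marker then resolves tag pairs to samples independently.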
Tag Assignment & Sample Identification
- `TagExtractor`: Extracts forward/reverse tags from primer-flanked regions and annotates them.
- `SampleIdentifier`: Matches extracted tags to known samples and annotates each result with the matching mode, distances, and proposed tags. The matching mode is configurable:
  - `"strict"`: exact match only.
  - `"hamming"`: closest tag by Hamming distance (substitutions only).
  - `"indel"`: closest tag by Levenshtein distance (substitutions, insertions, and deletions).
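The three matching modes can be illustrated with a small Go sketch. `identifySample`, its `maxDist` threshold, and the rule that ties between equally close tags leave the read unassigned are assumptions made for this illustration. They are not taken from the package.

```go
package main

import "fmt"

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// distance returns the Hamming distance in mode "hamming" (a large value
// when lengths differ) and the Levenshtein distance otherwise.
func distance(a, b, mode string) int {
	if mode == "hamming" {
		if len(a) != len(b) {
			return 1 << 30
		}
		d := 0
		for i := 0; i < len(a); i++ {
			if a[i] != b[i] {
				d++
			}
		}
		return d
	}
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr := make([]int, len(b)+1)
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev = curr
	}
	return prev[len(b)]
}

// identifySample (hypothetical) maps an observed tag to a sample.
// "strict" accepts exact matches only; "hamming" and "indel" accept the
// unique closest known tag within maxDist. Ties are left unassigned.
func identifySample(observed string, samples map[string]string, mode string, maxDist int) (sample, proposed string, dist int, ok bool) {
	if s, found := samples[observed]; found {
		return s, observed, 0, true
	}
	if mode == "strict" {
		return "", "", -1, false
	}
	best, bestDist, tie := "", maxDist+1, false
	for known := range samples {
		d := distance(observed, known, mode)
		if d < bestDist {
			best, bestDist, tie = known, d, false
		} else if d == bestDist {
			tie = true
		}
	}
	if best == "" || tie {
		return "", "", bestDist, false
	}
	return samples[best], best, bestDist, true
}

func main() {
	samples := map[string]string{"acacacac": "sample_A", "gtgtgtgt": "sample_B"}
	fmt.Println(identifySample("acacaaac", samples, "strict", 2))  // no exact match, so ok is false
	fmt.Println(identifySample("acacaaac", samples, "hamming", 2)) // sample_A acacacac 1 true
	fmt.Println(identifySample("acacacc", samples, "indel", 2))    // sample_A acacacac 1 true
}
```

Returning the proposed tag and its distance alongside the sample reflects the kind of annotation the text describes. In the sketch, the truncated tag `acacacc` is rescued only in `"indel"` mode.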
Multi-Barcode Extraction
- `ExtractMultiBarcode`: Scans a full sequence for primer pairs (forward/reverse primers and their complements), detects valid amplicon intervals, and, for each interval:
  - extracts the internal barcode region;
  - assigns the tags to a sample via `SampleIdentifier`;
  - annotates the barcode with primer matches, errors, and directionality.
- Both orientations of the amplicon (`forward` and `reverse`) are handled.
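The interval detection can be sketched in Go under simplifying assumptions. Primers are matched exactly, "complements" is taken to mean reverse complements, and each opening primer hit is paired with the nearest closing hit downstream. `extractBarcodes`, `Barcode`, and the helper functions are hypothetical. The sketch does not tolerate primer mismatches, and it does not perform tag assignment or error annotation.

```go
package main

import (
	"fmt"
	"strings"
)

// revComp returns the reverse complement of a lowercase DNA string;
// bases other than a, c, g, t become 'n'.
func revComp(s string) string {
	comp := map[byte]byte{'a': 't', 'c': 'g', 'g': 'c', 't': 'a'}
	out := make([]byte, len(s))
	for i := 0; i < len(s); i++ {
		c, ok := comp[s[i]]
		if !ok {
			c = 'n'
		}
		out[len(s)-1-i] = c
	}
	return string(out)
}

// allIndex returns every start position of pat in s (exact matches only).
func allIndex(s, pat string) []int {
	var pos []int
	if pat == "" {
		return pos
	}
	for off := 0; ; {
		i := strings.Index(s[off:], pat)
		if i < 0 {
			return pos
		}
		pos = append(pos, off+i)
		off += i + 1
	}
}

// Barcode is one amplicon found on a read (hypothetical type).
type Barcode struct {
	Begin, End int    // barcode region on the read, between the primers
	Direction  string // "forward" or "reverse"
	Sequence   string // barcode in forward-strand orientation
}

// extractBarcodes pairs each opening primer hit with the nearest closing
// primer hit downstream of it, for both amplicon orientations.
func extractBarcodes(read, fwd, rev string) []Barcode {
	var out []Barcode
	scan := func(openPrimer, closePrimer, dir string) {
		closes := allIndex(read, closePrimer)
		for _, o := range allIndex(read, openPrimer) {
			start := o + len(openPrimer)
			for _, c := range closes {
				if c < start {
					continue
				}
				region := read[start:c]
				if dir == "reverse" {
					region = revComp(region)
				}
				out = append(out, Barcode{start, c, dir, region})
				break
			}
		}
	}
	scan(fwd, revComp(rev), "forward") // read carries the forward strand
	scan(rev, revComp(fwd), "reverse") // read carries the reverse strand
	return out
}

func main() {
	read := "nnacgttcccaaaatgccnn" // fwd "acgtt", barcode "cccaaa", revComp(rev) "atgcc"
	for _, b := range extractBarcodes(read, "acgtt", "ggcat") {
		fmt.Println(b.Begin, b.End, b.Direction, b.Sequence) // 7 13 forward cccaaa
	}
	for _, b := range extractBarcodes(revComp(read), "acgtt", "ggcat") {
		fmt.Println(b.Begin, b.End, b.Direction, b.Sequence) // 7 13 reverse cccaaa
	}
}
```

Reverse-complementing reverse-direction barcodes lets downstream steps treat both orientations uniformly.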
Parallel Processing Integration
- `ExtractMultiBarcodeSliceWorker`: Returns a reusable worker function for batch-processing sequences, with options such as indel tolerance and mismatch limits.
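The "reusable worker" pattern can be sketched in Go as a closure built once from options and then applied to many batches in parallel. `Options`, `SliceWorker`, `makeSliceWorker`, and `runParallel` are hypothetical names. They mirror the idea, not the package's signature. For brevity, the worker only keeps reads that contain a primer within a mismatch limit, and indel tolerance is not modeled.

```go
package main

import (
	"fmt"
	"sync"
)

// Options holds settings captured by the worker (hypothetical names).
type Options struct {
	Primer        string // primer to look for
	MaxMismatches int    // substitutions tolerated in the primer match
}

// SliceWorker processes one batch of reads and returns the reads kept.
type SliceWorker func(batch []string) []string

// matchesWithin reports whether pat occurs in s with at most k substitutions.
func matchesWithin(s, pat string, k int) bool {
	for i := 0; i+len(pat) <= len(s); i++ {
		mm := 0
		for j := 0; j < len(pat) && mm <= k; j++ {
			if s[i+j] != pat[j] {
				mm++
			}
		}
		if mm <= k {
			return true
		}
	}
	return false
}

// makeSliceWorker builds the worker once; every call reuses opts.
func makeSliceWorker(opts Options) SliceWorker {
	return func(batch []string) []string {
		var kept []string
		for _, read := range batch {
			if matchesWithin(read, opts.Primer, opts.MaxMismatches) {
				kept = append(kept, read)
			}
		}
		return kept
	}
}

// runParallel feeds batch indices to nWorkers goroutines; each result is
// stored at its batch index, so output order matches input order.
func runParallel(batches [][]string, worker SliceWorker, nWorkers int) [][]string {
	if nWorkers < 1 {
		nWorkers = 1
	}
	results := make([][]string, len(batches))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < nWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = worker(batches[i])
			}
		}()
	}
	for i := range batches {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	worker := makeSliceWorker(Options{Primer: "acgtt", MaxMismatches: 1})
	batches := [][]string{{"nnacgttnn", "gggggggg"}, {"nnacgatnn"}}
	fmt.Println(runParallel(batches, worker, 2)) // [[nnacgttnn] [nnacgatnn]]
}
```

Building the worker once means options are parsed a single time. Because each goroutine writes to a distinct slot of `results`, no extra locking is needed.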
Use Case
This package enables demultiplexing of NGS reads in amplicon-based workflows (e.g., metabarcoding), where each sample is labeled with a unique combination of forward and reverse tags. Its error-tolerant matching modes make it robust to sequencing errors, and its three tag formats support flexible tag design.