⬆️ version bump to v4.4.30

- Update obioptions.Version from "Release 4.4.29" to "Release 4.4.30"
- Update version.txt from 4.4.29 → 4.4.30
(automated by Makefile)
Eric Coissac
2026-04-07 08:36:50 +02:00
parent 670edc1958
commit 8c7017a99d
392 changed files with 18875 additions and 141 deletions
@@ -0,0 +1,40 @@
# Semantic Description of `obingslibrary` Marker Module
The `Marker` struct defines a molecular biology primer pair (forward/reverse) for PCR-based sample demultiplexing in high-throughput sequencing workflows. It supports flexible configuration of primer binding, tag (barcode) extraction, mismatch tolerance, and indel handling.
## Core Functionalities
- **Primer Pattern Compilation**:
`Compile()` and `Compile2()` initialize forward/reverse primer patterns using the underlying `obiapat.ApatPattern`, including reverse-complement variants (`cforward`, `creverse`). They accept parameters for maximum error tolerance and indel allowance.
- **Sequence Matching & Demultiplexing**:
`Match()` scans a given sequence (`BioSequence`) for primer binding sites. It tries the forward primer first and falls back to the reverse primer if needed. For each match, it:
- Extracts the primer region and the adjacent tag (barcode).
- Computes the number of mismatches.
- Links the match to a pre-registered `PCR` object via the tag pair (`TagPair`) key in an internal map.
- **Sample Registration & Lookup**:
`GetPCR()` retrieves the PCR reaction entry indexed by a forward/reverse tag pair (case-insensitive), registering a new entry if none exists. This enables tracking of sample-specific amplification data.
- **Tag Length Validation**:
`CheckTagLength()` ensures all registered tags have uniform length for both directions; otherwise, returns an error.
- **Configurable Parameters**:
Supports tuning of:
- Tag lengths (`Forward_tag_length`, `Reverse_tag_length`)
- Spacer between tag and primer (`SetTagSpacer()`)
- Delimiter for tag-primer boundary (e.g., `a`, `c`, `g`, `t` or none via `'0'`)
- Allowed mismatches and indels per primer (`SetAllowedMismatch()`, `SetTagIndels()`)
- Matching strategy: `"strict"` (exact), `"hamming"`, or `"indel"`
- **Matching Strategy Enforcement**:
`SetForward/ReverseMatching()` validates and sets the matching mode; invalid values are rejected with an error.
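The sample registration and tag-length validation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's code: the `tagPair`, `pcr`, and `marker` types, their fields, and the lower-casing used for normalization are all hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"strings"
)

// tagPair is a hypothetical stand-in for the package's TagPair map key.
type tagPair struct {
	Forward, Reverse string
}

// pcr is a hypothetical stand-in for a registered PCR reaction.
type pcr struct {
	Sample string
}

// marker holds the registered PCR entries, indexed by tag pair.
type marker struct {
	samples map[tagPair]*pcr
}

// getPCR returns the entry for the given tags, creating it when absent.
// Tags are lower-cased first so that lookups are case-insensitive.
// The boolean reports whether the entry already existed.
func (m *marker) getPCR(forward, reverse string) (*pcr, bool) {
	key := tagPair{strings.ToLower(forward), strings.ToLower(reverse)}
	if p, ok := m.samples[key]; ok {
		return p, true
	}
	p := &pcr{}
	m.samples[key] = p
	return p, false
}

// checkTagLength returns an error unless all forward tags share one length
// and all reverse tags share one length.
func (m *marker) checkTagLength() error {
	fwd, rev := -1, -1
	for key := range m.samples {
		if fwd < 0 {
			fwd, rev = len(key.Forward), len(key.Reverse)
			continue
		}
		if len(key.Forward) != fwd || len(key.Reverse) != rev {
			return fmt.Errorf("tag pair %s:%s does not match tag lengths %d/%d",
				key.Forward, key.Reverse, fwd, rev)
		}
	}
	return nil
}

func main() {
	m := &marker{samples: map[tagPair]*pcr{}}
	p, _ := m.getPCR("ACGT", "ttga")
	p.Sample = "sample_1"

	q, found := m.getPCR("acgt", "TTGA") // same pair, different case
	fmt.Println(found, q.Sample)
	fmt.Println(m.checkTagLength())

	m.getPCR("acgtac", "ttga") // forward tag of a different length
	fmt.Println(m.checkTagLength())
}
```

Normalizing the key once, at the map boundary, keeps every later comparison a plain string equality.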
## Design Highlights
- Uses `log.Fatalf` for critical configuration failures (e.g., invalid delimiter).
- Leverages reference-counted sequences (`Recycle()`) for memory efficiency.
- Prioritizes forward primer match but gracefully handles reverse orientation.
- Fully supports case-insensitive tag comparison and normalization.
This module serves as the core engine for sample assignment in amplicon-based NGS pipelines, balancing sensitivity (via error/indel tolerance) and specificity (through tag uniqueness).
@@ -0,0 +1,32 @@
# Demultiplexing Functionality in `obingslibrary`
This package provides tools for **demultiplexing NGS reads** by matching them against known primer pairs and extracting associated barcodes.
## Core Types
- `DemultiplexMatch`: Struct holding alignment results for forward/reverse primers, mismatches, barcode coordinates (`BarcodeStart`, `BarcodeEnd`), and metadata (e.g., sample/experiment info via `PCR`). Its `Error` field records why a match or sample assignment failed.
## Key Methods
- **`Match(sequence)`**:
Scans the input `BioSequence` against all primer pairs in `NGSLibrary.Markers`. Returns a populated `DemultiplexMatch` if any primer pair matches.
- **`ExtractBarcode(sequence, inplace)`**:
Uses the result of `Match()` to:
- Extract the barcode region, unless the read is a primer dimer.
- Reverse-complement it if the read is in reverse orientation (`IsDirect == false`).
- Annotate the sequence with:
- Primer names and match details (positions, mismatches).
- Direction (`direct`/`reverse`).
- Sample/experiment info (if assignment succeeds), or error message.
## Behavior Notes
- **Primer dimer detection**: If `BarcodeStart > BarcodeEnd`, the read is flagged as a primer dimer and not extracted.
- **Error handling**: Errors (e.g., no primer match, no sample assigned) are stored in `match.Error` and propagated as annotations.
- **Annotation richness**: Output sequences carry rich metadata (sample, experiment, primers, errors), supporting downstream filtering/analysis.
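The primer-dimer rule and the orientation handling can be illustrated with a short sketch. The `demultiplexMatch` type below is a reduced, hypothetical version of `DemultiplexMatch`, and the helper names, the half-open coordinates, and the error messages are assumptions made for the example only.

```go
package main

import (
	"errors"
	"fmt"
)

// demultiplexMatch is a reduced, hypothetical version of the match result.
type demultiplexMatch struct {
	BarcodeStart int // first barcode position (inclusive, assumed)
	BarcodeEnd   int // end of the barcode (exclusive, assumed)
	IsDirect     bool
	Error        error
}

// isPrimerDimer applies the rule from the text: the two primers overlap
// or abut in the wrong order, leaving no barcode between them.
func (m *demultiplexMatch) isPrimerDimer() bool {
	return m.BarcodeStart > m.BarcodeEnd
}

// reverseComplement handles lower-case a/c/g/t; other symbols are kept as-is.
func reverseComplement(seq string) string {
	comp := map[byte]byte{'a': 't', 'c': 'g', 'g': 'c', 't': 'a'}
	out := make([]byte, len(seq))
	for i := 0; i < len(seq); i++ {
		b := seq[len(seq)-1-i]
		if c, ok := comp[b]; ok {
			b = c
		}
		out[i] = b
	}
	return string(out)
}

// extractBarcode returns the barcode region, reverse-complemented for
// reverse-orientation reads. Failures are recorded in m.Error.
func extractBarcode(seq string, m *demultiplexMatch) (string, bool) {
	if m.Error != nil {
		return "", false
	}
	if m.isPrimerDimer() {
		m.Error = errors.New("primer dimer")
		return "", false
	}
	if m.BarcodeStart < 0 || m.BarcodeEnd > len(seq) {
		m.Error = errors.New("barcode coordinates out of range")
		return "", false
	}
	barcode := seq[m.BarcodeStart:m.BarcodeEnd]
	if !m.IsDirect {
		barcode = reverseComplement(barcode)
	}
	return barcode, true
}

func main() {
	seq := "ttaacggg"
	for _, m := range []*demultiplexMatch{
		{BarcodeStart: 2, BarcodeEnd: 5, IsDirect: true},
		{BarcodeStart: 2, BarcodeEnd: 5, IsDirect: false},
		{BarcodeStart: 5, BarcodeEnd: 2, IsDirect: true}, // primers overlap
	} {
		barcode, ok := extractBarcode(seq, m)
		fmt.Println(barcode, ok, m.Error)
	}
}
```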
## Dependencies
- Uses `logrus` for fatal logging (e.g., subsequence extraction failure).
- Integrates with `obiseq.BioSequence` for sequence representation and manipulation.
@@ -0,0 +1,43 @@
# Semantic Description of `obingslibrary` Package
The `obingslibrary` package provides core functionality for **multiplexed high-throughput sequencing (HTS) data processing**, specifically designed to extract, validate, and assign biological samples from NGS reads using **dual-indexed barcodes** flanked by primers.
## Key Functionalities
1. **Primer & Tag Matching Structures**
- `PrimerMatch`: Encodes location, orientation (`Forward`), mismatch count, and marker identity of primer hits.
- `TagMatcher`: Functional interface for extracting sample-specific tags from sequence regions.
2. **Distance Metrics**
- `Hamming`: Counts character mismatches between equal-length strings (substitution-only matching).
- `Levenshtein`: Computes edit distance allowing insertions/deletions (for indel-tolerant matching).
3. **Tag Extraction Strategies**
- `lookForTag`: Extracts delimited tags (e.g., between two identical delimiters).
- `lookForRescueTag`: Robustly extracts tags despite indels or variable delimiter lengths.
- `*Fixed/Delimited/RescueTagExtractor` methods: Support three tag formats per primer direction (fixed-length, delimited with exact delimiters, or rescue-tolerant).
4. **Marker & Library Abstraction**
- `NGSLibrary`: Holds a map of primer pairs (`PrimerPair`) to `Marker` objects.
- Each `Marker`: Defines forward/reverse primer sequences, tag specifications (length/spacer/delimiter/indels), and sample-to-tag mappings.
5. **Tag Assignment & Sample Identification**
- `TagExtractor`: Extracts forward/reverse tags from primer-flanked regions and annotates them.
- `SampleIdentifier`: Matches extracted tags to known samples using configurable matching modes:
- `"strict"`: Exact match only.
- `"hamming"`: Closest tag by Hamming distance (substitutions).
- `"indel"`: Closest tag by Levenshtein distance.
- Annotates results with matching mode, distances, and proposed tags.
6. **Multi-Barcode Extraction**
- `ExtractMultiBarcode`: Scans a full sequence for primer pairs (forward/reverse + their complements), detects valid amplicon intervals, and:
- Extracts the internal barcode region.
- Assigns tags → sample via `SampleIdentifier`.
- Annotates each barcode with primer matches, errors, directionality.
- Handles both orientations (`forward` and `reverse`) of the amplicon.
7. **Parallel Processing Integration**
- `ExtractMultiBarcodeSliceWorker`: Returns a reusable worker function for batch processing sequences, supporting options like indel tolerance and mismatch limits.
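The two distance metrics and the three matching modes listed above can be sketched as follows. This is an illustration under stated assumptions rather than the package's implementation: the lower-case function names, the `-1` return for unequal lengths, and the tie-breaking rule (first best tag wins) are choices made for this example.

```go
package main

import "fmt"

// hamming counts mismatching positions between equal-length strings.
// It returns -1 when the lengths differ (a convention of this sketch).
func hamming(a, b string) int {
	if len(a) != len(b) {
		return -1
	}
	d := 0
	for i := 0; i < len(a); i++ {
		if a[i] != b[i] {
			d++
		}
	}
	return d
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein computes the edit distance (substitutions, insertions,
// deletions) with the classic two-row dynamic programme.
func levenshtein(a, b string) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

// closestTag returns the known tag closest to the observed one under the
// given mode, with its distance. Ties resolve to the first tag in the list;
// a production implementation may prefer to reject ambiguous matches.
func closestTag(observed string, known []string, mode string) (string, int, error) {
	best, bestDist := "", -1
	for _, tag := range known {
		var d int
		switch mode {
		case "strict":
			if tag != observed {
				continue
			}
		case "hamming":
			if d = hamming(observed, tag); d < 0 {
				continue
			}
		case "indel":
			d = levenshtein(observed, tag)
		default:
			return "", -1, fmt.Errorf("unknown matching mode %q", mode)
		}
		if bestDist < 0 || d < bestDist {
			best, bestDist = tag, d
		}
	}
	if bestDist < 0 {
		return "", -1, fmt.Errorf("no tag matches %q in mode %q", observed, mode)
	}
	return best, bestDist, nil
}

func main() {
	known := []string{"acgtt", "ttgca"}
	fmt.Println(closestTag("acgta", known, "strict"))
	fmt.Println(closestTag("acgta", known, "hamming"))
	fmt.Println(closestTag("acgt", known, "indel"))
}
```

In practice a maximum accepted distance would usually be applied on top of the closest-tag search, so that a read far from every known tag stays unassigned.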
## Use Case
This package enables **demultiplexing** of NGS reads in amplicon-based workflows (e.g., metabarcoding), where samples are labeled with unique dual barcodes. It ensures robustness against sequencing errors and supports flexible tag design.
@@ -0,0 +1,17 @@
# Semantic Description of `obingslibrary` Package
The `obingslibrary` package defines core data structures and methods for managing **PCR-based NGS library designs**, particularly in metabarcoding workflows.
- `PrimerPair` and `TagPair`: Represent forward/reverse primer or tag sequences.
- `PCR`: Encapsulates a single PCR amplification experiment with sample metadata and annotations (via `obiseq.Annotation`).
- `NGSLibrary`: Central struct storing primer definitions (`Primers`) and associated marker specifications (`Markers`), where each `Marker` defines how primers (and attached tags) are processed.
Key functionality includes:
- **Dynamic marker creation**: `GetMarker()` lazily initializes a new `Marker` for any primer pair if not already present.
- **Compilation**: Two compilation modes (`Compile`, `Compile2`) prepare internal search structures (e.g., error-tolerant index) using user-defined parameters like max errors and indel allowance.
- **Tag configuration**: Methods to set spacer length, delimiter character (e.g., `N` or `X`), and indel tolerance for tags—globally (`SetTagSpacer`, etc.) or per-primer.
- **Matching strategy**: Configure tag-matching behavior (`"strict"`, `"hamming"`, or `"indel"`) via `SetMatching*`.
- **Uniqueness & validation**: `CheckPrimerUnicity()` ensures no primer is reused across multiple markers and prevents self-complementary pairs.
- **Error handling**: Supports configurable mismatch/indel budgets per primer direction.
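Lazy marker creation and the primer reuse check can be sketched like this. The types and method names below are hypothetical, simplified stand-ins for `NGSLibrary`, `PrimerPair`, `Marker`, `GetMarker()`, and `CheckPrimerUnicity()`. The sketch covers only the reuse check; the self-complementarity test is left out because the text does not specify how it is defined.

```go
package main

import "fmt"

// primerPair is a hypothetical stand-in for the package's PrimerPair key.
type primerPair struct{ Forward, Reverse string }

// marker is a placeholder for the per-primer-pair configuration.
type marker struct{ TagSpacer int }

// ngsLibrary maps each primer pair to its marker.
type ngsLibrary struct {
	Markers map[primerPair]*marker
}

// getMarker returns the marker for a primer pair, creating it on first use.
// The boolean reports whether the marker already existed.
func (lib *ngsLibrary) getMarker(forward, reverse string) (*marker, bool) {
	pair := primerPair{forward, reverse}
	if m, ok := lib.Markers[pair]; ok {
		return m, true
	}
	m := &marker{}
	lib.Markers[pair] = m
	return m, false
}

// checkPrimerUnicity returns an error if any primer sequence appears more
// than once across the library. A pair that uses the same primer in both
// directions is rejected as well.
func (lib *ngsLibrary) checkPrimerUnicity() error {
	seen := map[string]bool{}
	for pair := range lib.Markers {
		for _, primer := range []string{pair.Forward, pair.Reverse} {
			if seen[primer] {
				return fmt.Errorf("primer %s is used more than once", primer)
			}
			seen[primer] = true
		}
	}
	return nil
}

func main() {
	lib := &ngsLibrary{Markers: map[primerPair]*marker{}}
	lib.getMarker("aaacc", "ggttt")
	lib.getMarker("ccagg", "ttgga")
	fmt.Println(lib.checkPrimerUnicity())

	lib.getMarker("aaacc", "tgtgt") // forward primer reused in a new marker
	fmt.Println(lib.checkPrimerUnicity())
}
```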
This library enables flexible, reproducible specification of molecular identifiers (tags) and amplification primers—essential for accurate demultiplexing and sequence assignment in high-throughput sequencing pipelines.
@@ -0,0 +1,31 @@
# PCR Simulation and Barcode Extraction Module
This Go package (`obingslibrary`) provides configuration-driven tools for **PCR simulation and barcode extraction** from NGS libraries.
## Core Concepts
- `Options`: A fluent configuration object for customizing behavior via functional setters.
- Default options are defined in `MakeOptions`, supporting:
- Error handling (`discardErrors`)
- Mismatch/indel tolerance via `allowedMismatch` and `allowsIndel`
- Parallelization (`parallelWorkers`) and batching control (`batchSize`)
- Optional progress tracking (`withProgressBar`)
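The functional-setter style of configuration can be sketched as below. The field names follow the list above; the `withOption` type, the setter names, the variadic `makeOptions` signature, and all default values except `discardErrors` are assumptions made for illustration and may differ from the package's actual API.

```go
package main

import "fmt"

// options mirrors the settings listed above; the defaults are illustrative.
type options struct {
	discardErrors   bool
	allowedMismatch int
	allowsIndel     bool
	parallelWorkers int
	batchSize       int
	withProgressBar bool
}

// withOption is a setter that modifies an options value in place.
type withOption func(*options)

// makeOptions starts from the defaults and applies each setter in order,
// so later setters override earlier ones.
func makeOptions(setters ...withOption) options {
	o := options{
		discardErrors:   true,
		allowedMismatch: 0,
		allowsIndel:     false,
		parallelWorkers: 1,
		batchSize:       1000,
		withProgressBar: false,
	}
	for _, set := range setters {
		set(&o)
	}
	return o
}

// The setters below are hypothetical names for this sketch.
func optionAllowedMismatch(n int) withOption {
	return func(o *options) { o.allowedMismatch = n }
}

func optionAllowsIndel(allow bool) withOption {
	return func(o *options) { o.allowsIndel = allow }
}

func optionDiscardErrors(discard bool) withOption {
	return func(o *options) { o.discardErrors = discard }
}

func main() {
	o := makeOptions(optionAllowedMismatch(2), optionAllowsIndel(true))
	fmt.Printf("%+v\n", o)
}
```

Callers only name the settings they want to change, and new settings can be added later without breaking existing call sites.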
## Key Functionalities
- **Barcode Extraction**:
- `ExtractBarcodeSlice`: Extracts barcodes from a slice of sequences using the NGS library, applying configured error handling and alignment parameters.
- `ExtractBarcodeSliceWorker`: Returns a reusable worker function for batch processing (e.g., in pipelines or parallel workers).
- **Compilation Step**:
- `ngslibrary.Compile(...)` prepares internal indexing based on mismatch/indel settings before extraction.
- **Error Handling**:
- If `discardErrors` is true (default), sequences causing extraction errors are filtered out.
- Alternatively, error-containing reads can be retained or logged via `OptionUnidentified`.
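A reusable batch worker that applies the `discardErrors` policy might look like the sketch below. The `read` type, the `toyExtract` function (which stands in for the real barcode extraction), and the `extractWorker` constructor are hypothetical and exist only to show the closure-based design.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// read is a hypothetical, minimal sequence record.
type read struct {
	ID, Seq string
	Err     error
}

// sliceWorker processes one batch of reads and returns the reads to keep.
type sliceWorker func([]read) []read

// toyExtract stands in for barcode extraction: it flags reads that do not
// contain the (made-up) primer "acgt".
func toyExtract(r read) read {
	if !strings.Contains(r.Seq, "acgt") {
		r.Err = errors.New("cannot match any primer")
	}
	return r
}

// extractWorker captures the configuration once and returns a function
// that can be reused for every batch, including from parallel workers.
func extractWorker(extract func(read) read, discardErrors bool) sliceWorker {
	return func(batch []read) []read {
		kept := make([]read, 0, len(batch))
		for _, r := range batch {
			r = extract(r)
			if r.Err != nil && discardErrors {
				continue // drop reads whose extraction failed
			}
			kept = append(kept, r)
		}
		return kept
	}
}

func main() {
	batch := []read{{ID: "r1", Seq: "ttacgttt"}, {ID: "r2", Seq: "tttt"}}
	fmt.Println(len(extractWorker(toyExtract, true)(batch)))
	fmt.Println(len(extractWorker(toyExtract, false)(batch)))
}
```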
## Design Highlights
- Uses the *option pattern* for extensibility and clean API.
- Integrates with default settings from `obidefault` (e.g., parallelism, batch size).
- Designed for both direct use and integration into concurrent workflows.