Files
obitools4/autodoc/docmd/pkg_obingslibrary.md
T
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

3.9 KiB
Raw Blame History

obingslibrary: High-Throughput Sequencing Demultiplexing Library

obingslibrary is a Go package for sample assignment in amplicon-based NGS workflows, using dual-indexed barcodes flanked by PCR primers. It enables robust, configurable demultiplexing of sequencing reads—even in the presence of errors or indels—by matching primertag patterns and assigning samples via tag lookup.


Core Functionalities

1. Primer & Tag Configuration

  • Marker: Defines a primer pair (forward/reverse), including:
    • Primer sequences (Forward, Reverse) and reverse-complement variants.
    • Tag specifications: lengths, spacers (e.g., N or fixed nucleotides), delimiters.
    • Mismatch/indel tolerance per direction (SetAllowedMismatch, SetTagIndels).
  • Compilation:
    • Compile() / Compile2(): Builds internal pattern indexes (via obiapat.ApatPattern) for fast, error-tolerant matching.
    • Supports "strict", "hamming" (substitutions only), or "indel" (Levenshtein) matching modes.

2. Sequence Matching & Demultiplexing

  • Match(sequence): Scans a BioSequence for valid primer bindings:
    • Prioritizes forward-primer detection; falls back to reverse orientation.
    • Returns DemultiplexMatch with:
      • Primer positions, mismatches, orientation (IsDirect).
      • Barcode coordinates (BarcodeStart, BarcodeEnd) and validity flag.
  • Primer dimer detection: If BarcodeStart > BarcodeEnd, the read is flagged as invalid.

3. Tag Extraction & Annotation

  • ExtractBarcode(sequence, inplace):
    • Extracts the barcode region between forward/reverse primers.
    • Reverse-complements if read is in reverse orientation (IsDirect == false).
    • Annotates the sequence with:
      • Primer names, positions, mismatches.
      • Sample/experiment info (if tag assignment succeeds).
      • Error messages (Unassigned, NoMatch, etc.).
  • Tag extraction strategies:
    • Fixed: Fixed-length tags.
    • Delimited: Tags flanked by exact delimiters (e.g., "NN").
    • Rescue: Tolerates indels in delimiter or tag boundaries.

4. Sample Registration & Lookup

  • GetPCR(tagPair): Retrieves or registers a new PCR reaction indexed by tag pair (case-insensitive).
  • NGSLibrary.Markers: Map of primer pairs → Marker objects.
    • Lazy initialization via GetMarker() for new primers.

5. Validation & Consistency Checks

  • CheckTagLength(): Ensures all registered tags have uniform length per direction.
  • CheckPrimerUnicity(): Validates no primer is reused across markers; prevents self-complementary pairs.

6. Batch Processing & Parallelism

  • ExtractBarcodeSlice(sequences, options): Processes a slice of reads.
    • Configurable via Options (fluent API):
      • Mismatch/indel budgets.
      • Error handling (discardErrors, OptionUnidentified).
      • Parallel workers, batch size.
  • ExtractBarcodeSliceWorker(): Returns a reusable worker for concurrent pipelines.

7. Distance Metrics

  • Hamming(s1, s2): Counts mismatches between equal-length strings.
  • Levenshtein(s1, s2): Computes edit distance (supports indels).

8. Sample Identification

  • TagExtractor: Extracts forward/reverse tags from primer-flanked regions.
  • SampleIdentifier:
    • Matches extracted tags to known samples using configured strategy (strict, hamming, or indel).
    • Returns best-matching sample, distance, and proposed tags.

Design Highlights

  • Memory-efficient: Uses reference-counted sequences (Recycle()).
  • Error-aware: Rich error propagation (stored in DemultiplexMatch.Error or annotations).
  • Flexible tag design: Supports fixed, delimited (exact), and indel-resilient tags.
  • Extensible via options: Functional setters for clean, testable configuration.

Use Case

Ideal for metabarcoding or targeted amplicon sequencing, where samples are multiplexed using unique dual barcodes. Ensures high specificity (unique tag pairs) and sensitivity (error-tolerant matching).