mirror of https://github.com/metabarcoding/obitools4.git synced 2026-04-30 12:00:39 +00:00

Semantic Description of `obialign` Package

The obialign package provides high-performance functions for computing the Longest Common Subsequence (LCS) between two biological sequences, with support for error tolerance and end-gap-free alignment.

Core Algorithm

Implements a Needleman-Wunsch dynamic programming algorithm optimized for speed and memory efficiency.
Uses bit-packed encoding (uint64) to store score, path length, and gap status in a compact form.
Leverages diagonal banding to restrict computation to the cells within the allowed error margin, reducing time and space complexity.
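
To make the bit-packing idea concrete, here is a minimal sketch of how a score, a path length, and a gap flag could share one uint64. The layout (score in the high bits, inverted length below it, gap flag in bit 0) and the names pack and unpack are assumptions chosen for illustration; the actual field widths and order used by obialign may differ. Storing the length inverted means a single integer comparison prefers the higher score first and the shorter path second.

```go
package main

import "fmt"

// Hypothetical layout, for illustration only:
// bits 33..63 = score, bits 1..32 = inverted path length, bit 0 = gap flag.
const (
	lenBits = 32
	lenMask = (uint64(1) << lenBits) - 1
)

// pack stores the three fields in one word. The length is stored inverted
// so that, at equal score, a shorter path yields a larger packed value.
func pack(score, length int, gap bool) uint64 {
	v := uint64(score)<<(lenBits+1) | (lenMask-uint64(length))<<1
	if gap {
		v |= 1
	}
	return v
}

// unpack reverses pack.
func unpack(v uint64) (score, length int, gap bool) {
	score = int(v >> (lenBits + 1))
	length = int(lenMask - (v>>1)&lenMask)
	gap = v&1 == 1
	return
}

func main() {
	a := pack(5, 7, false)
	b := pack(5, 6, true)
	c := pack(4, 4, false)
	fmt.Println(b > a, a > c) // true true
	fmt.Println(unpack(b))    // 5 6 true
}
```

With such an encoding, choosing the best of the three predecessor cells in the dynamic programming matrix reduces to taking the maximum of three integers.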

Scoring Scheme

Match: +1 point
Mismatch or gap (indel): 0 points
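
Under this scheme the optimal score is simply the length of the longest common subsequence. The sketch below shows the recurrence in its plainest form, assuming exact byte equality, two rolling rows, and no banding or IUPAC handling. The function name lcsScore is illustrative and is not part of the package.

```go
package main

import "fmt"

// lcsScore returns the LCS score of a and b under the scheme above:
// +1 per matching column, 0 for mismatches and gaps.
func lcsScore(a, b []byte) int {
	prev := make([]int, len(b)+1) // row i-1 of the matrix
	curr := make([]int, len(b)+1) // row i of the matrix
	for i := 1; i <= len(a); i++ {
		for j := 1; j <= len(b); j++ {
			best := prev[j-1] // diagonal move: a mismatch adds 0
			if a[i-1] == b[j-1] {
				best++ // a match adds 1
			}
			if prev[j] > best { // gap in b
				best = prev[j]
			}
			if curr[j-1] > best { // gap in a
				best = curr[j-1]
			}
			curr[j] = best
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

func main() {
	fmt.Println(lcsScore([]byte("ACGTACGT"), []byte("ACGAACGT"))) // 7
	fmt.Println(lcsScore([]byte("AGGTAB"), []byte("GXTXAYB")))    // 4
}
```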

Key Functions

FastLCSEGFScoreByte(bA, bB []byte, maxError int, endgapfree bool, buffer *[]uint64) (int, int, int)
- Computes LCS score and alignment length between raw byte sequences.
- If endgapfree is true, ignores leading/trailing gaps (useful for read alignment).
- Returns (score, length, end_position); end_position marks where the LCS ends in sequence A.
- Returns -1, -1, -1 if the actual error count exceeds maxError.
FastLCSEGFScore(seqA, seqB *obiseq.BioSequence, maxError int, buffer ...)
- Wrapper for FastLCSEGFScoreByte with end-gap-free mode enabled by default.
- Designed for standard biosequence inputs.
FastLCSScore(seqA, seqB *obiseq.BioSequence, maxError int, buffer ...)
- Computes the standard LCS, with end gaps counted in the alignment. Returns (score, alignment_length).
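
The difference between the two modes, and the role of maxError, can be illustrated with a small unbanded stand-in. Everything in this sketch is hypothetical: lcsSketch is not the obialign implementation, it returns only (score, length) without the end position, it compares bytes exactly, and it assumes the error count is the alignment length minus the score, which follows from the scoring scheme but is not confirmed by the source.

```go
package main

import "fmt"

// cell holds the score and the alignment length (number of columns) of a path.
type cell struct{ score, length int }

// better reports whether x beats y: higher score first, then shorter alignment.
func better(x, y cell) bool {
	return x.score > y.score || (x.score == y.score && x.length < y.length)
}

// lcsSketch is a hypothetical, unbanded stand-in, not the obialign code.
// It returns (score, length), or (-1, -1) when length-score exceeds maxError.
func lcsSketch(a, b []byte, maxError int, endGapFree bool) (int, int) {
	n, m := len(a), len(b)
	prev := make([]cell, m+1)
	curr := make([]cell, m+1)
	if !endGapFree {
		for j := range prev {
			prev[j].length = j // leading gaps count as alignment columns
		}
	}
	best := prev[m]
	for i := 1; i <= n; i++ {
		curr[0] = cell{}
		if !endGapFree {
			curr[0].length = i
		}
		for j := 1; j <= m; j++ {
			c := cell{prev[j-1].score, prev[j-1].length + 1} // diagonal move
			if a[i-1] == b[j-1] {
				c.score++ // match: +1; a mismatch adds nothing
			}
			up := cell{prev[j].score, prev[j].length + 1}
			if better(up, c) {
				c = up
			}
			left := cell{curr[j-1].score, curr[j-1].length + 1}
			if better(left, c) {
				c = left
			}
			curr[j] = c
		}
		if endGapFree && better(curr[m], best) {
			best = curr[m] // the alignment may end anywhere in the last column
		}
		prev, curr = curr, prev
	}
	if endGapFree {
		for _, c := range prev { // ... or anywhere in the last row
			if better(c, best) {
				best = c
			}
		}
	} else {
		best = prev[m]
	}
	if nerr := best.length - best.score; maxError >= 0 && nerr > maxError {
		return -1, -1
	}
	return best.score, best.length
}

func main() {
	a, b := []byte("ACGT"), []byte("TTACGTTT")
	fmt.Println(lcsSketch(a, b, -1, false)) // 4 8
	fmt.Println(lcsSketch(a, b, -1, true))  // 4 4
	fmt.Println(lcsSketch(a, b, 2, false))  // -1 -1
}
```

In the example, the short sequence sits inside the long one. Counting end gaps gives four errors, so a maxError of 2 rejects the pair, whereas the end-gap-free mode reports a perfect four-column alignment.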

Features

Error-bounded: Supports maxError = -1 (unlimited) or a fixed maximum number of errors (mismatches plus gaps).
Memory-efficient: Reuses user-provided or auto-created buffers to avoid allocations during repeated calls.
IUPAC-aware: Uses obiseq.SameIUPACNuc() to handle ambiguous nucleotide codes (e.g., R, Y).
Optimized for short reads: Particularly suited to high-throughput sequencing data alignment tasks (e.g., in OBITools4).
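
Two of these features lend themselves to a short sketch: IUPAC-aware matching and buffer reuse. Both helpers below are hypothetical illustrations. compatible assumes that two codes match when their sets of possible bases overlap, which may not be the exact rule applied by obiseq.SameIUPACNuc(). ensureBuffer shows one way a caller-supplied *[]uint64 could be grown only when too small and a fresh slice allocated when no buffer is given.

```go
package main

import "fmt"

// iupacMask maps an upper-case IUPAC code to its base set (A=1, C=2, G=4, T=8).
var iupacMask = map[byte]uint8{
	'A': 1, 'C': 2, 'G': 4, 'T': 8,
	'R': 1 | 4, 'Y': 2 | 8, 'S': 2 | 4, 'W': 1 | 8, 'K': 4 | 8, 'M': 1 | 2,
	'B': 2 | 4 | 8, 'D': 1 | 4 | 8, 'H': 1 | 2 | 8, 'V': 1 | 2 | 4, 'N': 15,
}

// compatible is a hypothetical helper: true when the two base sets overlap.
// Unknown codes map to the empty set and never match.
func compatible(x, y byte) bool {
	return iupacMask[x]&iupacMask[y] != 0
}

// ensureBuffer is a hypothetical helper: it returns a slice of the requested
// size, reusing the caller's storage when its capacity is sufficient.
func ensureBuffer(buffer *[]uint64, size int) []uint64 {
	if buffer == nil {
		return make([]uint64, size) // no buffer supplied: allocate one
	}
	if cap(*buffer) < size {
		*buffer = make([]uint64, size) // too small: grow once
	}
	*buffer = (*buffer)[:size]
	return *buffer
}

func main() {
	fmt.Println(compatible('R', 'A'), compatible('R', 'Y'), compatible('N', 'T'))

	buf := make([]uint64, 0, 16)
	work := ensureBuffer(&buf, 8)
	fmt.Println(len(work), cap(work))
}
```

Passing the same buffer to successive calls means that, once it has grown to the largest size needed, later calls allocate nothing.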

Use Cases

Molecular barcode/UMI clustering
Read-to-reference alignment in amplicon sequencing
Similarity filtering of biological sequences
