mirror of
https://github.com/metabarcoding/obitools4.git
Semantic Description of obialign Package
The obialign package provides high-performance functions for computing the Longest Common Subsequence (LCS) between two biological sequences, with support for error tolerance and end-gap-free alignment.
Core Algorithm
- Implements a Needleman-Wunsch dynamic programming algorithm optimized for speed and memory efficiency.
- Uses bit-packed encoding (`uint64`) to store score, path length, and gap status in a compact form.
- Leverages diagonal banding to restrict computation only within the allowed error margin, reducing time and space complexity.
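
To make the bit-packed encoding more concrete, here is a minimal sketch of how a score, a path length, and a gap flag could share a single `uint64`. The bit layout and the names `pack` and `unpack` are assumptions chosen for illustration; the layout actually used inside obialign is not described here and may differ.

```go
package main

import "fmt"

// Hypothetical layout (an assumption, not necessarily obialign's):
//   bit 0       : gap flag
//   bits 1..31  : path length
//   bits 32..63 : score
const (
	gapBit     = uint64(1)
	lengthBits = 31
	lengthMask = uint64(1)<<lengthBits - 1
	scoreShift = lengthBits + 1
)

// pack stores the three fields in one machine word.
func pack(score, length int, gap bool) uint64 {
	v := uint64(score)<<scoreShift | (uint64(length)&lengthMask)<<1
	if gap {
		v |= gapBit
	}
	return v
}

// unpack recovers the three fields from a packed word.
func unpack(v uint64) (score, length int, gap bool) {
	return int(v >> scoreShift), int((v >> 1) & lengthMask), v&gapBit != 0
}

func main() {
	score, length, gap := unpack(pack(5, 7, true))
	fmt.Println(score, length, gap) // prints: 5 7 true
}
```

With the score in the highest bits, comparing two packed words with a plain integer comparison ranks them by score first, which is one plausible reason to pack cells this way.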
Scoring Scheme
- Match: +1 point
- Mismatch or gap (indel): 0 points
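
As a worked example of this scoring scheme, the sketch below scores two rows that are already aligned, with `-` marking a gap. The helper `scoreAlignment` is hypothetical and only illustrates how points are counted; it is not part of the package.

```go
package main

import "fmt"

// scoreAlignment (hypothetical helper) scores two pre-aligned rows.
// A column where both rows carry the same non-gap symbol earns +1;
// a mismatch or a gap ('-') earns 0 and is counted as one error.
func scoreAlignment(rowA, rowB string) (score, errors int) {
	for i := 0; i < len(rowA) && i < len(rowB); i++ {
		if rowA[i] != '-' && rowA[i] == rowB[i] {
			score++
		} else {
			errors++
		}
	}
	return score, errors
}

func main() {
	// Columns: A/A match, C/- gap, G/G match, T/T match, A/C mismatch.
	score, errors := scoreAlignment("ACGTA", "A-GTC")
	fmt.Println(score, errors) // prints: 3 2
}
```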
Key Functions
- `FastLCSEGFScoreByte(bA, bB []byte, maxError int, endgapfree bool, buffer *[]uint64) (int, int, int)`
  - Computes LCS score and alignment length between raw byte sequences.
  - If `endgapfree` is true, ignores leading/trailing gaps (useful for read alignment).
  - Returns `(score, length, end_position)`; `end_position` marks where the LCS ends in sequence A.
  - Returns `-1, -1, -1` if the actual error count exceeds `maxError`.
- `FastLCSEGFScore(seqA, seqB *obiseq.BioSequence, maxError int, buffer ...)`
  - Wrapper for `FastLCSEGFScoreByte` with end-gap-free mode enabled by default.
  - Designed for standard biosequence inputs.
- `FastLCSScore(seqA, seqB *obiseq.BioSequence, maxError int, buffer ...)`
  - Computes standard LCS (including end gaps). Returns `(score, alignment_length)`.
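
The return contract described above (score and alignment length, or `-1` values once the error bound is exceeded) can be sketched with a plain, unbanded dynamic program. This is a simplified stand-in, not the package's implementation: the name `lcsWithin`, the `cell` type, and the assumption that the error count equals alignment length minus score are all introduced here for illustration. It counts end gaps and does no banding or bit-packing.

```go
package main

import "fmt"

// cell holds the best (score, alignment length) pair reaching a DP position.
type cell struct{ score, length int }

// better prefers a higher score, then a shorter alignment.
func better(x, y cell) bool {
	return x.score > y.score || (x.score == y.score && x.length < y.length)
}

// lcsWithin is a hypothetical stand-in for the functions listed above.
// It returns (score, length), or (-1, -1) when length-score (assumed to be
// the error count) exceeds maxError. A negative maxError disables the check.
// A non-nil buffer is reused across calls to hold the DP row.
func lcsWithin(a, b []byte, maxError int, buffer *[]cell) (int, int) {
	var local []cell
	if buffer == nil {
		buffer = &local
	}
	if cap(*buffer) < len(b)+1 {
		*buffer = make([]cell, len(b)+1)
	}
	row := (*buffer)[:len(b)+1]
	for j := range row {
		row[j] = cell{0, j} // first row: j leading gaps
	}
	for i := 1; i <= len(a); i++ {
		diag := row[0]      // previous row, column 0
		row[0] = cell{0, i} // first column: i leading gaps
		for j := 1; j <= len(b); j++ {
			up := row[j]                              // previous row, same column
			best := cell{diag.score, diag.length + 1} // mismatch: 0 points
			if a[i-1] == b[j-1] {
				best.score++ // match: +1 point
			}
			fromUp := cell{up.score, up.length + 1} // gap in b
			if better(fromUp, best) {
				best = fromUp
			}
			fromLeft := cell{row[j-1].score, row[j-1].length + 1} // gap in a
			if better(fromLeft, best) {
				best = fromLeft
			}
			diag = up
			row[j] = best
		}
	}
	res := row[len(b)]
	if maxError >= 0 && res.length-res.score > maxError {
		return -1, -1
	}
	return res.score, res.length
}

func main() {
	score, length := lcsWithin([]byte("ACGT"), []byte("AGT"), 1, nil)
	fmt.Println(score, length) // prints: 3 4
}
```

Passing the same buffer pointer to successive calls lets the row be reused whenever its capacity is sufficient, which mirrors the purpose of the `buffer` parameter in the signatures above.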
Features
- Error-bounded: Supports `maxError = -1` (unlimited) or a fixed max number of mismatches + gaps.
- Memory-efficient: Reuses user-provided or auto-created buffers to avoid allocations during repeated calls.
- IUPAC-aware: Uses `obiseq.SameIUPACNuc()` to handle ambiguous nucleotide codes (e.g., `R`, `Y`).
- Optimized for short reads: Particularly suited to high-throughput sequencing data alignment tasks (e.g., in OBITools4).
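
To illustrate what IUPAC-aware matching can look like, the sketch below encodes each IUPAC nucleotide code as a 4-bit set and treats two codes as compatible when their sets share at least one base. The helper `compatible` and the intersection rule are assumptions for illustration; the exact rule applied by `obiseq.SameIUPACNuc()` is not stated here and may be different.

```go
package main

import "fmt"

// iupacBits maps each upper-case IUPAC code to a bit set: A=1, C=2, G=4, T=8.
var iupacBits = map[byte]uint8{
	'A': 1, 'C': 2, 'G': 4, 'T': 8,
	'R': 1 | 4, 'Y': 2 | 8, 'S': 2 | 4, 'W': 1 | 8,
	'K': 4 | 8, 'M': 1 | 2,
	'B': 2 | 4 | 8, 'D': 1 | 4 | 8, 'H': 1 | 2 | 8, 'V': 1 | 2 | 4,
	'N': 15,
}

// compatible (hypothetical helper) reports whether two codes can denote
// the same base, i.e. whether their bit sets intersect.
// Codes missing from the table map to 0 and never match.
func compatible(x, y byte) bool {
	return iupacBits[x]&iupacBits[y] != 0
}

func main() {
	// R stands for A or G, so it is compatible with A but not with C.
	fmt.Println(compatible('R', 'A'), compatible('R', 'C')) // prints: true false
}
```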
Use Cases
- Molecular barcode/UMI clustering
- Read-to-reference alignment in amplicon sequencing
- Similarity filtering of biological sequences