Files
obitools4/autodoc/docmd/pkg/obiapat/pattern.md
T
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

26 lines
2.0 KiB
Markdown

# Apat Package: Pattern Matching for Biological Sequences
The `obiapat` Go package provides high-performance pattern matching over biological sequences using the **Apat algorithm**, a C-based implementation wrapped in Go. It supports fuzzy matching (with mismatches and indels), reverse-complement patterns, memory-safe resource management via finalizers, and efficient filtering of non-overlapping matches.
## Core Types
- `ApatPattern`: Represents a compiled pattern (up to 64 bp), supporting IUPAC ambiguity codes (`W`, `[AT]`), negated bases (`!A`), and fixed positions (`#`).
- `ApatSequence`: Wraps a biological sequence (from `obiseq.BioSequence`) for fast matching, with optional circular topology support and memory recycling.
## Key Functions & Methods
- `MakeApatPattern(pattern string, errormax int, allowsIndel bool)`: Compiles a pattern with max error tolerance and optional indels.
- `ReverseComplement()`: Returns the reverse-complemented pattern (useful for DNA strand symmetry).
- `FindAllIndex(...)`: Returns all matches as `[start, end, errors]`, supporting partial sequence searches.
- `IsMatching(...)`: Boolean check for presence of at least one match in a region.
- `BestMatch(...)`: Finds the *best* (lowest-error) match, with local realignment for indel-containing patterns.
- `FilterBestMatch(...)`: Returns *non-overlapping* matches, prioritizing lower-error occurrences.
- `AllMatches(...)`: Filters and refines all valid matches (including indel-aware alignment).
- `Free()`, `Len()`: Explicit memory cleanup and length queries.
## Implementation Notes
Internally, the package uses `cgo` to interface with C structures (`Pattern`, `Seq`) allocated via custom memory management. Finalizers ensure safe deallocation, while unsafe pointer arithmetic avoids data copying during search (e.g., `unsafe.SliceData`). Logging is integrated via Logrus.
This package enables scalable, low-level pattern mining in NGS data preprocessing pipelines (e.g., primer detection, adapter trimming).