mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
Semantic Description of obikmer Package
The obikmer package implements efficient k-mer matching between query sequences and an indexed reference using canonical k-mers partitioned by minimizer-based hashing.
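As a rough illustration of the two ideas named above, the sketch below computes a canonical k-mer (the smaller of a word and its reverse complement) and assigns a k-mer to a partition from its minimizer. Everything here is an assumption made for illustration: the 2-bit encoding, the helper names `encode`, `revComp`, `canonical`, `minimizer` and `partition`, and the choice of "smallest canonical m-mer, modulo the partition count" are not taken from the obikmer source. This naive version also recomputes the minimizer for every k-mer rather than working through super-kmers.

```go
package main

import "fmt"

// encode packs a DNA string (A, C, G, T only) into 2 bits per base.
// Hypothetical encoding: A=0, C=1, G=2, T=3; any other byte is treated as A.
func encode(s string) uint64 {
	var v uint64
	for i := 0; i < len(s); i++ {
		var b uint64
		switch s[i] {
		case 'C':
			b = 1
		case 'G':
			b = 2
		case 'T':
			b = 3
		}
		v = v<<2 | b
	}
	return v
}

// revComp returns the reverse complement of a k-base encoded word.
// With the encoding above, the complement of base b is 3-b.
func revComp(v uint64, k int) uint64 {
	var rc uint64
	for i := 0; i < k; i++ {
		rc = rc<<2 | (3 - v&3)
		v >>= 2
	}
	return rc
}

// canonical returns the smaller of a word and its reverse complement,
// so both strands of the same site map to one value.
func canonical(v uint64, k int) uint64 {
	if rc := revComp(v, k); rc < v {
		return rc
	}
	return v
}

// minimizer returns the smallest canonical m-mer found inside kmer.
func minimizer(kmer string, m int) uint64 {
	best := canonical(encode(kmer[:m]), m)
	for i := 1; i+m <= len(kmer); i++ {
		if c := canonical(encode(kmer[i:i+m]), m); c < best {
			best = c
		}
	}
	return best
}

// partition maps a k-mer to one of nParts buckets via its minimizer.
func partition(kmer string, m, nParts int) int {
	return int(minimizer(kmer, m) % uint64(nParts))
}

func main() {
	// A k-mer and its reverse complement share a minimizer,
	// so they fall into the same partition.
	fmt.Println(partition("ACGGTA", 3, 4), partition("TACCGT", 3, 4))
}
```

Because the minimizer is itself canonical, both strands of a k-mer are routed to the same partition, which is what lets each partition be matched independently.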
- `QueryEntry` represents a single canonical k-mer with its origin: sequence index and 1-based position.
- `PreparedQueries` groups queries into sorted buckets per partition, enabling batched and parallelized matching.
- `PrepareQueries` scans input sequences using super-kmers (with window size `m`) to compute minimizers, assigns each k-mer to a partition via modulo hashing, and sorts buckets by k-mer value.
- `MergeQueries` combines two sets of prepared queries across batches using a merge-sort strategy, correctly offsetting sequence indices to preserve global ordering.
- `MatchBatch` performs parallel matching per partition: each goroutine runs a merge-scan between sorted queries and the corresponding KDI (K-mer Disk Index) stream.
  - Efficient seeking is used only when beneficial, avoiding costly syscalls for small skips.
  - Matches are recorded with thread-safe per-sequence mutexes; final positions within each sequence are sorted post-match.
- `matchPartition` implements the core merge-scan: it opens a KDI reader, seeks to relevant regions of the index, and walks both query list and k-mer stream in lockstep.
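The merge-scan described above can be sketched in a few lines. This is a simplified, single-partition illustration and not the package's code: the field names of `QueryEntry`, the `Match` type, and the functions `mergeScan` and `positionsBySeq` are hypothetical, and a sorted in-memory slice stands in for the KDI stream. Seeking, goroutines and mutexes are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// QueryEntry mirrors the description above; the field names are assumptions.
type QueryEntry struct {
	Kmer   uint64 // canonical k-mer value
	SeqIdx int    // index of the originating sequence
	Pos    int    // 1-based position in that sequence
}

// Match records where a query k-mer was found in the index (hypothetical type).
type Match struct {
	SeqIdx int
	Pos    int
}

// mergeScan walks a k-mer-sorted query list and a sorted index in lockstep.
// Only the query cursor advances on a hit, so repeated query k-mers all match.
func mergeScan(queries []QueryEntry, index []uint64) []Match {
	var matches []Match
	qi, ii := 0, 0
	for qi < len(queries) && ii < len(index) {
		switch {
		case queries[qi].Kmer < index[ii]:
			qi++ // query k-mer absent from the index
		case queries[qi].Kmer > index[ii]:
			ii++ // index k-mer not queried
		default:
			matches = append(matches, Match{queries[qi].SeqIdx, queries[qi].Pos})
			qi++
		}
	}
	return matches
}

// positionsBySeq groups matched positions per sequence and sorts each group,
// since the scan emits matches in k-mer order, not position order.
func positionsBySeq(matches []Match, nSeqs int) [][]int {
	out := make([][]int, nSeqs)
	for _, m := range matches {
		out[m.SeqIdx] = append(out[m.SeqIdx], m.Pos)
	}
	for _, p := range out {
		sort.Ints(p)
	}
	return out
}

func main() {
	queries := []QueryEntry{{3, 0, 9}, {7, 1, 2}, {7, 0, 5}, {10, 0, 1}}
	index := []uint64{1, 3, 7, 8, 12}
	fmt.Println(positionsBySeq(mergeScan(queries, index), 2)) // prints [[5 9] [2]]
}
```

Both inputs are consumed once, so the scan costs time linear in the combined length of the query list and the index region visited. The final per-sequence sort is needed because hits arrive in k-mer order.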
The design supports large-scale batch processing, incremental query accumulation, and high-performance parallel lookup, which suits metagenomic and biodiversity sequencing workflows.
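Incremental query accumulation relies on merging two already-sorted buckets while shifting the sequence indices of the second batch. A minimal sketch of that step for a single partition follows. The `QueryEntry` field names, the function `mergeBuckets` and the `offset` parameter (assumed to be the number of sequences in the first batch) are hypothetical and not taken from the package.

```go
package main

import "fmt"

// QueryEntry mirrors the description above; the field names are assumptions.
type QueryEntry struct {
	Kmer   uint64
	SeqIdx int
	Pos    int
}

// mergeBuckets merges two k-mer-sorted buckets into one sorted bucket.
// Entries taken from b get SeqIdx shifted by offset so that sequence
// indices stay unique across both batches.
func mergeBuckets(a, b []QueryEntry, offset int) []QueryEntry {
	out := make([]QueryEntry, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if a[i].Kmer <= b[j].Kmer {
			out = append(out, a[i])
			i++
		} else {
			e := b[j]
			e.SeqIdx += offset
			out = append(out, e)
			j++
		}
	}
	out = append(out, a[i:]...) // remainder of the first batch, unchanged
	for ; j < len(b); j++ {     // remainder of the second batch, shifted
		e := b[j]
		e.SeqIdx += offset
		out = append(out, e)
	}
	return out
}

func main() {
	first := []QueryEntry{{2, 0, 1}, {9, 1, 4}}
	second := []QueryEntry{{5, 0, 3}, {9, 0, 7}}
	// The first batch holds two sequences, so the second is offset by 2.
	fmt.Println(mergeBuckets(first, second, 2))
}
```

Since each input is already sorted, one linear pass is enough and no re-sort of the merged bucket is required.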