Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00


Semantic Description of the obikmer Package

The obikmer package implements efficient k-mer matching between query sequences and an indexed reference. It works on canonical k-mers, which are distributed across partitions by hashing their minimizers.
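As a minimal sketch of the two ideas in that paragraph, the Go code below canonicalizes a k-mer (keeping the smaller of its 2-bit code and the code of its reverse complement) and picks a partition from the smallest canonical m-mer it contains. The encoding (A=0, C=1, G=2, T=3), the helper names, and the plain `minimizer % n` partition rule are assumptions for illustration, not obikmer's API. The sketch also recomputes the minimizer for every k-mer instead of scanning super-k-mers.

```go
package main

import "fmt"

// encode maps a nucleotide to an assumed 2-bit code: A=0, C=1, G=2, T=3.
// Ambiguous bases are not handled in this sketch.
func encode(b byte) uint64 {
	switch b {
	case 'A', 'a':
		return 0
	case 'C', 'c':
		return 1
	case 'G', 'g':
		return 2
	default:
		return 3
	}
}

// kmerCode packs up to 32 bases into a uint64, first base in the high bits.
func kmerCode(s string) uint64 {
	var v uint64
	for i := 0; i < len(s); i++ {
		v = v<<2 | encode(s[i])
	}
	return v
}

// revComp returns the code of the reverse complement of a k-base code.
// With this encoding, the complement of base code c is 3 - c.
func revComp(code uint64, k int) uint64 {
	var rc uint64
	for i := 0; i < k; i++ {
		rc = rc<<2 | (3 - (code & 3)) // take the last base, complement it
		code >>= 2
	}
	return rc
}

// canonical keeps the smaller of a k-mer and its reverse complement,
// so both strands of the same locus give the same value.
func canonical(code uint64, k int) uint64 {
	if rc := revComp(code, k); rc < code {
		return rc
	}
	return code
}

// minimizer returns the smallest canonical m-mer inside the k-mer string s.
func minimizer(s string, m int) uint64 {
	best := ^uint64(0)
	for i := 0; i+m <= len(s); i++ {
		if c := canonical(kmerCode(s[i:i+m]), m); c < best {
			best = c
		}
	}
	return best
}

// partition is a hypothetical rule: minimizer modulo the partition count.
func partition(s string, m, n int) int {
	return int(minimizer(s, m) % uint64(n))
}

func main() {
	const k, m, nParts = 11, 5, 16
	fwd := "ACGTTGCAAGC"
	rev := "GCTTGCAACGT" // reverse complement of fwd
	fmt.Println(canonical(kmerCode(fwd), k) == canonical(kmerCode(rev), k)) // true
	fmt.Println(partition(fwd, m, nParts) == partition(rev, m, nParts))     // true
}
```

Because both strands yield the same set of canonical m-mers, a k-mer and its reverse complement always land in the same partition, and neighbouring k-mers that share a minimizer land in the same bucket.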

  • QueryEntry represents a single canonical k-mer together with its origin: the index of its sequence and its 1-based position in that sequence.
  • PreparedQueries groups queries into per-partition buckets, each sorted by k-mer value, so that matching can be batched and parallelized.
  • PrepareQueries scans the input sequences as super-k-mers (with window size m) to compute minimizers, assigns each k-mer to a partition by modulo hashing of its minimizer, and sorts each bucket by k-mer value.
  • MergeQueries combines two sets of prepared queries from different batches by merging their sorted buckets, offsetting sequence indices so that they stay globally ordered.
  • MatchBatch matches partitions in parallel: each goroutine runs a merge-scan between a partition's sorted queries and the corresponding KDI (K-mer Disk Index) stream.
    • The index is seeked only when a seek pays off; small skips are read through sequentially to avoid costly syscalls.
    • Matches are recorded under per-sequence mutexes, which keeps concurrent writes thread-safe; once matching is done, the positions within each sequence are sorted.
  • matchPartition implements the core merge-scan: it opens a KDI reader, seeks to the relevant regions of the index, and walks the query list and the k-mer stream in lockstep.
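The lockstep walk performed by matchPartition can be illustrated on in-memory data. In this hedged sketch, a sorted `[]uint64` stands in for the KDI stream and a binary search (`sort.Search`) stands in for a reader seek; the `QueryEntry` fields, the `Match` type, and the `seekThreshold` cut-off are hypothetical names, not the package's actual definitions.

```go
package main

import (
	"fmt"
	"sort"
)

// QueryEntry mirrors the description (canonical k-mer plus origin);
// the field names are assumptions for this sketch.
type QueryEntry struct {
	Kmer   uint64
	SeqIdx int
	Pos    int // 1-based position in the query sequence
}

// Match is a hypothetical result record for a query k-mer found in the index.
type Match struct {
	SeqIdx int
	Pos    int
}

// seekThreshold is a hypothetical cut-off: gaps shorter than this are walked
// one element at a time; longer ones are crossed with a binary search,
// standing in for a seek on the on-disk stream.
const seekThreshold = 8

// mergeScan walks a bucket sorted by Kmer and a sorted, duplicate-free
// index stream in lockstep and reports every query whose k-mer is indexed.
func mergeScan(queries []QueryEntry, index []uint64) []Match {
	var out []Match
	i, j := 0, 0
	for i < len(queries) && j < len(index) {
		q := queries[i].Kmer
		switch {
		case index[j] < q:
			// Probe ahead; only jump if the target is still further away.
			if far := j + seekThreshold; far < len(index) && index[far] < q {
				j = far + sort.Search(len(index)-far, func(n int) bool { return index[far+n] >= q })
			} else {
				j++
			}
		case index[j] > q:
			i++ // this query k-mer is absent from the index
		default:
			// Do not advance j: several queries may share this k-mer.
			out = append(out, Match{queries[i].SeqIdx, queries[i].Pos})
			i++
		}
	}
	return out
}

func main() {
	queries := []QueryEntry{{4, 0, 1}, {10, 1, 4}, {10, 0, 2}, {40, 1, 9}, {41, 0, 5}}
	var index []uint64
	for v := uint64(0); v < 60; v += 2 {
		index = append(index, v) // even k-mer codes 0..58
	}
	fmt.Println(mergeScan(queries, index)) // [{0 1} {1 4} {0 2} {1 9}]
}
```

The probe-then-jump step mirrors the idea of seeking only when it pays off: short gaps are crossed element by element, and only a long gap triggers the more expensive jump.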
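Merging prepared queries from two batches amounts to a per-partition merge of sorted buckets plus an index shift. The sketch below assumes both sets use the same number of partitions and that the shift equals the number of sequences in the first batch; `mergeBuckets`, `mergeQueries`, and the `QueryEntry` fields are illustrative names, not obikmer's.

```go
package main

import "fmt"

// QueryEntry fields are assumptions for this sketch.
type QueryEntry struct {
	Kmer   uint64
	SeqIdx int
	Pos    int
}

// shifted returns a copy of e with its sequence index moved by offset.
func shifted(e QueryEntry, offset int) QueryEntry {
	e.SeqIdx += offset
	return e
}

// mergeBuckets merges two buckets that are each sorted by Kmer. Entries from
// b get SeqIdx += offset so indices stay unique across both batches.
func mergeBuckets(a, b []QueryEntry, offset int) []QueryEntry {
	out := make([]QueryEntry, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if a[i].Kmer <= b[j].Kmer { // on ties, entries from a come first
			out = append(out, a[i])
			i++
		} else {
			out = append(out, shifted(b[j], offset))
			j++
		}
	}
	out = append(out, a[i:]...)
	for ; j < len(b); j++ {
		out = append(out, shifted(b[j], offset))
	}
	return out
}

// mergeQueries applies mergeBuckets partition by partition;
// nSeqA is the number of sequences in the first batch.
func mergeQueries(a, b [][]QueryEntry, nSeqA int) [][]QueryEntry {
	out := make([][]QueryEntry, len(a))
	for p := range a {
		out[p] = mergeBuckets(a[p], b[p], nSeqA)
	}
	return out
}

func main() {
	batch1 := [][]QueryEntry{{{2, 0, 1}, {9, 1, 3}}, {{5, 0, 2}}}
	batch2 := [][]QueryEntry{{{4, 0, 1}}, {{1, 1, 7}, {5, 0, 6}}}
	fmt.Println(mergeQueries(batch1, batch2, 2))
	// [[{2 0 1} {4 2 1} {9 1 3}] [{1 3 7} {5 0 2} {5 2 6}]]
}
```

Each merge is linear in the bucket sizes, so accumulating queries batch by batch never requires re-sorting a whole bucket.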
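The concurrency pattern described for MatchBatch (one goroutine per partition, per-sequence mutexes, sorting afterwards) can be sketched as follows. To keep it short, a `map[uint64]bool` per partition replaces the KDI merge-scan; `matchBatch`, its signature, and the `QueryEntry` fields are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// QueryEntry fields are assumptions for this sketch.
type QueryEntry struct {
	Kmer   uint64
	SeqIdx int
	Pos    int
}

// matchBatch runs one goroutine per partition. A hit appends the query's
// position to its sequence's slice under that sequence's mutex. The map
// lookup is a stand-in for the real merge-scan against the KDI stream.
func matchBatch(buckets [][]QueryEntry, index []map[uint64]bool, nSeq int) [][]int {
	hits := make([][]int, nSeq)
	locks := make([]sync.Mutex, nSeq)
	var wg sync.WaitGroup
	for p := range buckets {
		wg.Add(1)
		go func(p int) {
			defer wg.Done()
			for _, q := range buckets[p] {
				if index[p][q.Kmer] {
					locks[q.SeqIdx].Lock()
					hits[q.SeqIdx] = append(hits[q.SeqIdx], q.Pos)
					locks[q.SeqIdx].Unlock()
				}
			}
		}(p)
	}
	wg.Wait()
	// Partitions finish in arbitrary order, so sort positions afterwards.
	for s := range hits {
		sort.Ints(hits[s])
	}
	return hits
}

func main() {
	buckets := [][]QueryEntry{
		{{10, 0, 7}, {12, 1, 2}},
		{{21, 0, 3}, {23, 0, 1}, {25, 1, 5}},
	}
	index := []map[uint64]bool{
		{10: true, 12: true},
		{21: true, 23: true},
	}
	fmt.Println(matchBatch(buckets, index, 2)) // [[1 3 7] [2]]
}
```

Locking per sequence rather than with one global mutex lets partitions that touch different sequences record hits without contending with each other.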

The design supports large-scale batch processing, incremental accumulation of queries, and high-performance parallel lookup, which makes it well suited to metagenomic and biodiversity sequencing workflows.