obitools4/autodoc/docmd/pkg/obikmer/superkmer_iter.md


Super K-mers Extraction Module (obikmer)

This Go package provides efficient tools for extracting super k-mers from DNA sequences using minimizer-based sliding windows. A super k-mer is a maximal run of consecutive, overlapping k-mers that all share the same canonical minimizer, i.e. the smallest canonical m-mer found within each window of size k.
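To make the definition concrete, here is a deliberately naive sketch that recomputes the minimizer of every k-mer from scratch and merges neighbouring k-mers while the minimizer stays the same. It is not the package's implementation: `naiveSuperKmers` and its helpers are hypothetical names, minimizers are compared lexicographically as strings (an assumption, since the ordering used by obikmer is not stated here), and only upper-case A/C/G/T is handled.

```go
package main

import "fmt"

// revComp returns the reverse complement of an upper-case A/C/G/T string.
func revComp(s string) string {
	comp := map[byte]byte{'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
	out := make([]byte, len(s))
	for i := 0; i < len(s); i++ {
		out[len(s)-1-i] = comp[s[i]]
	}
	return string(out)
}

// canonical returns the smaller of s and its reverse complement.
func canonical(s string) string {
	if rc := revComp(s); rc < s {
		return rc
	}
	return s
}

// minimizerOf returns the smallest canonical m-mer inside one k-mer.
func minimizerOf(kmer string, m int) string {
	best := canonical(kmer[:m])
	for i := 1; i+m <= len(kmer); i++ {
		if c := canonical(kmer[i : i+m]); c < best {
			best = c
		}
	}
	return best
}

// naiveSuperKmers (hypothetical helper) merges consecutive k-mers that
// share a minimizer and returns the covered subsequences.
func naiveSuperKmers(seq string, k, m int) []string {
	var out []string
	if m < 1 || m >= k || len(seq) < k {
		return out
	}
	start := 0
	current := minimizerOf(seq[:k], m)
	for i := 1; i+k <= len(seq); i++ {
		mz := minimizerOf(seq[i:i+k], m)
		if mz != current {
			// k-mers start..i-1 form one super k-mer: seq[start : i-1+k].
			out = append(out, seq[start:i-1+k])
			start, current = i, mz
		}
	}
	return append(out, seq[start:])
}

func main() {
	// k-mers AAAC and AACG share minimizer AA, ACGG has AC, CGGG has CC.
	fmt.Println(naiveSuperKmers("AAACGGG", 4, 2)) // prints [AAACG ACGG CGGG]
}
```

This version costs O(n·k) because every window is rescanned; the deque-based approach listed under Algorithm Highlights avoids that rescanning.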

Core Functionality

  • IterSuperKmers(seq, k, m)
    Returns an iterator over SuperKmer structs. Each struct contains:

    • Start, End: positions of the super k-mer within the original sequence
    • Minimizer: canonical minimizer value (uint64) for that segment
    • Sequence: the actual DNA subsequence
  • SuperKmer.ToBioSequence(...)
    Converts a raw SuperKmer into an enriched obiseq.BioSequence, embedding metadata:

    • ID: {parentID}_superkmer_{start}_{end}
    • Attributes: minimizer sequence (minimizer_seq), value, k, m, positions, and parent ID
  • SuperKmerWorker(k, m)
    A SeqWorker adapter for pipeline integration (e.g., with obiiter). Processes a full BioSequence and returns all extracted super k-mers as a slice of BioSequences.
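The metadata described above can be pictured with a small self-contained sketch. Everything here is illustrative: the `SuperKmer` type is a local stand-in whose field types are assumed, `superKmerID` and `decodeMinimizer` are hypothetical helpers rather than obikmer functions, and the 2-bit encoding (A=0, C=1, G=2, T=3) used to turn a minimizer value back into a sequence is an assumption.

```go
package main

import "fmt"

// SuperKmer is a local stand-in mirroring the documented fields.
// The real field types in obikmer may differ (assumption).
type SuperKmer struct {
	Start, End int
	Minimizer  uint64
	Sequence   []byte
}

// superKmerID (hypothetical) follows the documented ID pattern
// {parentID}_superkmer_{start}_{end}.
func superKmerID(parentID string, sk SuperKmer) string {
	return fmt.Sprintf("%s_superkmer_%d_%d", parentID, sk.Start, sk.End)
}

// decodeMinimizer (hypothetical) turns a 2-bit packed value back into
// an m-letter string, assuming A=0, C=1, G=2, T=3 with the first base
// stored in the most significant bits.
func decodeMinimizer(v uint64, m int) string {
	const bases = "ACGT"
	out := make([]byte, m)
	for i := m - 1; i >= 0; i-- {
		out[i] = bases[v&3]
		v >>= 2
	}
	return string(out)
}

func main() {
	sk := SuperKmer{Start: 2, End: 6, Minimizer: 1, Sequence: []byte("ACGG")}
	fmt.Println(superKmerID("read42", sk))        // prints read42_superkmer_2_6
	fmt.Println(decodeMinimizer(sk.Minimizer, 2)) // prints AC
}
```

Whether End is inclusive or exclusive is not specified in this page, so the example treats the two numbers as opaque values copied into the identifier.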

Algorithm Highlights

  • Uses canonical minimizers (forward/reverse-complement minimum) to ensure strand-invariance
  • Maintains a monotonic deque for sliding-window minimizer tracking: each m-mer enters and leaves the deque at most once, giving amortized O(1) work per position and O(n) time overall
  • Supports DNA bases A/C/G/T/U case-insensitively via bitmasking (seq[i] & 31), which keeps the low five bits that upper- and lower-case ASCII letters have in common
  • Enforces parameter constraints: 1 ≤ m < k ≤ 31, sequence length ≥ k
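The points above can be combined into one compact sketch: a 32-entry lookup table indexed by `b & 31`, rolling forward and reverse-complement m-mer values, and a monotonic deque that yields the minimizer of every k-mer. This is an illustration under stated assumptions, not the package's code: all function names are hypothetical, the 2-bit encoding (A=0, C=1, G=2, T/U=3) and the plain numeric ordering of minimizers are assumed, and returning an error for unsupported letters is a choice made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// code maps (base & 31) to a 2-bit value; 0xFF marks unsupported letters.
// 'A'&31 == 'a'&31 == 1, and likewise C→3, G→7, T→20, U→21.
var code [32]byte

func init() {
	for i := range code {
		code[i] = 0xFF
	}
	code['A'&31], code['C'&31], code['G'&31] = 0, 1, 2
	code['T'&31], code['U'&31] = 3, 3
}

// canonicalMmers returns min(forward, reverse complement) for every m-mer.
func canonicalMmers(seq []byte, m int) ([]uint64, error) {
	mask := uint64(1)<<(2*uint(m)) - 1
	shift := 2 * uint(m-1)
	var fwd, rev uint64
	out := make([]uint64, 0, len(seq))
	for i, b := range seq {
		c := code[b&31]
		if c == 0xFF {
			return nil, fmt.Errorf("unsupported base %q at position %d", b, i)
		}
		fwd = (fwd<<2 | uint64(c)) & mask // append base on the right
		rev = rev>>2 | uint64(3-c)<<shift // prepend complement on the left
		if i >= m-1 {
			if rev < fwd {
				out = append(out, rev)
			} else {
				out = append(out, fwd)
			}
		}
	}
	return out, nil
}

// windowMinima returns the minimum of each window of w consecutive values
// using a monotonic deque of indices (values increase from front to back).
func windowMinima(vals []uint64, w int) []uint64 {
	var dq []int
	out := make([]uint64, 0, len(vals))
	for i, v := range vals {
		for len(dq) > 0 && vals[dq[len(dq)-1]] >= v {
			dq = dq[:len(dq)-1] // drop candidates that can no longer win
		}
		dq = append(dq, i)
		if dq[0] <= i-w {
			dq = dq[1:] // front index slid out of the window
		}
		if i >= w-1 {
			out = append(out, vals[dq[0]])
		}
	}
	return out
}

// kmerMinimizers validates the parameters and returns one canonical
// minimizer per k-mer; each k-mer contains k-m+1 m-mers.
func kmerMinimizers(seq []byte, k, m int) ([]uint64, error) {
	if m < 1 || m >= k || k > 31 {
		return nil, errors.New("need 1 <= m < k <= 31")
	}
	if len(seq) < k {
		return nil, errors.New("sequence shorter than k")
	}
	mm, err := canonicalMmers(seq, m)
	if err != nil {
		return nil, err
	}
	return windowMinima(mm, k-m+1), nil
}

func main() {
	mins, err := kmerMinimizers([]byte("aaacGGG"), 4, 2)
	fmt.Println(mins, err) // prints [0 0 1 5] <nil>
}
```

In the example output, 0 encodes AA, 1 encodes AC and 5 encodes CC (the reverse complement of GG); a super k-mer boundary falls wherever consecutive values differ.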

Use Cases

  • Read partitioning in metagenomics (e.g., for error correction or clustering)
  • Efficient k-mer space segmentation without storing all individual k-mers
  • Integration into modular bioinformatics pipelines via SeqWorker interface