mirror of
https://github.com/metabarcoding/obitools4.git
Super K-mers Extraction Module (obikmer)
This Go package provides efficient tools for extracting super k-mers from DNA sequences using minimizer-based sliding windows. A super k-mer is a maximal contiguous subsequence in which all consecutive k-mers share the same canonical minimizer, that is, the smallest canonical m-mer within each window of size k.
Core Functionality
- `IterSuperKmers(seq, k, m)`
  Returns an iterator over `SuperKmer` structs. Each struct contains:
  - `Start`, `End`: genomic positions of the super k-mer in the original sequence
  - `Minimizer`: canonical minimizer value (`uint64`) for that segment
  - `Sequence`: the actual DNA subsequence
- `SuperKmer.ToBioSequence(...)`
  Converts a raw `SuperKmer` into an enriched `obiseq.BioSequence`, embedding metadata:
  - ID: `{parentID}_superkmer_{start}_{end}`
  - Attributes: minimizer sequence (`minimizer_seq`), minimizer value, `k`, `m`, positions, and parent ID
- `SuperKmerWorker(k, m)`
  A `SeqWorker` adapter for pipeline integration (e.g., with `obiiter`). Processes a full `BioSequence` and returns all extracted super k-mers as a slice of `BioSequence`s.
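To make the metadata layout concrete, here is a minimal standalone sketch of how the ID pattern and the `minimizer_seq` attribute could be derived. It does not call the package: `superKmerID` and `decodeMinimizer` are hypothetical helper names, and the 2-bit encoding (A=0, C=1, G=2, T=3, first base in the highest bits) and the uppercase output are assumptions, not something this README confirms.

```go
package main

import (
	"fmt"
	"strings"
)

// superKmerID builds an identifier following the documented pattern
// {parentID}_superkmer_{start}_{end}. Hypothetical helper.
func superKmerID(parentID string, start, end int) string {
	return fmt.Sprintf("%s_superkmer_%d_%d", parentID, start, end)
}

// decodeMinimizer turns a 2-bit-packed m-mer back into text.
// ASSUMPTION: A=0, C=1, G=2, T=3, first base stored in the highest bits.
func decodeMinimizer(value uint64, m int) string {
	var sb strings.Builder
	for i := m - 1; i >= 0; i-- {
		sb.WriteByte("ACGT"[(value>>(2*uint(i)))&3])
	}
	return sb.String()
}

func main() {
	k, m := 4, 2
	minimizer := uint64(6) // "CG" under the assumed encoding

	// Only minimizer_seq, k and m are named in the text above; the keys
	// for the remaining attributes are not given, so they are left out.
	attrs := map[string]any{
		"minimizer_seq": decodeMinimizer(minimizer, m),
		"k":             k,
		"m":             m,
	}

	fmt.Println(superKmerID("read1", 2, 7))
	fmt.Println(attrs)
}
```

The real `ToBioSequence` attaches this information to an `obiseq.BioSequence`; the plain map here only stands in for that annotation step.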
Algorithm Highlights
- Uses canonical minimizers (forward/reverse-complement minimum) to ensure strand-invariance
- Maintains a monotonic deque for efficient sliding-window minimizer tracking (O(n) time complexity)
- Supports DNA bases `A/C/G/T/U` case-insensitively via bitmasking (`seq[i] & 31`)
- Enforces parameter constraints: `1 ≤ m < k ≤ 31`, sequence length ≥ `k`
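The points above can be sketched in one self-contained program. This is an illustrative reimplementation under stated assumptions, not the package's actual code: the function names `canonicalMmers` and `superKmers`, the 2-bit encoding table, the exclusive `End` coordinate, and the choice to compare minimizers by value rather than by position are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// SuperKmer mirrors the four documented fields. Types other than uint64
// for Minimizer are assumptions; End is exclusive in this sketch.
type SuperKmer struct {
	Start, End int
	Minimizer  uint64
	Sequence   string
}

// code maps (base & 31) to 2 bits. The mask folds upper and lower case
// together; unknown symbols silently map to 0 (A) in this sketch.
var code [32]uint64

func init() {
	code['C'&31], code['G'&31], code['T'&31], code['U'&31] = 1, 2, 3, 3
}

// canonicalMmers returns min(forward, reverse complement) for every m-mer.
func canonicalMmers(seq []byte, m int) []uint64 {
	mask := uint64(1)<<(2*uint(m)) - 1
	shift := 2 * uint(m-1)
	out := make([]uint64, 0, len(seq)-m+1)
	var fwd, rev uint64
	for i, b := range seq {
		c := code[b&31]
		fwd = (fwd<<2 | c) & mask      // append base on the right
		rev = rev>>2 | (3-c)<<shift    // prepend complement on the left
		if i >= m-1 {
			if rev < fwd {
				out = append(out, rev)
			} else {
				out = append(out, fwd)
			}
		}
	}
	return out
}

// superKmers groups consecutive k-mers that share the same minimizer value.
func superKmers(seq []byte, k, m int) ([]SuperKmer, error) {
	if m < 1 || m >= k || k > 31 {
		return nil, errors.New("need 1 <= m < k <= 31")
	}
	if len(seq) < k {
		return nil, errors.New("sequence shorter than k")
	}
	mmers := canonicalMmers(seq, m)
	w := k - m + 1 // number of m-mers inside one k-mer
	var deque []int // m-mer indices; their values increase front to back
	var out []SuperKmer
	start, cur := 0, uint64(0)
	for i := range mmers {
		// Drop candidates that can never be the minimum again.
		for len(deque) > 0 && mmers[deque[len(deque)-1]] >= mmers[i] {
			deque = deque[:len(deque)-1]
		}
		deque = append(deque, i)
		if deque[0] <= i-w { // front fell out of the window
			deque = deque[1:]
		}
		if i < w-1 {
			continue // first k-mer not complete yet
		}
		j := i - w + 1 // start position of the current k-mer
		minv := mmers[deque[0]]
		if j == 0 {
			cur = minv
		} else if minv != cur {
			end := j - 1 + k // end of the previous k-mer
			out = append(out, SuperKmer{start, end, cur, string(seq[start:end])})
			start, cur = j, minv
		}
	}
	out = append(out, SuperKmer{start, len(seq), cur, string(seq[start:])})
	return out, nil
}

func main() {
	sks, err := superKmers([]byte("AACGCGCTT"), 4, 2)
	if err != nil {
		panic(err)
	}
	for _, sk := range sks {
		fmt.Printf("%d-%d min=%d %s\n", sk.Start, sk.End, sk.Minimizer, sk.Sequence)
	}
}
```

Each m-mer index enters and leaves the deque at most once, which is where the linear running time comes from. On the toy input above the sketch yields five segments, of which only `CGCGC` (positions 2 to 7, minimizer `CG`) spans more than one k-mer.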
Use Cases
- Read partitioning in metagenomics (e.g., for error correction or clustering)
- Efficient k-mer space segmentation without storing all individual k-mers
- Integration into modular bioinformatics pipelines via the `SeqWorker` interface
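As an illustration of the partitioning use case, the sketch below distributes super k-mers into buckets keyed by their minimizer, so that all k-mers sharing a minimizer end up together. The `partitionByMinimizer` function and the modulo bucketing scheme are hypothetical and chosen for brevity; the local `SuperKmer` type only mirrors the documented fields.

```go
package main

import "fmt"

// SuperKmer mirrors the documented fields; types other than the uint64
// minimizer are assumptions made for this sketch.
type SuperKmer struct {
	Start, End int
	Minimizer  uint64
	Sequence   string
}

// partitionByMinimizer assigns each super k-mer to a bucket chosen from
// its minimizer value. Hypothetical helper: a real pipeline would likely
// hash the minimizer first to balance bucket sizes.
func partitionByMinimizer(sks []SuperKmer, nBuckets int) [][]SuperKmer {
	buckets := make([][]SuperKmer, nBuckets)
	for _, sk := range sks {
		b := int(sk.Minimizer % uint64(nBuckets))
		buckets[b] = append(buckets[b], sk)
	}
	return buckets
}

func main() {
	sks := []SuperKmer{
		{0, 4, 0, "AACG"},
		{1, 5, 1, "ACGC"},
		{2, 7, 6, "CGCGC"},
		{4, 8, 2, "CGCT"},
		{5, 9, 0, "GCTT"},
	}
	for i, bucket := range partitionByMinimizer(sks, 4) {
		fmt.Printf("bucket %d: %d super k-mers\n", i, len(bucket))
	}
}
```

Because a super k-mer stores its k-mers as one contiguous string, each bucket holds far fewer records than an explicit list of k-mers would.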