mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
K-Way Merge Functionality in obikmer
This Go package provides utilities for merging sorted k-mer streams stored in .kdi files. Its core component is KWayMerge, which performs a k-way merge of multiple sorted input streams and aggregates duplicate k-mers by counting their occurrences.
Key Features
- Sorted K-Mer Input: Reads k-mers from .kdi files via KdiReader, assuming each file contains sorted 64-bit unsigned integers (uint64).
- K-Way Merge: Merges multiple sorted streams into a single globally sorted stream using an efficient priority queue (min-heap) internally.
- Count Aggregation: When identical k-mers appear across multiple streams, the merge counts how many times each unique k-mer occurs.
- Memory-Efficient Streaming: Processes data incrementally, avoiding full loading of all streams into memory.
- Robust Test Coverage: Includes unit tests for:
  - Basic merging with overlapping and non-overlapping values.
  - Single-stream input (degenerate case).
  - Empty streams handling.
  - All identical k-mers across inputs.
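The merge and count-aggregation features listed above can be sketched with Go's standard container/heap package. This is a minimal illustration under stated assumptions, not the package's actual code: in-memory slices stand in for .kdi readers, the names head, minHeap, merged, and mergeSorted are hypothetical, and the result is collected into a slice for readability where a streaming implementation would yield one entry at a time.

```go
package main

import (
	"container/heap"
	"fmt"
)

// head is one heap entry: the pending value of a stream and the stream's index.
// (Hypothetical type, introduced only for this sketch.)
type head struct {
	value  uint64
	stream int
}

// minHeap keeps the smallest pending k-mer on top.
type minHeap []head

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].value < h[j].value }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(head)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// merged pairs a k-mer with the number of times it was seen across all streams.
type merged struct {
	kmer  uint64
	count int
}

// mergeSorted merges already-sorted slices into one sorted list of distinct
// values, counting how often each value occurs across the inputs.
func mergeSorted(streams [][]uint64) []merged {
	pos := make([]int, len(streams)) // next unread index in each stream
	h := &minHeap{}
	for i, s := range streams {
		if len(s) > 0 { // empty streams never enter the heap
			heap.Push(h, head{value: s[0], stream: i})
			pos[i] = 1
		}
	}
	var out []merged
	for h.Len() > 0 {
		top := heap.Pop(h).(head)
		// Refill the heap from the stream that supplied the popped value.
		if p := pos[top.stream]; p < len(streams[top.stream]) {
			heap.Push(h, head{value: streams[top.stream][p], stream: top.stream})
			pos[top.stream]++
		}
		// Equal values pop consecutively, so comparing with the last output suffices.
		if n := len(out); n > 0 && out[n-1].kmer == top.value {
			out[n-1].count++
		} else {
			out = append(out, merged{kmer: top.value, count: 1})
		}
	}
	return out
}

func main() {
	streams := [][]uint64{{1, 3, 5}, {2, 3, 6}, {3, 5}}
	for _, m := range mergeSorted(streams) {
		fmt.Printf("kmer=%d count=%d\n", m.kmer, m.count)
	}
}
```

The heap never holds more than one entry per input stream, which is why this approach stays memory-efficient however long the streams are.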
API Highlights
- NewKdiReader(path): opens a .kdi file for reading.
- writeKdi(...) (test helper): writes sorted k-mers to a .kdi file.
- NewKWayMerge([]*KdiReader): constructs the merger from multiple readers.
- .Next() → (kmer uint64, count int, ok bool): yields the next merged k-mer and its frequency; ok=false signals end-of-stream.
- .Close(): cleans up resources.
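The Next()/Close() contract above suggests a simple drain loop on the caller's side. The sketch below shows that loop against a hypothetical kmerStream interface that mirrors the documented Next() signature; the error return on Close(), the sliceStream stand-in, and the summarize helper are assumptions made for illustration and are not part of obikmer.

```go
package main

import "fmt"

// kmerStream is a hypothetical interface mirroring the documented Next()
// signature. The error return on Close() is an assumption.
type kmerStream interface {
	Next() (kmer uint64, count int, ok bool)
	Close() error
}

// entry is one (k-mer, count) pair served by the stand-in stream.
type entry struct {
	kmer  uint64
	count int
}

// sliceStream is an in-memory stand-in for a real merger, used only here.
type sliceStream struct {
	entries []entry
	i       int
}

func (s *sliceStream) Next() (uint64, int, bool) {
	if s.i >= len(s.entries) {
		return 0, 0, false // ok=false signals end-of-stream
	}
	e := s.entries[s.i]
	s.i++
	return e.kmer, e.count, true
}

func (s *sliceStream) Close() error { return nil }

// summarize drains the stream and returns the number of distinct k-mers
// and the sum of their counts.
func summarize(s kmerStream) (distinct, total int) {
	for {
		_, count, ok := s.Next()
		if !ok {
			return distinct, total
		}
		distinct++
		total += count
	}
}

func main() {
	s := &sliceStream{entries: []entry{{1, 1}, {3, 3}, {5, 2}}}
	defer s.Close()
	distinct, total := summarize(s)
	fmt.Printf("distinct=%d total=%d\n", distinct, total)
}
```

Because the loop holds only the current k-mer and two running totals, it works the same way whether the stream yields three entries or billions.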
Use Case
Ideal for aggregating k-mer counts across multiple sequencing samples in bioinformatics workflows. Because each sample’s k-mers are sorted and persisted ahead of time, counts can be combined in a single streaming pass, which enables scalable distributed counting without full in-memory deduplication.