mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
8c7017a99d
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5" - Update version.txt from 4.29 → .30 (automated by Makefile)
1.5 KiB
1.5 KiB
K-Way Merge for Sorted k-mer Streams
This Go package implements a k-way merge over multiple sorted streams of k-mer values (uint64). It leverages a min-heap to efficiently produce the globally sorted sequence while aggregating duplicate counts across input streams.
Core Components
mergeItem: Stores a value and its source reader index for heap operations.mergeHeap&heap.Interface: Implements a min-heap for efficient retrieval of smallest values.KWayMerge: Main struct managing the heap and input readers.
Key Functionality
-
Initialization (
NewKWayMerge):- Takes a slice of
*KdiReader, each expected to yield sorted values. - Preloads the heap with one value from each reader.
- Takes a slice of
-
Streaming Output (
Next):- Returns the next smallest k-mer, its frequency across readers (i.e., how many input streams contained it), and a success flag.
- Handles duplicates: pops all items equal to the current minimum before advancing readers.
-
Cleanup (
Close):- Closes all underlying
KdiReaders and returns the first encountered error.
- Closes all underlying
Use Case
Ideal for merging sorted k-mer databases (e.g., from multiple files or processes), enabling:
- Efficient deduplication with multiplicity tracking.
- Scalable union/intersection operations on large k-mer sets.
Complexity
| Operation | Time |
|---|---|
Next() |
O(log k) (heap ops per unique value) |
| Init | O(k) |
Where k = number of input readers.