Files
obitools4/autodoc/docmd/pkg/obikmer/kdi_merge.md
T
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

38 lines
1.5 KiB
Markdown

# K-Way Merge for Sorted k-mer Streams
This Go package implements a **k-way merge** over multiple sorted streams of *k*-mer values (`uint64`). It leverages a **min-heap** to efficiently produce the globally sorted sequence while aggregating duplicate counts across input streams.
## Core Components
- **`mergeItem`**: Stores a value and its source reader index for heap operations.
- **`mergeHeap`** & `heap.Interface`: Implements a min-heap for efficient retrieval of smallest values.
- **`KWayMerge`**: Main struct managing the heap and input readers.
## Key Functionality
- **Initialization (`NewKWayMerge`)**:
- Takes a slice of `*KdiReader`, each expected to yield sorted values.
- Preloads the heap with one value from each reader.
- **Streaming Output (`Next`)**:
- Returns the next smallest *k*-mer, its frequency across readers (i.e., how many input streams contained it), and a success flag.
- Handles duplicates: pops *all* items equal to the current minimum before advancing readers.
- **Cleanup (`Close`)**:
- Closes all underlying `KdiReader`s and returns the first encountered error.
## Use Case
Ideal for merging sorted *k*-mer databases (e.g., from multiple files or processes), enabling:
- Efficient deduplication with multiplicity tracking.
- Scalable union/intersection operations on large *k*-mer sets.
## Complexity
| Operation | Time |
|-----------|------------|
| `Next()` | *O(log k)* (heap ops per unique value) |
| Init | *O(k)* |
Where `k` = number of input readers.