KDI Reader: Streaming Delta-Varint Decoding for k-mers
The `obikmer` package provides a high-performance, streaming reader for `.kdi` files: binary containers storing sorted k-mers (typically DNA substrings encoded as 64-bit integers). It supports both sequential and indexed access.
Core Features
- Streaming decoding: k-mers are decoded incrementally from delta-varint-compressed data, which keeps I/O and memory footprint low.
- Delta encoding: the first k-mer is stored as an absolute `uint64`; each subsequent k-mer is stored as a varint-encoded delta (its difference from the previous k-mer) and read back by a custom `decodeVarint` function.
- Magic and format validation: a 4-byte magic header identifies the file format, and a little-endian `uint64` stores the total k-mer count.
- Sparse indexing: when paired with a `.kdx` index, `SeekTo(target)` enables fast, forward-only jumps to the first k-mer ≥ `target`.
- Graceful fallback: if the `.kdx` file is missing or invalid, the reader automatically falls back to sequential mode.
Key API
- `NewKdiReader(path)` → opens a `.kdi` file for streaming (no index).
- `NewKdiIndexedReader(path)` → opens a `.kdi` file together with its optional `.kdx` index for indexed (forward-only) seeking.
- `Next()` → returns `(nextKmer, true)`, or `(0, false)` when exhausted.
- `SeekTo(target uint64) error` → jumps to the first k-mer ≥ `target` using the index (no backward seek).
- `Count()` / `Remaining()` → total and unread k-mers.
- `Close()` → releases the file handle.
Design Highlights
- Uses a 64 KB read buffer for efficient I/O.
- Index entries record `(kmer, byteOffset)` pairs at fixed strides (e.g., every 1024 k-mers).
- `SeekTo` is idempotent and safe: it is a no-op if the target is ≤ the current position or if no index is available.
- Designed for large-scale genomic k-mer catalogs (e.g., built from minimizers or de Bruijn graphs).
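Here is a sketch of how stride-sampled `(kmer, byteOffset)` entries can drive a forward-only seek. It relies on several assumptions. The stream is simplified to varint deltas starting from 0, with no header. Each hypothetical `indexEntry` stores the last k-mer decoded before its byte offset. The names `encodeWithIndex`, `planSeek` and `scanner` are illustrative, not obikmer functions. A binary search finds the last sample point below the target, and decoding then resumes there with the sampled k-mer as the running value.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sort"
)

// indexEntry is a HYPOTHETICAL sparse-index record: Kmer is the last k-mer
// decoded before ByteOffset, so decoding can resume at ByteOffset with Kmer
// as the running value.
type indexEntry struct {
	Kmer       uint64
	ByteOffset int
}

// encodeWithIndex varint-encodes sorted kmers as deltas starting from 0
// (simplified layout, no header) and samples one entry every stride k-mers.
func encodeWithIndex(kmers []uint64, stride int) ([]byte, []indexEntry) {
	var data []byte
	var index []indexEntry
	var prev uint64
	for i, k := range kmers {
		data = binary.AppendUvarint(data, k-prev)
		prev = k
		if (i+1)%stride == 0 {
			index = append(index, indexEntry{Kmer: k, ByteOffset: len(data)})
		}
	}
	return data, index
}

// planSeek returns the last sample point with Kmer < target, provided it lies
// ahead of the current position; ok is false when the index is empty, when
// target <= current, or when jumping would not move forward.
func planSeek(index []indexEntry, current, target uint64) (indexEntry, bool) {
	if len(index) == 0 || target <= current {
		return indexEntry{}, false
	}
	i := sort.Search(len(index), func(i int) bool { return index[i].Kmer >= target })
	if i == 0 || index[i-1].Kmer <= current {
		return indexEntry{}, false
	}
	return index[i-1], true
}

// scanner decodes the delta stream sequentially.
type scanner struct {
	data []byte
	off  int    // byte offset of the next varint
	cur  uint64 // last k-mer returned (0 before the first)
}

func (s *scanner) Next() (uint64, bool) {
	if s.off >= len(s.data) {
		return 0, false
	}
	d, n := binary.Uvarint(s.data[s.off:])
	if n <= 0 {
		return 0, false // truncated or malformed varint
	}
	s.off += n
	s.cur += d
	return s.cur, true
}

// seekTo jumps forward via the index when that helps; otherwise it does nothing.
func (s *scanner) seekTo(index []indexEntry, target uint64) {
	if e, ok := planSeek(index, s.cur, target); ok {
		s.off, s.cur = e.ByteOffset, e.Kmer
	}
}

func main() {
	kmers := make([]uint64, 0, 5000)
	for i := uint64(0); i < 5000; i++ {
		kmers = append(kmers, 7*i+3) // sorted, distinct
	}
	data, index := encodeWithIndex(kmers, 1024)
	s := &scanner{data: data}
	const target = 30000
	s.seekTo(index, target)
	steps := 0
	for k, ok := s.Next(); ok; k, ok = s.Next() {
		steps++
		if k >= target {
			fmt.Println(k, steps) // prints "30005 191"
			break
		}
	}
}
```

In this example, the scan after the jump decodes 191 k-mers instead of the 4,287 needed from the start of the stream. Because the next sample point (if any) is already ≥ the target, the post-jump scan is bounded by one stride.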