mirror of
https://github.com/metabarcoding/obitools4.git
synced 2026-04-30 12:00:39 +00:00
8c7017a99d
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5" - Update version.txt from 4.29 → .30 (automated by Makefile)
1.6 KiB
1.6 KiB
KDX Index Format and Functionality
The obikmer package provides a sparse indexing mechanism for .kdi files (likely storing sorted k-mers with delta encoding). The .kdx file serves as a fast lookup table to accelerate k-mer searches.
Core Concepts
- Magic bytes:
KDX\x01validates file integrity. - Stride-based sparsity: One index entry every N k-mers (default: 4096), balancing memory vs. search speed.
- Entry structure: Each entry stores:
kmer: the k-mer value at that index position.offset: absolute byte offset in the corresponding.kdifile.
Key Operations
- Loading:
LoadKdxIndex()reads and validates a.kdxfile; returns(nil, nil)if missing (graceful degradation). - Searching:
FindOffset(target uint64)performs binary search over index entries to find the best jump point:- Returns
offset,skipCount(k-mer count already passed), and a boolean success flag. - Enables efficient seeking: after
offset, only up to stride k-mers need linear scanning.
- Returns
- Writing:
WriteKdxIndex()serializes an in-memory index to disk (for building indexes). - Helper:
KdxPathForKdi()derives the.kdxpath from a given.kdifile.
Performance
- Search complexity: O(log M) for the binary search (where M = #index entries), plus ≤ stride linear steps.
- Memory footprint: Linear in index size (16 bytes per entry), highly scalable for large k-mer sets.
Design Philosophy
Minimalist, binary-safe format with explicit endianness (little-endian), no external dependencies beyond encoding/binary, and robust error handling.