# SKM File Reader for Super-Kmers

This Go package provides a binary file reader (`SkmReader`) for `.skm` files, which store *super-kmers*: compact representations of DNA sequences using 2-bit encoding.

## Core Functionality

- **Binary Format Parsing**: Reads structured data from `.skm` files, where each record contains:
  - A 2-byte little-endian integer specifying the sequence length.
  - Packed nucleotide data, where every byte encodes up to four bases (2 bits per base).
- **Decoding Logic**: Converts packed 2-bit codes (`00`, `01`, `10`, `11`) to nucleotide characters using the mapping `{ 'a', 'c', 'g', 't' }`.
- **Memory-Efficient Reading**: Uses buffered I/O (64 KiB buffer) for fast sequential access.
- **Streaming Interface**: `Next()` returns the next super-kmer as a struct with:
  - `Sequence`: the decoded nucleotide byte slice.
  - `Start`, `End`: positional metadata (currently always set to span the full sequence).
- **Resource Management**: Provides a clean `.Close()` method for file handle cleanup.

## Use Case

Designed for high-performance processing of large genomic datasets (e.g., in k-mer analysis or sequence indexing), where storage size and read speed are critical.