obitools4/autodoc/docmd/pkg/obikmer/skm_test.md
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

# SKM File Format Specification
This Go package implements a binary format for storing *super-kmers*, compact representations of DNA sequences used in bioinformatics. The tests validate reading and writing, padding behavior, and file-size correctness.
## Core Functionalities
- **SuperKmer Structure**: Each super-kmer stores a DNA sequence as bytes. On disk the sequence is likely padded to a 4-base (one-byte) boundary for efficient storage.
- **SkmWriter**: Serializes super-kmers into a binary file. Each entry consists of:
  - a 2-byte little-endian length (number of bases), followed by
  - `ceil(length/4)` bytes encoding the nucleotides at 2 bits each (A=0, C=1, G=2, T=3).
- **SkmReader**: Parses the binary format back into memory. Returns `(SuperKmer, bool)` via `Next()`, with EOF signaled by `ok = false`.
- **Case Handling**: Input case is accepted as-is on write; sequences read back are compared in lowercase (the tests apply `| 0x20` to each byte), so the round-trip comparison is case-insensitive.
## Test Coverage
- **Round-trip integrity**: Verifies exact sequence recovery after write/read.
- **Empty file handling**: Confirms reader returns `ok = false` immediately on empty files.
- **Variable-length padding**: Validates correct encoding/decoding for sequences of length 15, which is not a multiple of 4, so the last byte carries three bases plus two padding bits.
- **Size validation**: Confirms file size = `2 + ceil(L/4)` bytes for a sequence of length *L*.
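
As a worked example of the size formula, the sketch below computes `2 + ceil(L/4)` with integer arithmetic for a few lengths. The helper name `entrySize` is hypothetical and is not part of the package.

```go
package main

import "fmt"

// entrySize (hypothetical helper) returns the on-disk size in bytes of one
// record holding n bases: a 2-byte length prefix plus ceil(n/4) payload bytes.
func entrySize(n int) int {
	return 2 + (n+3)/4 // (n+3)/4 is ceil(n/4) for n >= 0
}

func main() {
	for _, n := range []int{0, 1, 4, 5, 15, 16} {
		fmt.Printf("L=%2d -> %d bytes\n", n, entrySize(n))
	}
}
```

For the length-15 case used in the padding test, this gives 2 + 4 = 6 bytes, the same size as a 16-base sequence.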
## Use Case
Efficient, lossless storage and retrieval of super-kmers for downstream genomic analysis (e.g., assembly or alignment acceleration).