⬆️ version bump to v4.5

- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
Eric Coissac
2026-04-07 08:36:50 +02:00
parent 670edc1958
commit 8c7017a99d
392 changed files with 18875 additions and 141 deletions
@@ -0,0 +1,23 @@
# SKM File Format Specification
This Go package implements a binary format for storing *super-kmers*—compact representations of DNA sequences used in bioinformatics. The tests validate reading/writing, padding behavior, and file size correctness.
## Core Functionalities
- **SuperKmer Structure**: Each super-kmer stores a DNA sequence (as bytes). On disk the sequence is packed four bases per byte, so the last byte is padded whenever the length is not a multiple of 4.
- **SkmWriter**: Serializes super-kmers into a binary file. For each entry it writes:
  - a 2-byte little-endian length (number of bases), then
  - `ceil(length/4)` bytes encoding the nucleotides at 2 bits each (A=0, C=1, G=2, T=3).
- **SkmReader**: Parses the binary format back into memory. Returns `(SuperKmer, bool)` via `Next()`, with EOF signaled by `ok = false`.
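- **Case Handling**: The writer accepts bases in either case, but the 2-bit encoding has no room for case, so it is not stored on disk. Reads return lowercase bases, and the tests lowercase the expected sequence (via `| 0x20`) before comparing, which makes the comparison case-insensitive.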
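
The entry layout described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `encodeEntry` and `decodeEntry` are hypothetical names, and the bit order inside each byte (first base in the two most significant bits) is an assumption, since the description above does not specify it.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeEntry (hypothetical helper) packs one sequence as a 2-byte
// little-endian base count followed by ceil(n/4) bytes at 2 bits per base
// (A=0, C=1, G=2, T=3).
// ASSUMPTION: the first base of each group of four occupies the two most
// significant bits; the real format may order bits differently.
func encodeEntry(seq []byte) ([]byte, error) {
	if len(seq) > 0xFFFF {
		return nil, fmt.Errorf("sequence too long for a 2-byte length: %d", len(seq))
	}
	out := make([]byte, 2+(len(seq)+3)/4)
	binary.LittleEndian.PutUint16(out, uint16(len(seq)))
	for i, b := range seq {
		var code byte
		switch b | 0x20 { // fold ASCII letters to lowercase
		case 'a':
			code = 0
		case 'c':
			code = 1
		case 'g':
			code = 2
		case 't':
			code = 3
		default:
			return nil, fmt.Errorf("unsupported base %q at position %d", b, i)
		}
		out[2+i/4] |= code << (6 - 2*uint(i%4))
	}
	return out, nil
}

// decodeEntry (hypothetical helper) reverses encodeEntry and returns
// lowercase bases; padding bits in the last byte are ignored.
func decodeEntry(buf []byte) ([]byte, error) {
	if len(buf) < 2 {
		return nil, fmt.Errorf("short entry: %d bytes", len(buf))
	}
	n := int(binary.LittleEndian.Uint16(buf))
	want := 2 + (n+3)/4
	if len(buf) < want {
		return nil, fmt.Errorf("truncated entry: want %d bytes, have %d", want, len(buf))
	}
	seq := make([]byte, n)
	for i := range seq {
		code := (buf[2+i/4] >> (6 - 2*uint(i%4))) & 3
		seq[i] = "acgt"[code]
	}
	return seq, nil
}

func main() {
	enc, err := encodeEntry([]byte("ACGTac"))
	if err != nil {
		panic(err)
	}
	dec, err := decodeEntry(enc)
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x -> %s\n", enc, dec) // prints: 06 00 1b 10 -> acgtac
}
```

Under these assumptions the six-base input takes four bytes: two for the length and two for the packed bases, the second of which carries two bases and four padding bits. The mixed-case input comes back in lowercase, which is why a round-trip test needs a case-insensitive comparison.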
## Test Coverage
- **Round-trip integrity**: Verifies exact sequence recovery after write/read.
- **Empty file handling**: Confirms reader returns `ok = false` immediately on empty files.
- **Variable-length padding**: Validates correct encoding/decoding for sequences of length 15.
- **Size validation**: Confirms that a file holding a single sequence of length *L* is exactly `2 + ceil(L/4)` bytes: 2 for the length field plus the packed bases.
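
As a worked check of the size formula, the sketch below computes the expected size of an entry, and of a file as the sum of its entries. The names `entrySize` and `fileSize` are hypothetical, and the multi-entry total assumes entries are simply concatenated with no file header, which the single-entry formula suggests but does not prove.

```go
package main

import "fmt"

// entrySize (hypothetical helper) returns the on-disk size in bytes of one
// entry holding n bases: a 2-byte length plus ceil(n/4) packed bytes.
func entrySize(n int) int {
	return 2 + (n+3)/4 // (n+3)/4 is integer ceil(n/4) for n >= 0
}

// fileSize (hypothetical helper) sums entry sizes.
// ASSUMPTION: entries are concatenated back to back with no file header.
func fileSize(lengths ...int) int {
	total := 0
	for _, n := range lengths {
		total += entrySize(n)
	}
	return total
}

func main() {
	for _, n := range []int{0, 1, 4, 5, 15, 16} {
		fmt.Printf("L=%2d -> %d bytes\n", n, entrySize(n))
	}
	fmt.Println("two entries (15, 4):", fileSize(15, 4), "bytes")
}
```

For a sequence of length 15 this gives 2 + 4 = 6 bytes, with the last packed byte holding three bases and two padding bits.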
## Use Case
Efficient, lossless storage and retrieval of super-kmers for downstream genomic analysis (e.g., assembly or alignment acceleration).