obitools4/autodoc/docmd/pkg/obikmer/skm_test.md
Eric Coissac 8c7017a99d ⬆️ version bump to v4.5
- Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5"
- Update version.txt from 4.29 → .30
(automated by Makefile)
2026-04-13 13:34:53 +02:00

SKM File Format Specification

This Go package implements a binary format for storing super-kmers, which are compact representations of DNA sequences used in bioinformatics. The tests validate reading and writing, padding behavior, and file size.

Core Functionalities

  • SuperKmer Structure: Each super-kmer stores a DNA sequence (as bytes). On disk the sequence is packed four bases per byte, so the last byte is padded whenever the length is not a multiple of 4.
  • SkmWriter: Serializes super-kmers into a binary file. Each entry writes:
    • A 2-byte little-endian length (number of bases),
    • Then ceil(length/4) bytes encoding nucleotides in 2 bits each (A=0, C=1, G=2, T=3).
  • SkmReader: Parses the binary format back into memory. Returns (SuperKmer, bool) via Next(), with EOF signaled by ok = false.
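  • Case Handling: The 2-bit encoding carries no case information, so the original letter case cannot be recovered from the file. Reads yield lowercase bases, and the tests compare sequences case-insensitively by OR-ing each byte with 0x20 (which folds ASCII letters to lowercase).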

Test Coverage

  • Round-trip integrity: Verifies exact sequence recovery after write/read.
  • Empty file handling: Confirms reader returns ok = false immediately on empty files.
  • Variable-length padding: Validates correct encoding/decoding for sequences of length 15, which is not a multiple of 4 and therefore leaves 2 padding bits in the last byte.
  • Size validation: Confirms file size = 2 + ceil(L/4) bytes for a file holding a single sequence of length L.
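
The size formula and the case-insensitive comparison used by these checks can be illustrated with a short sketch. The helper names `recordSize` and `equalFoldBases` are hypothetical and are not taken from the package.

```go
package main

import "fmt"

// recordSize (hypothetical helper) returns the on-disk size in bytes of one
// record of n bases: a 2-byte length header plus ceil(n/4) packed bytes.
func recordSize(n int) int {
	return 2 + (n+3)/4 // (n+3)/4 is integer ceil(n/4) for n >= 0
}

// equalFoldBases (hypothetical helper) compares two ASCII nucleotide
// sequences while ignoring case, by OR-ing each byte with 0x20.
func equalFoldBases(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i]|0x20 != b[i]|0x20 {
			return false
		}
	}
	return true
}

func main() {
	for _, n := range []int{0, 1, 4, 15, 16} {
		fmt.Printf("L=%d -> %d bytes\n", n, recordSize(n))
	}
	fmt.Println(equalFoldBases([]byte("ACGT"), []byte("acgt"))) // true
}
```

For L = 15 this gives 2 + 4 = 6 bytes, the same size as for L = 16, because both lengths need four packed bytes.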

Use Case

Efficient, lossless storage and retrieval of super-kmers for downstream genomic analysis (e.g., assembly or alignment acceleration).