Refactor k-mer encoding and frequency filtering with KmerSet

This commit refactors the k-mer encoding logic to handle ambiguous bases more consistently and introduces a KmerSet type for better management of k-mer collections. The frequency filter now works with KmerSet instead of roaring bitmaps directly, and the API has been updated to support level-based frequency queries. Additionally, the commit updates the version and commit hash.
Eric Coissac
2026-02-05 14:41:41 +01:00
parent 60f27c1dc8
commit 00dcd78e84
5 changed files with 191 additions and 271 deletions
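
The ambiguous-base handling described above can be illustrated with a minimal sketch. It is not the repository's code: `baseCode`, `baseCodeErr`, `encodeKmer`, and `errAmbiguous` are hypothetical names chosen for illustration, and a switch stands in for the lookup table used in the actual source. It assumes the lenient path maps any non-ACGT/U character to 0 (the same code as A), while a separate strict path reports an error.

```go
package main

import (
	"errors"
	"fmt"
)

// errAmbiguous is a hypothetical sentinel error for the strict encoder.
var errAmbiguous = errors.New("ambiguous or invalid base")

// baseCode is a hypothetical lenient encoder: A=0, C=1, G=2, T/U=3.
// Every other character, including ambiguity codes such as N or R, maps to 0.
func baseCode(b byte) byte {
	switch b | 0x20 { // fold ASCII letters to lower case
	case 'c':
		return 1
	case 'g':
		return 2
	case 't', 'u':
		return 3
	default:
		return 0
	}
}

// baseCodeErr is a hypothetical strict encoder: same codes as baseCode,
// but anything outside A, C, G, T/U yields errAmbiguous.
func baseCodeErr(b byte) (byte, error) {
	switch b | 0x20 {
	case 'a':
		return 0, nil
	case 'c':
		return 1, nil
	case 'g':
		return 2, nil
	case 't', 'u':
		return 3, nil
	default:
		return 0, errAmbiguous
	}
}

// encodeKmer packs a k-mer into a uint64, 2 bits per base, first base in
// the most significant position. It assumes len(seq) <= 32.
func encodeKmer(seq string) uint64 {
	var code uint64
	for i := 0; i < len(seq); i++ {
		code = code<<2 | uint64(baseCode(seq[i]))
	}
	return code
}

func main() {
	fmt.Println(encodeKmer("ACGT")) // 27 (binary 00 01 10 11)
	fmt.Println(encodeKmer("ANGT")) // 11: N is encoded like A
	_, err := baseCodeErr('N')
	fmt.Println(err != nil) // true: the strict path flags N
}
```

With the lenient encoder, "ANGT" and "AAGT" produce the same code, so callers that need to reject ambiguous k-mers have to go through the strict path.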

@@ -7,7 +7,8 @@ import (
// __single_base_code__ encodes DNA bases to 2-bit values.
// Standard bases: A=0, C=1, G=2, T/U=3
-// Ambiguous bases (N, R, Y, W, S, K, M, B, D, H, V): 0xFF (255) to signal error
+// Ambiguous bases (N, R, Y, W, S, K, M, B, D, H, V) and other characters: encoded as 0 (A)
+// Note: For error detection with ambiguous bases, use __single_base_code_err__ in encodekmer.go
var __single_base_code__ = []byte{0,
// A, B, C, D,