Add error handling for ambiguous bases in k-mer encoding

This commit introduces error handling for ambiguous DNA bases (N, R, Y, W, S, K, M, B, D, H, V) in k-mer encoding. It adds new functions IterNormalizedKmersWithErrors and EncodeNormalizedKmersWithErrors that track and encode the number of ambiguous bases in each k-mer using error markers in the top 2 bits. The commit also updates the version string to reflect the latest changes.
Eric Coissac
2026-02-04 21:44:52 +01:00
parent 28162ac36f
commit 60f27c1dc8
3 changed files with 290 additions and 1 deletion
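The commit message describes the error-marker scheme only in outline. The Go sketch below shows one plausible reading of it. Standard bases take 2 bits each. Ambiguous bases are detected through a 0xFF sentinel in a lookup table, in the style of `__single_base_code__`. The number of ambiguous bases is stored in the top 2 bits of a `uint64`. This is not the obitools4 implementation, and several details are assumptions:

- the names `encodeKmerWithErrors`, `encodeAllKmersWithErrors`, `errorCount` and `payload` are hypothetical;
- k is limited to 31, so the top 2 bits are free;
- ambiguous bases are written as A (0) in the payload;
- the count saturates at 3;
- the "normalization" implied by the real function names is omitted.

```go
package main

import "fmt"

// baseCode maps an ASCII byte to its 2-bit code (A=0, C=1, G=2, T/U=3).
// Every other byte, including the IUPAC ambiguity codes (N, R, Y, ...),
// maps to the 0xFF sentinel. Illustrative table, not the obitools4 one.
var baseCode [256]byte

func init() {
	for i := range baseCode {
		baseCode[i] = 0xFF
	}
	for _, p := range []struct{ b, code byte }{
		{'A', 0}, {'C', 1}, {'G', 2}, {'T', 3}, {'U', 3},
	} {
		baseCode[p.b] = p.code
		baseCode[p.b+('a'-'A')] = p.code // accept lowercase too
	}
}

// Assumption: bits 62-63 hold the error marker, bits 0-61 hold up to 31 bases.
const errorShift = 62

// encodeKmerWithErrors (hypothetical) packs seq (len <= 31) 2 bits per base
// and stores min(number of ambiguous bases, 3) in the top 2 bits.
// Ambiguous bases contribute 0 to the payload (assumption).
func encodeKmerWithErrors(seq []byte) uint64 {
	var code uint64
	ambiguous := 0
	for _, b := range seq {
		c := baseCode[b]
		if c == 0xFF {
			ambiguous++
			c = 0
		}
		code = code<<2 | uint64(c)
	}
	if ambiguous > 3 {
		ambiguous = 3 // saturate: marker 3 means "3 or more"
	}
	return code | uint64(ambiguous)<<errorShift
}

// encodeAllKmersWithErrors (hypothetical) encodes every k-mer of seq.
// It returns nil when k is out of range or seq is shorter than k.
func encodeAllKmersWithErrors(seq []byte, k int) []uint64 {
	if k < 1 || k > 31 || len(seq) < k {
		return nil
	}
	out := make([]uint64, 0, len(seq)-k+1)
	for i := 0; i+k <= len(seq); i++ {
		out = append(out, encodeKmerWithErrors(seq[i:i+k]))
	}
	return out
}

// errorCount extracts the marker stored in the top 2 bits.
func errorCount(code uint64) int { return int(code >> errorShift) }

// payload clears the marker bits, leaving only the packed bases.
func payload(code uint64) uint64 { return code & (1<<errorShift - 1) }

func main() {
	seq := []byte("ACGNT")
	for i, km := range encodeAllKmersWithErrors(seq, 3) {
		fmt.Printf("%s payload=%06b errors=%d\n", seq[i:i+3], payload(km), errorCount(km))
	}
	// Output:
	// ACG payload=000110 errors=0
	// CGN payload=011000 errors=1
	// GNT payload=100011 errors=1
}
```

Because the marker saturates at 3, a caller that needs the exact count of ambiguous bases would have to track it separately. A marker of 0 guarantees the k-mer contains only A, C, G and T/U.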

@@ -5,6 +5,10 @@ import (
"git.metabarcoding.org/obitools/obitools4/obitools4/pkg/obiutils"
)
// __single_base_code__ encodes DNA bases to 2-bit values.
// Standard bases: A=0, C=1, G=2, T/U=3
// Ambiguous bases (N, R, Y, W, S, K, M, B, D, H, V): 0xFF (255) to signal error
var __single_base_code__ = []byte{0,
// A, B, C, D,
0, 0, 1, 0,