autodoc/docmd/pkg/obikmer/encodekmer.md

# Semantic Description of `obikmer` Package

The `obikmer` package provides high-performance, zero-allocation utilities for **k-mer manipulation** in DNA sequences (A/C/G/T/U), targeting bioinformatics applications like genome indexing, assembly, and error correction.

## Core Encoding & Decoding

- **`EncodeKmer`, `DecodeKmer`**: Convert between DNA sequences and compact 62-bit uint64 representations (2 bits/base), preserving top 2 bits for optional error markers.
- **`EncodeCanonicalKmer`, `CanonicalKmer`**: Encode or normalize k-mers to their *biological canonical form* — the lexicographically smaller of a k-mer and its reverse complement.

## Iterators (Memory-Efficient Streaming)

- **`IterKmers`, `IterCanonicalKmers`**: Stream all overlapping k-mers from a sequence without allocating intermediate slices — ideal for large-scale processing (e.g., inserting into Roaring Bitmaps).
- **`IterCanonicalKmersWithErrors`**: Same as above, but detects ambiguous bases (N/R/Y/W/S/K/M/B/D/H/V) and encodes their count in the top 2 bits (error code: 0–3). Only valid for **odd k ≤ 31**.

## Error Handling & Markers

- `SetKmerError`, `GetKmerError`, and `ClearKmerError` manipulate the top 2 bits of a uint64 to store error metadata (e.g., ambiguous base count), enabling downstream filtering or correction.

## Reverse Complement & Circular Normalization

- **`ReverseComplement`, `CanonicalKmer`**: Compute biological reverse complement and canonical form.
- **`NormalizeCircular`, `EncodeCircularCanonicalKmer`**: Compute *circular canonical form* — the lexicographically smallest rotation (used for low-complexity masking).
- Distinction: `CanonicalKmer` uses **reverse complement**, while `NormalizeCircular` uses **rotation**.

## Counting & Math Utilities

- **`CanonicalCircularKmerCount`, `necklaceCount`, etc.**: Compute exact counts of unique circular k-mer equivalence classes using **Moreau’s necklace formula**, with Euler's totient function and divisor enumeration.

## Performance & Safety

- All functions avoid heap allocations where possible (reusing buffers).
- Panics on invalid `k` or length mismatches for correctness.
- Supports case-insensitive input (A/a, T/t…), and ambiguous bases via `__single_base_code_err__`.

## Use Cases

- K-mer counting in assemblers (e.g., with Bloom filters or bitmaps)
- Error-aware k-mer filtering in sequencing pipelines
- Low-complexity region detection via circular entropy normalization
-											⬆️ version bump to v4.5
										
										
											2026-04-07 08:36:50 +02:00
+								# Semantic Description of `obikmer` Package
 								The `obikmer` package provides high-performance, zero-allocation utilities for **k-mer manipulation** in DNA sequences (A/C/G/T/U), targeting bioinformatics applications like genome indexing, assembly, and error correction.
 								## Core Encoding & Decoding
 								- **`EncodeKmer`, `DecodeKmer`**: Convert between DNA sequences and compact 62-bit uint64 representations (2 bits/base), preserving top 2 bits for optional error markers.
 								- **`EncodeCanonicalKmer`, `CanonicalKmer`**: Encode or normalize k-mers to their *biological canonical form* — the lexicographically smaller of a k-mer and its reverse complement.
 								## Iterators (Memory-Efficient Streaming)
 								- **`IterKmers`, `IterCanonicalKmers`**: Stream all overlapping k-mers from a sequence without allocating intermediate slices — ideal for large-scale processing (e.g., inserting into Roaring Bitmaps).
 								- **`IterCanonicalKmersWithErrors`**: Same as above, but detects ambiguous bases (N/R/Y/W/S/K/M/B/D/H/V) and encodes their count in the top 2 bits (error code: 0–3). Only valid for **odd k ≤ 31**.
 								## Error Handling & Markers
 								- `SetKmerError`, `GetKmerError`, and `ClearKmerError` manipulate the top 2 bits of a uint64 to store error metadata (e.g., ambiguous base count), enabling downstream filtering or correction.
 								## Reverse Complement & Circular Normalization
 								- **`ReverseComplement`, `CanonicalKmer`**: Compute biological reverse complement and canonical form.
 								- **`NormalizeCircular`, `EncodeCircularCanonicalKmer`**: Compute *circular canonical form* — the lexicographically smallest rotation (used for low-complexity masking).
 								- Distinction: `CanonicalKmer` uses **reverse complement**, while `NormalizeCircular` uses **rotation**.
 								## Counting & Math Utilities
 								- **`CanonicalCircularKmerCount`, `necklaceCount`, etc.**: Compute exact counts of unique circular k-mer equivalence classes using **Moreau’s necklace formula**, with Euler's totient function and divisor enumeration.
 								## Performance & Safety
 								- All functions avoid heap allocations where possible (reusing buffers).
 								- Panics on invalid `k` or length mismatches for correctness.
 								- Supports case-insensitive input (A/a, T/t…), and ambiguous bases via `__single_base_code_err__`.
 								## Use Cases
 								- K-mer counting in assemblers (e.g., with Bloom filters or bitmaps)
 								- Error-aware k-mer filtering in sequencing pipelines
 								- Low-complexity region detection via circular entropy normalization