# Semantic Description of `obialign` Package The `obialign` package provides low-level utilities for efficient nucleotide sequence encoding and decoding, specifically designed for bioinformatics alignment tasks. - **Core functionality**: Encodes IUPAC nucleotide symbols (including ambiguous codes like `R`, `Y`, `N`) into compact 4-bit binary representations. - **Binary encoding scheme**: Each bit in a byte corresponds to one canonical nucleotide: A (bit 0), C (bit 1), G (bit 2), T (bit 3). - **Ambiguity support**: Codes like `R` (A/G) set both corresponding bits (`0b0101`). Fully ambiguous `N` sets all four bits (`0b1111`). - **Gap/missing handling**: Symbols `.` and `-`, as well as non-nucleotide characters, map to `0b0000`. - **Memory efficiency**: The encoding avoids allocations via optional buffer reuse. - **Lookup tables**: - `_FourBitsBaseCode`: Maps ASCII nucleotide characters (lowercased via `nuc & 31`) to their binary code. - `_FourBitsBaseDecode`: Inverse mapping for human-readable output (not exported, used internally). - **Integration**: Works with `obiseq.BioSequence`, a generic biological sequence container from the OBITools4 ecosystem. The `Encode4bits` function enables fast, space-efficient sequence processing—ideal for high-throughput sequencing data where alignment speed and memory usage are critical.