+ obiskio: add binary I/O with LRU pool and compression

- Add new obiskio crate for high-performance SuperKmer serialization/deserialization
- Implement binary codec with 2-bit packed sequence encoding and raw header format (32 bits)
- Add transparent compression support via niffler: Zstd, Gzip, Bgzf, Lz4
- Implement SKFilePool with LRU-based fd management, max-concurrent-fd limiting (75% of ulimit)
- Add SKFileWriter with batched writes, configurable flush threshold (8 KiB default), and two-phase locking
- Add SKFileReader with sequential access, LRU recovery via reopen_and_seek()
- Add new obikpartitionner crate: basic header/sequence handling for the binary super-kmer format
- Bump niffler from 2.7 to 3; add dependencies: allocator-api2, bitflags (>=1), errno, fastrand, rustix, tempfile, lru, hashbrown, bzip2, thiserror
- Update workspace members to include obikpartitionner and obiskio
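
The header and 2-bit packing rules listed above can be illustrated with a minimal, self-contained sketch. It is not the obiskio codec: the function names, the A=0/C=1/G=2/T=3 mapping, and the "first base in the high bits" order are all assumptions made for illustration. Only the 8-bit length field (0 meaning 256) and the `(seql + 3) / 4` byte count come from the doc comments in this commit.

```rust
/// Decode the sequence length from bits [7:0] of the header word:
/// a stored 0 stands for 256, and 1..=255 are stored directly.
fn seql_from_header(header: u32) -> usize {
    match (header & 0xFF) as usize {
        0 => 256,
        n => n,
    }
}

/// Number of bytes needed to hold `seql` nucleotides at 2 bits each.
fn packed_len(seql: usize) -> usize {
    (seql + 3) / 4
}

/// Pack ASCII nucleotides four per byte, first base in the high bits.
/// ASSUMPTION: the A=0, C=1, G=2, T=3 mapping and the bit order are
/// hypothetical; the real crate may use a different layout.
fn pack_2bit(seq: &[u8]) -> Option<Vec<u8>> {
    let mut out = vec![0u8; packed_len(seq.len())];
    for (i, &base) in seq.iter().enumerate() {
        let code: u8 = match base {
            b'A' | b'a' => 0,
            b'C' | b'c' => 1,
            b'G' | b'g' => 2,
            b'T' | b't' => 3,
            _ => return None, // reject anything that is not A/C/G/T
        };
        out[i / 4] |= code << (6 - 2 * (i % 4));
    }
    Some(out)
}

/// Inverse of `pack_2bit`: recover `seql` upper-case nucleotides.
fn unpack_2bit(bytes: &[u8], seql: usize) -> Vec<u8> {
    (0..seql)
        .map(|i| {
            let code = (bytes[i / 4] >> (6 - 2 * (i % 4))) & 0b11;
            b"ACGT"[code as usize]
        })
        .collect()
}

fn main() {
    let packed = pack_2bit(b"ACGTA").expect("only A/C/G/T expected");
    assert_eq!(packed.len(), packed_len(5)); // 5 bases fit in 2 bytes
    assert_eq!(unpack_2bit(&packed, 5), b"ACGTA".to_vec());
    assert_eq!(seql_from_header(0xABCD_EF00), 256); // low byte 0 means 256
}
```

Because the last byte may be only partly filled, a reader of such a format needs the length from the header to know how many of the trailing 2-bit slots are real bases.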
Eric Coissac
2026-04-24 21:07:58 +02:00
parent d4e4289aff
commit c09d17401d
13 changed files with 1324 additions and 5 deletions
@@ -265,6 +265,20 @@ impl SuperKmer {
        buf
    }

    /// Returns the raw 32-bit header word for binary serialisation.
    /// Bits [7:0] = seql encoding (0→256, 1-255 direct). Bits [31:8] = payload.
    #[inline]
    pub fn header_bits(&self) -> u32 {
        self.header.0
    }

    /// Returns a read-only view of the packed 2-bit sequence bytes.
    /// Length is always `(seql() + 3) / 4` bytes.
    #[inline]
    pub fn seq_bytes(&self) -> &[u8] {
        &self.seq
    }

    /// Extract the kmer of length k starting at nucleotide position i (0-based).
    ///
    /// Returns an error if k is invalid (0 or > 32) or if position i + k exceeds the sequence length.