refactor: restructure k-mer partitioning pipeline for memory efficiency

Replace in-memory hashing with a disk-backed external merge sort and `PersistentCompactIntVec` to reduce peak RAM. Unify both phases on a custom `PtrHash` minimal perfect hash function (MPHF), which removes the need for `GOFunction` and `boomphf`. Introduce a three-step `count_partition()` pipeline whose chunk size adapts to the available system memory. Add the `memmap2`, `ptr_hash`, and `obicompactvec` dependencies.

Also document the strict genomics-only memory constraints and add an architectural feedback workflow that requires explicit user authorization before structural changes.
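
For readers unfamiliar with the technique, here is a minimal, std-only sketch of an external merge sort with memory-derived chunk sizing. It is not the code in this commit: `chunk_len`, `write_run`, `read_u64`, and `count_sorted` are hypothetical names, k-mers are assumed to be packed into `u64`, and the available-memory figure is passed in as a plain number rather than queried from the system. The result is returned as an in-memory `Vec` only to keep the example short; a disk-backed pipeline would stream the counts out instead.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper: how many u64 k-mers fit in `fraction` of the
/// available memory, never going below `min_len`.
fn chunk_len(available_bytes: u64, fraction: f64, min_len: usize) -> usize {
    let budget = (available_bytes as f64 * fraction) as u64;
    let len = (budget / std::mem::size_of::<u64>() as u64) as usize;
    len.max(min_len)
}

/// Sort one chunk in memory and write it to disk as little-endian u64 values.
fn write_run(chunk: &mut [u64], path: &Path) -> io::Result<()> {
    chunk.sort_unstable();
    let mut out = BufWriter::new(File::create(path)?);
    for v in chunk.iter() {
        out.write_all(&v.to_le_bytes())?;
    }
    out.flush()
}

/// Read the next u64 from a run, or `None` at end of file.
fn read_u64<R: Read>(r: &mut R) -> io::Result<Option<u64>> {
    let mut buf = [0u8; 8];
    match r.read_exact(&mut buf) {
        Ok(()) => Ok(Some(u64::from_le_bytes(buf))),
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(None),
        Err(e) => Err(e),
    }
}

/// Count k-mers by spilling sorted runs to `dir`, then k-way merging them.
/// Returns (k-mer, count) pairs in ascending k-mer order.
fn count_sorted(
    kmers: impl Iterator<Item = u64>,
    chunk_len: usize,
    dir: &Path,
) -> io::Result<Vec<(u64, u32)>> {
    // Phase 1: fill a bounded buffer, sort it, spill it as a run.
    let mut runs: Vec<PathBuf> = Vec::new();
    let mut chunk: Vec<u64> = Vec::with_capacity(chunk_len);
    for k in kmers {
        chunk.push(k);
        if chunk.len() >= chunk_len {
            let path = dir.join(format!("run_{}.bin", runs.len()));
            write_run(&mut chunk, &path)?;
            runs.push(path);
            chunk.clear();
        }
    }
    if !chunk.is_empty() {
        let path = dir.join(format!("run_{}.bin", runs.len()));
        write_run(&mut chunk, &path)?;
        runs.push(path);
    }

    // Phase 2: merge all runs with a min-heap keyed on (value, run index).
    let mut readers = Vec::new();
    for p in &runs {
        readers.push(BufReader::new(File::open(p)?));
    }
    let mut heap = BinaryHeap::new();
    for (i, r) in readers.iter_mut().enumerate() {
        if let Some(v) = read_u64(r)? {
            heap.push(Reverse((v, i)));
        }
    }
    let mut counts: Vec<(u64, u32)> = Vec::new();
    while let Some(Reverse((v, i))) = heap.pop() {
        // Equal k-mers come out of the heap consecutively, so counting
        // only needs to compare against the last emitted entry.
        match counts.last_mut() {
            Some(last) if last.0 == v => last.1 += 1,
            _ => counts.push((v, 1)),
        }
        if let Some(next) = read_u64(&mut readers[i])? {
            heap.push(Reverse((next, i)));
        }
    }
    for p in &runs {
        std::fs::remove_file(p)?;
    }
    Ok(counts)
}
```

In this sketch, peak memory is bounded by one chunk plus one read buffer per run, which is why the chunk length is derived from a memory budget instead of being hard-coded. A caller would compute `chunk_len(available, 0.5, 1 << 16)` once and pass the result to `count_sorted`; the fraction and floor here are illustrative values, not ones taken from the commit.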
Eric Coissac
2026-05-17 15:34:44 +08:00
parent f36b095ce2
commit 4736a7b6de
10 changed files with 230 additions and 114 deletions
+4 -1
@@ -20,5 +20,8 @@ sysinfo = "0.33"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
tracing = "0.1.44"
ph = "0.11"
cacheline-ef = "1.1"
epserde = "0.8"
memmap2 = "0.9.10"
obicompactvec = { path = "../obicompactvec" }
ptr_hash = "1.1"