refactor: restructure k-mer partitioning pipeline for memory efficiency
Replace in-memory hashing with a disk-backed external merge sort and `PersistentCompactIntVec` to reduce peak RAM. Unify both phases on a custom `PtrHash` MPHF, eliminating `GOFunction` and `boomphf`. Introduce a three-step `count_partition()` pipeline whose chunk size adapts to available system memory. Update dependencies to `memmap2`, `ptr_hash`, and `obicompactvec`. Additionally, document strict genomics-only memory constraints and add an architectural feedback rule requiring explicit user authorization before structural changes.
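For readers unfamiliar with the technique, here is a minimal sketch of a disk-backed external merge sort with a memory-derived chunk size. It is not the code in this commit: `chunk_len`, `spill_runs`, `merge_counts`, the fixed-fraction sizing policy, and the raw little-endian `u64` run format are all hypothetical, and it uses only the standard library rather than `memmap2`, `ptr_hash`, or `obicompactvec`.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::{Path, PathBuf};

/// Hypothetical policy: spend a fixed fraction of available memory on one chunk.
/// Returns the number of u64 k-mers held in RAM per chunk (at least 1).
fn chunk_len(available_bytes: u64, fraction: f64) -> usize {
    let budget = (available_bytes as f64 * fraction) as u64;
    (budget / 8).max(1) as usize
}

/// Sort one chunk in memory, write it to `path` as little-endian u64 values,
/// then clear the buffer so it can be reused.
fn write_run(chunk: &mut Vec<u64>, path: &Path) -> io::Result<()> {
    chunk.sort_unstable();
    let mut out = BufWriter::new(File::create(path)?);
    for v in chunk.iter() {
        out.write_all(&v.to_le_bytes())?;
    }
    chunk.clear();
    out.flush()
}

/// Read the next u64 from a run file, or None at end of file.
fn next_value(r: &mut BufReader<File>) -> io::Result<Option<u64>> {
    let mut buf = [0u8; 8];
    match r.read_exact(&mut buf) {
        Ok(()) => Ok(Some(u64::from_le_bytes(buf))),
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(None),
        Err(e) => Err(e),
    }
}

/// Phase 1: split the input into sorted runs on disk,
/// holding at most `chunk` values in RAM at a time.
fn spill_runs(
    input: impl Iterator<Item = u64>,
    chunk: usize,
    dir: &Path,
) -> io::Result<Vec<PathBuf>> {
    let mut runs = Vec::new();
    let mut buf = Vec::with_capacity(chunk);
    for v in input {
        buf.push(v);
        if buf.len() == chunk {
            let p = dir.join(format!("run_{}.bin", runs.len()));
            write_run(&mut buf, &p)?;
            runs.push(p);
        }
    }
    if !buf.is_empty() {
        let p = dir.join(format!("run_{}.bin", runs.len()));
        write_run(&mut buf, &p)?;
        runs.push(p);
    }
    Ok(runs)
}

/// Phase 2: k-way merge of the runs through a min-heap, emitting (k-mer, count).
/// Only one pending value per run (plus read buffers) is resident at a time.
fn merge_counts(runs: &[PathBuf], mut emit: impl FnMut(u64, u64)) -> io::Result<()> {
    let mut readers = Vec::new();
    for p in runs {
        readers.push(BufReader::new(File::open(p)?));
    }
    let mut heap = BinaryHeap::new();
    for (i, r) in readers.iter_mut().enumerate() {
        if let Some(v) = next_value(r)? {
            heap.push(Reverse((v, i)));
        }
    }
    let mut current: Option<(u64, u64)> = None;
    while let Some(Reverse((v, i))) = heap.pop() {
        match current {
            Some((k, n)) if k == v => current = Some((k, n + 1)),
            Some((k, n)) => {
                emit(k, n);
                current = Some((v, 1));
            }
            None => current = Some((v, 1)),
        }
        if let Some(next) = next_value(&mut readers[i])? {
            heap.push(Reverse((next, i)));
        }
    }
    if let Some((k, n)) = current {
        emit(k, n);
    }
    Ok(())
}
```

Peak memory in this sketch is bounded by one chunk during the spill phase and by one buffered reader per run during the merge, which is the property the refactor is after. How the real pipeline sizes chunks and stores counts may differ.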
@@ -0,0 +1,17 @@
---
name: No architectural decisions without explicit authorization
description: Never make architectural or design decisions without explicit user approval — code decisions are the user's alone
type: feedback
---

Never make architectural decisions unilaterally. This includes:
- Memory layout or footprint changes
- Algorithm or data structure choices (HashSet vs streaming, etc.)
- Dependency additions or substitutions
- Structural refactors that go beyond the exact task requested

If a bug or inefficiency is observed, **report it and propose alternatives**; do not fix it without explicit authorization.

**Why:** The user optimizes for minimal memory footprint at all times. Introducing a HashSet in `count_kmer()`, in place of the intended streaming GOFunction construction from the sidecar estimate, caused a serious memory regression that went unreported. This is unacceptable on a project where memory efficiency is a core constraint.

**How to apply:** When you notice an architectural issue while editing code (even if the fix is a clear improvement), stop, describe the problem and the options, and wait for an explicit go-ahead before changing anything.