4736a7b6de
Replace in-memory hashing with a disk-backed external merge sort and `PersistentCompactIntVec` to drastically reduce peak RAM. Unify both phases using a custom `PtrHash` MPHF, eliminating `GOFunction` and `boomphf`. Introduce a concrete three-step `count_partition()` pipeline with adaptive chunk sizing based on available system memory. Update dependencies to `memmap2`, `ptr_hash`, and `obicompactvec`. Additionally, document strict genomics-only memory constraints and enforce an architectural feedback workflow requiring explicit user authorization before structural changes.
1.1 KiB
| name | description | type |
|---|---|---|
| No architectural decisions without explicit authorization | Never make architectural or design decisions without explicit user approval — code decisions are the user's alone | feedback |
Never make architectural decisions unilaterally. This includes:
- Memory layout or footprint changes
- Algorithm or data structure choices (HashSet vs streaming, etc.)
- Dependency additions or substitutions
- Structural refactors that go beyond the exact task requested
If a bug or inefficiency is observed, report it and propose alternatives — do not fix it without explicit authorization.
Why: The user optimizes for minimal memory footprint at all times. Introducing a `HashSet` in `count_kmer()`, in place of the intended streaming `GOFunction` construction from the sidecar estimate, caused a serious memory regression that went unreported. That is unacceptable on a project where memory efficiency is a core constraint.
How to apply: If you notice an architectural issue while editing code (even one where the improvement seems clear), stop, describe the problem and the options, and wait for an explicit go-ahead before changing anything.