obikmer

Author	SHA1	Message	Date
Eric Coissac	4736a7b6de	refactor: restructure k-mer partitioning pipeline for memory efficiency. Replace in-memory hashing with a disk-backed external merge sort and `PersistentCompactIntVec` to drastically reduce peak RAM. Unify both phases using a custom `PtrHash` MPHF, eliminating `GOFunction` and `boomphf`. Introduce a concrete three-step `count_partition()` pipeline with adaptive chunk sizing based on available system memory. Update dependencies to `memmap2`, `ptr_hash`, and `obicompactvec`. Additionally, document strict genomics-only memory constraints and enforce an architectural feedback workflow requiring explicit user authorization before structural changes.	2026-05-17 16:08:47 +08:00
Eric Coissac	f36b095ce2	docs: clarify MPHF indexing, storage layout, and distance traits. Formalize the two-phase MPHF indexing architecture and update Phase 6 to use `evidence.bin` for direct k-mer extraction. Simplify the evidence and unitig storage layouts to flat packed formats enabling O(1) random access. Introduce aggregation traits (`ColumnWeights`, `CountPartials`, `BitPartials`) to support additive distance metric decomposition across partitions. Narrow the documented scope from metagenomic to individual genome datasets, and replace speculative open questions with concrete implementation specifications.	2026-05-17 15:59:10 +08:00
Eric Coissac	5169f65dc9	feat: implement persistent layered index and chunked binary format. Introduce the `obilayeredmap` specification and persistent MPHF-based index architecture for incremental multi-dataset indexing. Implement chunked binary serialization with a fixed `u8` k-mer count limit (256) and overlapping super-kmer segments. Add memory-mapped I/O and a companion `.idx` index file for allocation-free, O(1) unitig access. Update MkDocs navigation, enhance the k-mer comparison script, and add comprehensive tests for serialization, partitioning, and file I/O pipelines.	2026-05-09 17:38:29 +08:00
Eric Coissac	ebbfe35cbc	Refactor: Extract utility function for string reversal. Extracted `inverser_chaine` into a reusable utility function with a docstring and added a unit test to ensure correctness.	2026-04-30 06:58:46 +02:00
Eric Coissac	de3f9b16cf	first implementation, but far from optimal	2026-04-19 12:17:16 +02:00

5 Commits