Commit Graph

11 Commits

Author SHA1 Message Date
Eric Coissac ba49af6f9e refactor: parallelize merge and partition logic with obipipeline
Introduce the `obipipeline` dependency and refactor merge and partition logic to leverage parallel execution. Update `merge_partitions` to use rayon with dynamic memory budgeting and concurrency control via a pilot run. Refactor Pass 1 to concurrently read unitigs, filter kmers through a shared `LayeredMap`, and populate the graph safely. Simplify diagnostics to report total kmer counts and replace manual flags with graph length validation.
2026-06-12 21:32:04 +02:00
Eric Coissac 02cb30c0ef feat: add obisys crate for standardized CLI progress reporting
This commit introduces the `obisys` crate, which wraps `indicatif` to provide reusable `spinner` and `progress_bar` utilities with consistent styling and tick intervals. It refactors progress reporting across `obikindex`, `obikpartitionner`, and `obikmer` to use these shared functions, eliminating inline UI configuration and ensuring uniform terminal feedback.
2026-06-03 19:03:59 +02:00
Eric Coissac c5bcb7b8fa feat: introduce layered MPHF indexing and partition metadata
Refactors obikindex and obikpartitionner to delegate index construction to a new layered MPHF implementation. Adds resume-safe building with abundance filtering and count persistence, while introducing a PartitionMeta struct for JSON configuration persistence. Updates OKIError to wrap layer-specific errors, replaces single-path extraction with full path collection and logging, and registers new internal dependencies across the workspace.
2026-05-21 13:31:37 +02:00
Eric Coissac 9a1c0c0ee0 Add CLI progress bars and throughput metrics to partitioning
Add `indicatif` v0.17 to `obikmer` and `obikpartitionner` to instrument CLI workflows with real-time progress tracking. The changes integrate progress spinners and bars into the batch processing and parallel kmer counting loops, displaying processed base pairs, throughput rates, and elapsed time. Updates occur every 0.1s to enhance observability without modifying core partitioning logic.
2026-05-20 15:46:52 +02:00
Eric Coissac 4736a7b6de refactor: restructure k-mer partitioning pipeline for memory efficiency
Replace in-memory hashing with a disk-backed external merge sort and `PersistentCompactIntVec` to drastically reduce peak RAM. Unify both phases using a custom `PtrHash` MPHF, eliminating `GOFunction` and `boomphf`. Introduce a concrete three-step `count_partition()` pipeline with adaptive chunk sizing based on available system memory. Update dependencies to `memmap2`, `ptr_hash`, and `obicompactvec`. Additionally, document strict genomics-only memory constraints and enforce an architectural feedback workflow requiring explicit user authorization before structural changes.
2026-05-17 16:08:47 +08:00
Eric Coissac 5169f65dc9 feat: implement persistent layered index and chunked binary format
Introduce the `obilayeredmap` specification and persistent MPHF-based index architecture for incremental multi-dataset indexing. Implement chunked binary serialization with a fixed `u8` k-mer count limit (256) and overlapping super-kmer segments. Add memory-mapped I/O and a companion `.idx` index file for allocation-free, O(1) unitig access. Update MkDocs navigation, enhance the k-mer comparison script, and add comprehensive tests for serialization, partitioning, and file I/O pipelines.
2026-05-09 17:38:29 +08:00
Eric Coissac ebbfe35cbc Refactor: Extract utility function for string reversal
Extracted `inverser_chaine` into a reusable utility function with docstring and added unit test to ensure correctness.
2026-04-30 06:58:46 +02:00
Eric Coissac e7fa60a3a2 Refactor: Simplify user authentication flow
- Remove redundant validation logic in login handler
 - Consolidate session token generation into a single utility function  
- Update error handling to use consistent HTTP status codes
2026-04-28 08:38:26 +02:00
Eric Coissac 7efec54b27 .gitignore: ignore zstandard-compressed files
- Add *.zst pattern to .gitignore
- Prevents tracking of zstandard-compressed archives
2026-04-27 16:56:15 +02:00
Eric Coissac 1f466bf113 Refactor: simplify user authentication flow
- Replaced manual token validation with built-in middleware
 - Removed redundant session checks in controllers
2026-04-27 16:55:04 +02:00
Eric Coissac c09d17401d + obiskio: add binary I/O with LRU pool and compression
- Add new obiskio crate for high-performance SuperKmer serialization/deserialization
- Implement binary codec with 2-bit packed sequence encoding and raw header format (32 bits)
- Add transparent compression support via niffler: Zstd, Gzip/Bgzf/Lz4
- Implement SKFilePool with LRU-based fd management, max-concurrent-fd limiting (75% of ulimit)
- Add SKFileWriter with batched writes, configurable flush threshold (8 KiB default), and two-phase locking
- Add SKFileReader with sequential access, LRU recovery via reopen_and_seek()
+ New obikpartitionner crate: basic header/seq handling for binary super-kmer format
- Bump niffler from 2.7 to v3, add dependencies: allocator-api2, bitflags(>=1), errno/fastrand/rustix/tempfile/lru/hashbrown/bzip2/thiserror
- Update workspace members to include obikpartitionner andobiskio
2026-04-25 14:15:01 +02:00