Commit Graph

12 Commits

Author SHA1 Message Date
Eric Coissac 27f5e88a7b refactor: implement RoutableSuperKmer and update k-mer indexing pipeline
Replace raw SuperkMer routing with a new RoutableSuperKimer type that embeds canonical sequences and precomputed minimizers, enabling direct partition routing via hash. Update the build pipeline to yield RoutableSuperKmers throughout (builder, scatterer), refactor FASTA/unitig export commands to use the new type and compressed outputs (.fasta.gz, .unitigs.fasta.zst), revise SuperKmer header to store n_kmers instead of seql (avoiding 256-byte wrap), and update documentation to reflect minimizer-based theory, two evidence-encoding strategies for unitig-MPHF indexing (global offset vs. ID+rank), and the new obipipeline library architecture with parallel workers, biased scheduling, and error handling.
2026-05-01 09:33:26 +02:00
Eric Coissac 4e26e3bd40 Refactor: Simplify user authentication flow
- Remove redundant password validation logic
 - Integrate JWT-based session management for improved security and scalability
2026-04-30 07:04:03 +02:00
Eric Coissac 97e65bd831 ♻️ refactor pipeline architecture and fix macOS memory detection
- Replace WorkerPool-based pipelines with typed `Pipe` abstraction in obipipeline
  - Introduce Pipe/PipeIter for composable, sourceless/sink-less pipelines
- Update partition and superkmer commands to use new Pipe API via make_pipe!
  - Remove Arc<Mutex<...>> patterns; simplify state management
- Fix macOS available_memory() returning 0 by falling back to half total memory in dereplicate()
- Remove unused `format: "zstd"` field from partition.meta
2026-04-30 07:04:03 +02:00
Eric Coissac 4c19882f03 add PhantomData import for generic type safety
- Added `use std::marker::PhantomData;` to prepare for generic scheduler implementations
- Ensures type safety and avoids unused lifetime/type parameters warnings
2026-04-30 07:04:03 +02:00
Eric Coissac ebbfe35cbc Refactor: Extract utility function for string reversal
Extracted `inverser_chaine` into a reusable utility function with docstring and added unit test to ensure correctness.
2026-04-30 06:58:46 +02:00
Eric Coissac e7fa60a3a2 Refactor: Simplify user authentication flow
- Remove redundant validation logic in login handler
 - Consolidate session token generation into a single utility function  
- Update error handling to use consistent HTTP status codes
2026-04-28 08:38:26 +02:00
Eric Coissac eaf893174f ♻️ refactor(obikpartitionner): replace low-level I/O with obiskio::SKFileWriter
- Replace `limits` module and raw binary I/O with a new high-level abstraction using obiskio::SKFileWriter
- Remove `niffler` dependency and compression logic (Gzip/Zstd/Lz4/Bgzf)
- Simplify PartitionManager to manage partitioned file writers based on kmer hashing
  * Uses `n_partition_bits` for bitmask-based partition selection (2^n partitions)
- Add obiskio as a local dependency
Note: This is likely part of aligning with unified I/O primitives in the obiskio crate.
2026-04-26 15:00:12 +02:00
Eric Coissac c09d17401d + obiskio: add binary I/O with LRU pool and compression
- Add new obiskio crate for high-performance SuperKmer serialization/deserialization
- Implement binary codec with 2-bit packed sequence encoding and raw header format (32 bits)
- Add transparent compression support via niffler: Zstd, Gzip/Bgzf/Lz4
- Implement SKFilePool with LRU-based fd management, max-concurrent-fd limiting (75% of ulimit)
- Add SKFileWriter with batched writes, configurable flush threshold (8 KiB default), and two-phase locking
- Add SKFileReader with sequential access, LRU recovery via reopen_and_seek()
+ New obikpartitionner crate: basic header/seq handling for binary super-kmer format
- Bump niffler from 2.7 to v3, add dependencies: allocator-api2, bitflags(>=1), errno/fastrand/rustix/tempfile/lru/hashbrown/bzip2/thiserror
- Update workspace members to include obikpartitionner andobiskio
2026-04-25 14:15:01 +02:00
Eric Coissac 3f8880a7e5 📦 Add infer and new pipeline infrastructure
- Update Cargo.lock with dependency additions (bumpalo, byteorder, cfb, fnv, infer, js-sys, uuid wasm-bindgen)
- Refactor obikseq::superkmer: reorder imports and improve formatting
  - Add `obipipeline` crate with scheduler, error handling & macros (WIP)
- Replace obiread::expand_paths logic with PathIter and path_iterator module
  - Add mimetype detection using `infer` crate via PeekReader wrapper
2026-04-23 21:06:11 +02:00
Eric Coissac 380b5a6f94 📖 Update super-kmer theory and implementation to prefer non-degenerate m-mers
- Update super-kmer definition in `kmERS.md` to specify that non-degenerate m-mers are preferred over degenerate ones (degeneracy = homopolymer).
- Refactor `superkmer.rs`: change `.canonical()` to mutate in-place and return bool.
- Add `m` field & canonical-aware minimizer position calculation to SuperKmerIter in obiskbuilder.
- Add helper functions `is_degenerate` and minimizer comparison logic to rolling_stat.rs for consistent tie-breaking.
- Minor formatting cleanup in superkmer command and chunk processing.
2026-04-20 17:50:09 +02:00
Eric Coissac ae5e1152b9 (feat) Add entropy-based filtering and rolling statistics for k-mers
- Introduce lazy_static dependency
- Refactor encoding: rename encode_base →encode_nuc and make it pub(crate)
- Add from_raw_right/raw Right methods to Kmer for right-aligned handling
- Improve error message formatting and code readability in kmod.rs tests  
- Replace inline entropy computation with precomputed tables (entropy_table module)—using LazyLock for static lookup arrays
- Simplify EntropyFilter by removing redundant tables and delegating to new entropy_table API  
- Add RollingStat module for real-time kmer statistics and minimizer tracking
- Reorganize modules: move iter, encoding to pub(crate), add entropy_table and rolling_stat
- Update imports across obiskbuilder crate accordingly
2026-04-20 15:36:02 +02:00
Eric Coissac de3f9b16cf first implementation but far to be optimal 2026-04-19 12:17:16 +02:00