Commit Graph

21 Commits

Author SHA1 Message Date
Eric Coissac 6d85387077 feat: add performance instrumentation and dynamic worker scaling
This change enhances observability and adaptability in the merge pipeline. Performance timing and debug logging are added to the De Bruijn graph and partition merge layers to track phase durations and pipeline metrics. The merge module replaces blocking receives with timed polls to sample CPU efficiency, dynamically spawning workers when utilization drops below a threshold. A new script is also introduced to parse merge debug logs and generate structured Markdown reports detailing throughput, phase breakdowns, and partition performance.
2026-06-13 13:21:53 +02:00
Eric Coissac fddf630772 style: apply consistent formatting and whitespace normalization
Applies consistent formatting, whitespace normalization, and indentation standardization to `debruijn.rs` and `merge.rs`. Reorganizes imports and downgrades a unitig traversal log from `info!` to `debug!`. No functional logic or runtime behavior is altered.
2026-06-13 11:58:20 +02:00
Eric Coissac fb8c6e427c refactor: pass Unitig objects directly instead of raw byte slices
Refactored `try_for_each_unitig` and related pipelines across `obidebruinj` and `obikpartitionner` to accept `Unitig` instances directly. This eliminates manual `Unitig::from_nucleotides()` conversions, simplifies the data flow, and reduces unnecessary allocation overhead.
2026-06-13 11:52:50 +02:00
Eric Coissac 1f336fe496 refactor: replace mutex with channels for parallel debruijn processing
Add `rayon` and `crossbeam-channel` dependencies to support concurrent execution. Replace the synchronous, mutex-protected closure pattern with a channel-based producer-consumer approach using `std::thread::scope`. This decouples unitig iteration from processing, eliminating lock contention and `Mutex` overhead while enabling parallel workloads.
2026-06-13 11:49:27 +02:00
Eric Coissac 5f98d2ef96 refactor: replace explicit collect with Unitig::from_nucleotides
Introduce a thread-local buffer to materialize nucleotide iterators into contiguous slices. Update `try_for_each_unitig` across the debruijn, index, merge, and rebuild layers to directly instantiate `Unitig` via `from_nucleotides()` instead of explicitly collecting iterators. This eliminates intermediate allocations and aligns test code with the new approach.
2026-06-13 11:47:06 +02:00
Eric Coissac 8b563d0804 refactor: migrate pipeline stages and improve graph processing
Refactored neighbor resolution to explicitly track unvisited indices for degree-1 nodes, updated display formatting, and added timing and debug logging to the degree computation routine. Migrated pipeline stages from eager vector returns to explicit flat implementations, enabling backpressure-aware streaming, configurable batch processing, incremental yielding, and progress tracking via a delta channel.
2026-06-13 11:44:17 +02:00
Eric Coissac 09d9e21744 feat: integrate tracing and enhance bit matrix operations
Add the `tracing` crate to `obidebruinj`, `obisys`, and resolve it in `Cargo.lock`. Replace `eprintln!` statements with structured `debug!` and `info!` macros. Introduce a `TracedBar` wrapper for progress bars and enhance the `Stage` lifecycle to emit structured events for timing, memory metrics, and swap warnings. Add a progress spinner for unitig degree computation. Extend `PersistentBitMatrix` with columnar bit-vector operations and parallel distance methods, enabling uniform distance computations across all storage layouts while replacing previous panics with dimension-based fallbacks.
2026-06-08 19:55:06 +02:00
Eric Coissac 03c7bb0b99 Relax unitig assertion in debruijn test
Replace the strict `unitigs.len() == 1` assertion with a non-empty check to allow multiple unitigs. Update the test comment to describe the general non-repetitive sequence recovery principle instead of a specific example. The core k-mer roundtrip validation logic remains unchanged.
2026-06-06 06:41:45 +02:00
Eric Coissac b39eee688a refactor(debruijn): unify graph traversal with WalkState iterator
Replaces deeply nested branching with early returns and `then_some`. Introduces a cycle-detecting `find_chain_start` method and updates `UnitigNucIter` to use step-based iteration with atomic node claiming. This eliminates nested iterators and redundant state management, improving code readability and maintainability.
2026-06-06 06:38:28 +02:00
Eric Coissac 95b3461405 refactor: centralize graph traversal logic in walk
Refactor `leavable` and `reachable` to eliminate duplicated graph traversal logic by mutually delegating via `WalkState`. `leavable` now returns `self.walk(graph).is_some()`, while `reachable` delegates to the inverted `direct` state's `leavable` check. This centralizes kmer extension and visited-state validation in `walk`, simplifying control flow and reducing code duplication.
2026-06-06 06:36:48 +02:00
Eric Coissac 952a21eef8 refactor: remove naked_asm and extract collect_unitigs helper
Remove the `std::arch::naked_asm` import as it is no longer required. Introduce a `collect_unitigs` helper to abstract nucleotide sequence extraction from `GraphDeBruijn`, and refactor the test suite to use it, eliminating repetitive collection code and standardizing graph iteration logic.
2026-06-06 04:33:59 +02:00
Eric Coissac 5c2f48535f refactor: rename compute_degrees and mark start nodes
Renames `compute_degrees` to `compute_degrees_and_mark_starts` across the De Bruijn graph and partitioner layers to consolidate degree calculation and start-node flagging. Introduces safe neighbor iteration methods and a debug validation block to verify graph consistency. Refactors unitig extraction to use sequential execution with a `Mutex` for safe error propagation. Fixes malformed and duplicated method calls, adds auto-generation of missing `meta.json` files, and ensures persistent matrix builders are explicitly closed to finalize metadata.
2026-06-05 19:48:59 +02:00
Eric Coissac 27088ab810 refactor: optimize unitig iteration and graph traversal
Switches unitig processing to a lazy, fallible `try_for_each_unitig` API across partitioner layers, reducing intermediate allocations and enabling proper error propagation. Refactors de Bruijn graph traversal into a two-pass algorithm with explicit node flags, named constants, and diagnostic logging. Introduces parallel chain processing and staged performance profiling for the unitig command, and adds a memory-efficient `FromIterator` implementation for packed nucleotide sequences.
2026-06-05 19:48:59 +02:00
Eric Coissac d202ead385 feat: parallelize unitig extraction and FASTA output
Replace the non-atomic `set_visited` with atomic `fetch_or` bitmask operations to enable thread-safe node claiming. Introduce a two-phase extraction pipeline where `par_for_each_chain_unitig` builds chains in parallel and `for_each_remaining_unitig` sequentially handles residual cycles and junctions. Add `is_start` and `collect_from_start` to explicitly define unitig boundaries. Wrap `BufWriter` in a `Mutex` and use an `AtomicUsize` counter to ensure thread-safe concurrent FASTA output, refactoring the write logic into a shared closure for safe multi-threaded execution.
2026-06-05 10:33:52 +02:00
Eric Coissac 2f29ee2240 feat: Add parallel execution and thread-safe graph operations
Integrate rayon to enable parallel processing of k-mer partitions and degree computation. Replace Cell with AtomicU8 to ensure thread-safe node state management, and add a merge method for combining disjoint graphs. Additionally, introduce progress tracking utilities and a test-utils feature flag for development dependencies.
2026-06-04 23:22:55 +02:00
Eric Coissac 17c9e076bd refactor: extract obikindex crate and remove deprecated CLI commands
Extracted core indexing logic, state tracking, and metadata management into a new `obikindex` crate. Refactored the `index` and `unitig` commands to leverage the `KmerIndex` abstraction and state-driven pipeline transitions. Removed obsolete CLI subcommands (`count`, `fasta`, `longtig`, `partition`) and their associated pipeline steps. Updated FASTA writing utilities for single-line output and deterministic identifiers, and refreshed workspace dependencies.
2026-05-20 18:54:12 +02:00
Eric Coissac 8c17bf958b refactor: centralize k-mer config and introduce packed sequences
Centralize k-mer and minimizer configuration using a thread-safe global module, and replace manual bit-packing with a memory-efficient `PackedSeq` type. Refactor core sequence and k-mer types to use compile-time length enforcement and centralized hashing. Introduce a new De Bruijn graph implementation with compact node encoding and traversal iterators. Update I/O, partitioning, and builder modules to align with the new architecture, and add the `xxhash-rust` dependency.
2026-05-08 06:34:24 +08:00
Eric Coissac 86e9cb7026 refactor: improve de Bruijn graph traversal and longtig generation
- Refactored Node representation using compact bitfields for neighbor counts
  and nucleotides; added count_neighbors helper to compute_degrees()
- Introduced StartIter iterator for unitig/longtigu generation with revised
  traversal semantics (e.g., interior node marking)
- Added nucleotide() accessor to Kmer type for 2-bit extraction at position i
- Renamed unitig.rs → longtigs, updated CLI command and output filenames to reflect "long t ig"
- Extended extract_kmers() in scripts/compare.py with duplication statistics
```
2026-05-02 16:31:08 +02:00
Eric Coissac 35840b9d73 refactor: simplify unitig iteration logic and API
Refactors the `start_iter` function to separate chain/cycle detection into two passes, consolidates visited marking logic internally in `next_unitig_kmer`, and streamlines the public API by removing redundant methods, updating iteration parameters to `(start_first_next)`, and eliminating external state like `at_start`.
2026-05-02 16:31:08 +02:00
Eric Coissac defeeb9460 feat: enforce canonical k-mer representation throughout the codebase
Refactor core types to consistently use `CanonicalKMer` (lexicographically minimal of k-mer and its reverse complement) as the canonical representation, ensuring deterministic behavior in graph traversal (unitig decomposition), neighbor resolution (`unique_neighbor` with `[CanonicalKmer; 4]` input) and scatter output generation. Introduce `RoutableSuperKmer`, add `.seq_hash()` support, fix type syntax errors in unitig extraction methods and deduplication tests. Update all k-mer construction to use canonical-aware APIs, including unsafe unchecked constructors for performance-critical paths.
2026-05-02 16:31:08 +02:00
Eric Coissac 4e26e3bd40 Refactor: Simplify user authentication flow
- Remove redundant password validation logic
 - Integrate JWT-based session management for improved security and scalability
2026-04-30 07:04:03 +02:00