Push mtzqmmrlmzzx #34

Merged
coissac merged 25 commits from push-mtzqmmrlmzzx into main 2026-06-22 08:47:24 +00:00
Owner
No description provided.
coissac added 25 commits 2026-06-22 08:46:39 +00:00
Replaces `rayon` parallel iteration across index, rebuild, reindex, and select modules with a custom `PartitionRunner`. This introduces NUMA-aware task distribution with CPU pinning and round-robin scheduling, eliminating `Arc`, `Mutex`, and atomic synchronization primitives in favor of a flat, pre-spawned worker architecture. Error handling is simplified via `.map_err()` and the `?` operator, while progress bar updates are decoupled into dedicated callbacks.
Introduce `MemoryBitVec` and `MemoryIntVec` for efficient in-memory storage with hybrid compression and overflow handling. Implement `BitSlice`, `BitSliceMut`, `IntSlice`, and `IntSliceMut` traits across persistent and memory-backed types to enable generic slice operations and bitwise/arithmetic overloads. Add `col_persist` and `col_as_memory` methods to `BitMatrix` and `IntMatrix` for efficient column extraction. Align with the new single-pass rebuild architecture by supporting fast kmer filtering and matrix rebuilding. Includes comprehensive tests and profiling instrumentation for the packing phase.
Refactor parallel matrix construction by extracting reusable `pairwise_matrix` and `pairwise2_matrix` helpers, and consolidate binary record deserialization into dedicated parsing functions. Add `set` and `iter` methods to `BitSliceMut` and `MemoryBitVec` for ergonomic bit manipulation and iteration. Standardize JSON field extraction via `meta::field`, expose `MemoryBitIter`, and improve test reliability by automatically cleaning up temporary directories.
Introduce `byte_sum` and `byte_count_nonzero` to efficiently aggregate compact-int byte slices, bypassing per-element decoding and overflow map lookups. Refactor `sum()` and `count_nonzero()` across the matrix, reader, and traits modules to use direct memory-mapped slice iteration and idiomatic Rust iterators. Additionally, expose `MemoryIntIter` publicly and implement `IntoIterator` and `IntSlice` for `MemoryIntVec` to enable standard iteration and delegate aggregation to the new helpers.
Refactors bit manipulation and distance calculations to leverage standardized `BitSlice` traits, replacing manual byte/word logic with safer, reusable methods. Extends `IntSlice` and `IntSliceMut` traits to expose direct memory-mapped access and overflow management, enabling efficient bulk data extraction and serialization. Replaces manual bit-shifting loops with optimized table-based unpacking and adds population count and distance metric methods for improved performance. Updates `PersistentBitVecBuilder` with file tracking and safe flushing, and aligns test imports with new trait bounds.
Centralizes overflow handling and improves modularity by replacing manual mmap indexing and row loops with composable iterator patterns. This change leverages Rust's iterator traits for efficient, idiomatic column traversal while encapsulating data access in a dedicated view struct.
Refactor `BitIter` to process `u64` chunks using word-aligned shifts instead of byte-level operations. Introduce a dedicated `MemoryBitIter` for `MemoryBitVec`, updating its `iter()` and `IntoIterator` implementations accordingly. Hide `MemoryBitIter` from the public API to narrow the crate's interface, while leveraging explicit alignment guarantees for safer and more efficient bit extraction.
Implemented `sum()`, `count_nonzero()`, and `iter()` to complete the numeric vector interface. The builder now computes aggregate values across memory-mapped regions and overflow entries, while the reader delegates these operations to its inherent methods. The iterator provides zero-copy access to underlying `u32` elements.
Refactor `cmp_scalar`, `min`, `max`, `add`, and `diff` to operate directly on the primary byte array, deferring overflow slot resolution to a secondary pass. This eliminates HashMap lookups in the hot path and enables SIMD vectorization. Add six unit tests to validate correct promotion and demotion between storage slots when values cross the 255 threshold.
Document the two-tier compact integer encoding, BitSlice/IntSlice trait hierarchy, and SIMD-friendly O(n+k) algorithms. Include details on concrete memory and persistent vector types, matrix aggregation traits, and planned group-filtering APIs.
Add Mermaid diagrams to visualize the trait hierarchy, compact int storage layout, and SIMD-vectorizable arithmetic operations for MemoryIntVec and PersistentCompactIntVec. Also document concrete type structures and planned layer/partition composition rules to improve documentation clarity.
This commit introduces `BitColView` and `IntColView` to abstract over Columnar and Packed storage formats, implementing `BitSlice` and `IntSlice` for uniform column access. It adds `col_view()` accessors to `PersistentBitMatrix` and `PackedCompactIntMatrix`, explicitly panicking on implicit variants. The new types are publicly re-exported, and unit tests are added to validate per-element retrieval, aggregation methods, and parity with the original columnar representation.
Introduce the `ColGroup` struct and `MatrixGroupOps` trait to manage named subsets of column indices and perform additive aggregations (count, sum, any). Implement these operations for `PersistentBitMatrix` and `PersistentCompactIntMatrix`, applying size-optimized branches for presence counts and direct accumulation for small groups. Additionally, add a `mask_with` trait method that efficiently zero-sets elements based on a mask, optimized for sparse masks with O(n_zeros) complexity. Include comprehensive tests covering overflow handling, slot masking, and result additivity across partitioned data.
Introduces `TempCompactIntVec` and `TempBitVec` as temporary, file-backed intermediates to replace eager in-memory vectors, enabling OS-level paging under memory pressure. Updates the `MatrixGroupOps` trait to return `io::Result` types, allowing proper error propagation and supporting chunked accumulation for large column groups. Includes builder patterns with `.freeze()` finalization, automatic `TempDir` cleanup on drop, and necessary test updates to handle the new fallible signatures. Also fixes `Cargo.toml` section ordering.
Refactors column and matrix access to use unified `BitSliceView` and `IntSliceView` abstractions, replacing legacy `PackedCol`/`IntColView` types. Introduces `BitSlice`/`IntSlice` traits for zero-copy, trait-based bitwise and arithmetic operations across persistent and temporary vector types. Removes deprecated in-memory `MemoryBitVec` and `MemoryIntVec` implementations and their tests, while updating dependent crates to use the new view-based API and `BitSliceMut` trait.
Replace trait-based API documentation with concrete, zero-copy view structs and update all associated diagrams. Refine algorithmic descriptions for sentinel handling, overflow stores, and bulk operations. Clarify temporary file lifecycles and group-chunking strategies to support memory-efficient parallel aggregation.
Expands MatrixGroupOps with partial_group_min/max helpers for bitwise reductions and introduces add_col_from methods to persist external vectors as matrix columns. Refactors column aggregation in the partitioner to leverage these group operations directly, replacing iterative row processing with simplified builder lifecycle management and explicit metadata serialization.
Adds `FilterMask` and conditional bitwise methods (`*_where`) to `obicompactvec` for composable column-based slot filtering. Extends `obikpartitionner` with a `MatrixGroupOps` trait and `column_mask_expr` method to express aggregate constraints as vectorized masks. Refactors matrix builder management into a unified `Builders` enum and introduces `try_compute_combined_mask`, enabling O(1) slot checks and skipping unnecessary row reads during partitioning and rebuilding passes.
Refactor bitmatrix, colgroup, and layer modules to replace manual iteration with concise `or_where` predicates and bulk inversion calls. This simplifies the codebase and leverages optimized internal implementations for improved performance.
Introduces `new_ones` and `add_col_ones` methods to directly initialize all-ones bit vectors and matrix columns. This replaces redundant initialization sequences that created zero-filled structures and applied bitwise NOT, with a single pass that writes contiguous 0xFF bytes to disk. The change eliminates inversion overhead, streamlines test setup, and improves performance for filter mask intersection logic while preserving identical semantics.
Introduces a Make-based orchestration for simulating, indexing, merging, filtering, and verifying k-mer counts and presence. Exposes internal builder and iterator APIs publicly, enforces mandatory leading slashes for predicate patterns, registers the `obitaxonomy` crate, and updates tooling configurations alongside documentation.
Adds the `obitaxonomy` crate to parse and validate hierarchical taxonomy paths using a strict `taxonomy:/name@rank/...` syntax. Replaces generic string-based path matching in predicates with structured `TaxPath` and `TaxPattern` types, enforcing explicit anchor constraints and rank-aware semantics. Updates filtering documentation to clarify optional leading slashes and segment-boundary matching rules.
Integrates an obisys::Reporter across indexing and command modules to capture execution metrics. Replaces discarded timer stops with explicit rep.push() calls, adds timing instrumentation for the pack stage, and prints collected reports after each selection branch.
feat: add CI/CD workflows, release automation, and CLI version flag
CI / build (push) Failing after 2m41s
CI / build (pull_request) Failing after 6s
a522c0907e
Adds Gitea Actions for continuous integration and tagged releases, including static musl binary compilation and artifact upload. Introduces a Makefile target to automate semantic version bumping and publishing. Bumps the package version to 0.1.1 and enables automatic `--version` output via Clap.
refactor: update CI toolchain setup and optimize parallel indexing
CI / build (push) Successful in 4m56s
CI / build (pull_request) Successful in 4m11s
4e625afaba
Update CI workflows to explicitly install the Rust toolchain via rustup and configure musl targets for deterministic static builds in Docker containers. Bump obikmer dependency to 0.1.3. Refactor obicompactvec to reduce peak memory usage by computing column sizes from filesystem metadata, add atomic writes, and implement cleanup guards. Replace parallel iteration patterns in obikindex with a structured PartitionRunner pipeline for simplified error handling and progress tracking.
coissac merged commit c4c71dc892 into main 2026-06-22 08:47:24 +00:00
coissac deleted branch push-mtzqmmrlmzzx 2026-06-22 08:47:25 +00:00
Sign in to join this conversation.
No Reviewers
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: OBIKmers/obikmer#34