Commit Graph

259 Commits

Author SHA1 Message Date
coissac 973a3f3d6e Merge pull request 'feat: add numa feature flag and automate release workflow' (#44) from push-uymxyvsyooro into main
CI / build (push) Successful in 3m16s
Reviewed-on: #44
2026-06-23 07:22:33 +00:00
Eric Coissac 1a839a295a feat: add numa feature flag and automate release workflow
Release / create-release (push) Failing after 37s
Release / build-linux-x86_64 (push) Has been skipped
Release / build-macos-arm64 (push) Has been skipped
CI / build (pull_request) Successful in 3m23s
Refactor the Gitea release pipeline to generate releases via API and upload binaries using a shared ID. Automate changelog generation by fetching recent commits with `jj log` and producing markdown notes via `aichat`. Convert `hwlocality` to an optional dependency gated by a default `numa` feature, providing fallback implementations for graceful degradation when NUMA support is disabled. Bump obikmer to 1.1.18.
v1.1.18
2026-06-23 09:07:04 +02:00
coissac 2ea58703c7 Merge pull request 'Push zkptpswyxnvt' (#43) from push-zkptpswyxnvt into main
CI / build (push) Successful in 3m19s
Reviewed-on: #43
2026-06-22 16:29:59 +00:00
Eric Coissac ac3ef106e7 refactor: implement adaptive worker scaling and infallible NUMA build
Release / build-linux-static (push) Successful in 8m4s
CI / build (pull_request) Successful in 3m26s
Replaces the fallible NUMA topology builder with an infallible fallback that synthesizes a single-node UMA configuration on failure or absence. Refactors PartitionRunner to pre-spawn dormant workers and dynamically activate them via CPU efficiency thresholds, replacing static upfront spawning with adaptive scaling. Bumps obikmer crate version to 1.1.15.
v1.1.15
2026-06-22 18:29:39 +02:00
Eric Coissac 469e53b6f5 Add genomic distance benchmarking suite and test data
Introduces scripts to compute and validate pairwise genomic distance matrices across multiple metrics. Updates the Makefile with build and comparison targets, adds .gitignore rules for generated outputs, and includes test CSV matrices and a Newick phylogenetic tree for validating the distance computation pipeline.
2026-06-22 18:24:30 +02:00
Eric Coissac 9f1df96ea7 ci: restrict push trigger to main branch
Replace the wildcard `['**']` in the push trigger with `['main']`. This prevents redundant pipeline runs on non-main branches during push events.
2026-06-22 16:58:44 +02:00
coissac 4e4cce2879 Merge pull request 'fix(ci): enable cross-compilation in release workflow and bump obikmer' (#42) from push-sxlpkrkyuttk into main
CI / build (push) Successful in 3m20s
Reviewed-on: #42
2026-06-22 14:31:26 +00:00
Eric Coissac 68b05b93c4 fix(ci): enable cross-compilation in release workflow and bump obikmer
CI / build (push) Successful in 4m38s
Release / build-linux-static (push) Successful in 10m14s
CI / build (pull_request) Successful in 3m40s
Injects PKG_CONFIG_ALLOW_CROSS=1 into the static binary build step to ensure correct native dependency resolution during musl target compilation with cargo zigbuild. Also updates the obikmer crate version from 1.1.13 to 1.1.14.
v1.1.14
2026-06-22 16:31:04 +02:00
coissac 0a668cf8a6 Merge pull request 'chore: bump obikmer to 1.1.13 and fix Makefile revision tag' (#41) from push-qwzpxktnlyls into main
CI / build (push) Has been cancelled
Reviewed-on: #41
2026-06-22 14:18:25 +00:00
Eric Coissac e6d6942e2f chore: bump obikmer to 1.1.13 and fix Makefile revision tag
CI / build (push) Has been cancelled
Release / build-linux-static (push) Has been cancelled
CI / build (pull_request) Has been cancelled
Update the obikmer crate version from 1.1.12 to 1.1.13 in Cargo.toml. Additionally, change the Makefile's Git revision specifier from @- to @ to ensure the version tag is applied to the current commit before pushing.
v1.1.13
2026-06-22 16:17:52 +02:00
coissac bf9c9aeacb Merge pull request 'chore: bump version to 1.1.12 and fix release workflow' (#40) from push-zmkxouxypspm into main
CI / build (push) Has been cancelled
Reviewed-on: #40
2026-06-22 14:13:13 +00:00
Eric Coissac 22a65857a1 chore: bump version to 1.1.12 and fix release workflow
CI / build (push) Successful in 3m14s
CI / build (pull_request) Successful in 3m51s
Update Cargo.toml to 1.1.12 for a semver patch release. Refactor the Makefile release target to explicitly retrieve the parent commit hash via `jj log` and apply the tag, replacing implicit working directory tagging. Remove `jj auto-describe` and `--change @` in favor of an explicit `git push origin` for the version tag.
2026-06-22 16:12:56 +02:00
coissac d16a867640 Merge pull request 'ci: bypass PEP 668 restrictions and update obikmer to 1.1.11' (#39) from push-mxysluysloxr into main
CI / build (push) Successful in 4m1s
Release / build-linux-static (push) Failing after 7m28s
Reviewed-on: #39
v1.1.12
2026-06-22 13:52:51 +00:00
Eric Coissac 616050075f ci: bypass PEP 668 restrictions and update obikmer to 1.1.11
CI / build (push) Successful in 3m41s
CI / build (pull_request) Successful in 3m49s
Add the `--break-system-packages` flag to the `pip install ziglang` command in the Gitea release workflow to bypass PEP 668 restrictions on modern Linux distributions. Additionally, bump the `obikmer` crate version from 1.1.9 to 1.1.11 across both Cargo.toml and Cargo.lock.
2026-06-22 15:52:34 +02:00
coissac e22afe9621 Merge pull request 'chore: bump obikmer to 1.1.9 and update release workflow' (#38) from push-noxuppsknsol into main
CI / build (push) Successful in 3m11s
Release / build-linux-static (push) Failing after 3m4s
Reviewed-on: #38
v1.1.11
2026-06-22 13:32:50 +00:00
Eric Coissac bdfac71e65 chore: bump obikmer to 1.1.9 and update release workflow
CI / build (push) Successful in 3m24s
CI / build (pull_request) Failing after 0s
Bumps the obikmer crate version from 1.1.7 to 1.1.9 in Cargo.toml and Cargo.lock. Updates the Gitea release workflow to dynamically locate the Zig compiler via Python, generating musl-targeted gcc/g++ wrapper scripts installed to /usr/local/bin for static Linux cross-compilation during releases.
2026-06-22 15:32:10 +02:00
coissac a00bb37478 Merge pull request 'ci: switch to Zig build toolchain and bump obikmer to 1.1.7' (#37) from push-nvvqmzmrotxx into main
CI / build (push) Successful in 3m14s
Release / build-linux-static (push) Failing after 2m42s
Reviewed-on: #37
v1.1.9
2026-06-22 13:20:12 +00:00
Eric Coissac d30a4efd9b ci: switch to Zig build toolchain and bump obikmer to 1.1.7
CI / build (push) Successful in 3m12s
CI / build (pull_request) Successful in 3m16s
Replaces the musl-based static Linux build toolchain with Zig (`ziglang` via pip and `cargo-zigbuild`), removing `musl-tools` dependencies. The workflow now invokes `cargo zigbuild` for cross-compiling the static binary. Additionally, bumps the `obikmer` crate version to 1.1.7.
2026-06-22 15:19:39 +02:00
coissac 6baf2e64ca Merge pull request 'chore: bump obikmer to 1.1.6 and automate git tagging' (#36) from push-yxmtknzsynpx into main
CI / build (push) Successful in 3m20s
Release / build-linux-static (push) Failing after 4m30s
Reviewed-on: #36
v1.1.7
2026-06-22 13:01:06 +00:00
Eric Coissac c0a71a2d49 chore: bump obikmer to 1.1.6 and automate git tagging
CI / build (push) Successful in 4m46s
CI / build (pull_request) Successful in 4m27s
Bumps the obikmer crate version from 0.1.4 to 1.1.6 in Cargo.toml and Cargo.lock. Updates the Makefile release target to automatically extract the version, create a Git tag, and push it to the remote repository, extending the existing workflow with standard Git publishing steps.
2026-06-22 15:00:36 +02:00
coissac a609c1af95 Merge pull request 'ci: streamline release workflow and bump obikmer to 0.1.4' (#35) from push-zokprynyqunu into main
CI / build (push) Successful in 4m41s
Release / build-linux-static (push) Failing after 6m4s
Reviewed-on: #35
v1.1.6
2026-06-22 09:38:36 +00:00
Eric Coissac 3d32be8a83 ci: streamline release workflow and bump obikmer to 0.1.4
CI / build (push) Successful in 4m33s
CI / build (pull_request) Successful in 4m17s
Replaces the intermediate artifact upload step in the Gitea release workflow with a direct REST API call, eliminating unnecessary dependencies and adding `jq`. Also increments the obikmer crate version to 0.1.4.
2026-06-22 11:38:19 +02:00
coissac c4c71dc892 Merge pull request 'Push mtzqmmrlmzzx' (#34) from push-mtzqmmrlmzzx into main
CI / build (push) Successful in 4m34s
Reviewed-on: #34
2026-06-22 08:47:24 +00:00
Eric Coissac 4e625afaba refactor: update CI toolchain setup and optimize parallel indexing
CI / build (push) Successful in 4m56s
CI / build (pull_request) Successful in 4m11s
Update CI workflows to explicitly install the Rust toolchain via rustup and configure musl targets for deterministic static builds in Docker containers. Bump obikmer dependency to 0.1.3. Refactor obicompactvec to reduce peak memory usage by computing column sizes from filesystem metadata, add atomic writes, and implement cleanup guards. Replace parallel iteration patterns in obikindex with a structured PartitionRunner pipeline for simplified error handling and progress tracking.
2026-06-22 10:46:24 +02:00
Eric Coissac a522c0907e feat: add CI/CD workflows, release automation, and CLI version flag
CI / build (push) Failing after 2m41s
CI / build (pull_request) Failing after 6s
Adds Gitea Actions for continuous integration and tagged releases, including static musl binary compilation and artifact upload. Introduces a Makefile target to automate semantic version bumping and publishing. Bumps the package version to 0.1.1 and enables automatic `--version` output via Clap.
2026-06-22 10:36:40 +02:00
Eric Coissac c1d6f277ce feat(select): add metrics reporting to selection methods
Integrates an obisys::Reporter across indexing and command modules to capture execution metrics. Replaces discarded timer stops with explicit rep.push() calls, adds timing instrumentation for the pack stage, and prints collected reports after each selection branch.
2026-06-22 10:25:24 +02:00
Eric Coissac 9356be4ec0 feat: introduce obitaxonomy crate for hierarchical taxonomy parsing
Adds the `obitaxonomy` crate to parse and validate hierarchical taxonomy paths using a strict `taxonomy:/name@rank/...` syntax. Replaces generic string-based path matching in predicates with structured `TaxPath` and `TaxPattern` types, enforcing explicit anchor constraints and rank-aware semantics. Updates filtering documentation to clarify optional leading slashes and segment-boundary matching rules.
2026-06-22 10:24:04 +02:00
Eric Coissac c694e1f2b0 feat: add benchmark pipeline, expose APIs, and enforce strict paths
Introduces a Make-based orchestration for simulating, indexing, merging, filtering, and verifying k-mer counts and presence. Exposes internal builder and iterator APIs publicly, enforces mandatory leading slashes for predicate patterns, registers the `obitaxonomy` crate, and updates tooling configurations alongside documentation.
2026-06-22 10:18:33 +02:00
Eric Coissac 280ca1f5a3 feat: add optimized new_ones constructor for all-ones bit vectors
Introduces `new_ones` and `add_col_ones` methods to directly initialize all-ones bit vectors and matrix columns. This replaces redundant initialization sequences that created zero-filled structures and applied bitwise NOT, with a single pass that writes contiguous 0xFF bytes to disk. The change eliminates inversion overhead, streamlines test setup, and improves performance for filter mask intersection logic while preserving identical semantics.
2026-06-22 10:00:01 +02:00
Eric Coissac 9abb2db92f refactor: replace explicit bit-setting loops with optimized bulk operations
Refactor bitmatrix, colgroup, and layer modules to replace manual iteration with concise `or_where` predicates and bulk inversion calls. This simplifies the codebase and leverages optimized internal implementations for improved performance.
2026-06-22 09:56:41 +02:00
Eric Coissac 7c1efa9cbb feat: add vectorized column filters and optimize partitioner iteration
Adds `FilterMask` and conditional bitwise methods (`*_where`) to `obicompactvec` for composable column-based slot filtering. Extends `obikpartitionner` with a `MatrixGroupOps` trait and `column_mask_expr` method to express aggregate constraints as vectorized masks. Refactors matrix builder management into a unified `Builders` enum and introduces `try_compute_combined_mask`, enabling O(1) slot checks and skipping unnecessary row reads during partitioning and rebuilding passes.
2026-06-22 09:54:51 +02:00
Eric Coissac 4c4524766c feat(matrix): add partial group reductions and column persistence
Expands MatrixGroupOps with partial_group_min/max helpers for bitwise reductions and introduces add_col_from methods to persist external vectors as matrix columns. Refactors column aggregation in the partitioner to leverage these group operations directly, replacing iterative row processing with simplified builder lifecycle management and explicit metadata serialization.
2026-06-22 09:49:04 +02:00
Eric Coissac 7eea71fdcd docs(obicompactvec): update API docs and algorithm descriptions
Replace trait-based API documentation with concrete, zero-copy view structs and update all associated diagrams. Refine algorithmic descriptions for sentinel handling, overflow stores, and bulk operations. Clarify temporary file lifecycles and group-chunking strategies to support memory-efficient parallel aggregation.
2026-06-22 09:46:19 +02:00
Eric Coissac f91c5a3f79 refactor(obicompactvec): unify bit and int vector slice views
Refactors column and matrix access to use unified `BitSliceView` and `IntSliceView` abstractions, replacing legacy `PackedCol`/`IntColView` types. Introduces `BitSlice`/`IntSlice` traits for zero-copy, trait-based bitwise and arithmetic operations across persistent and temporary vector types. Removes deprecated in-memory `MemoryBitVec` and `MemoryIntVec` implementations and their tests, while updating dependent crates to use the new view-based API and `BitSliceMut` trait.
2026-06-17 23:51:32 +02:00
Eric Coissac fb4962c4fe refactor: replace in-memory vectors with temp-file-backed storage
Introduces `TempCompactIntVec` and `TempBitVec` as temporary, file-backed intermediates to replace eager in-memory vectors, enabling OS-level paging under memory pressure. Updates the `MatrixGroupOps` trait to return `io::Result` types, allowing proper error propagation and supporting chunked accumulation for large column groups. Includes builder patterns with `.freeze()` finalization, automatic `TempDir` cleanup on drop, and necessary test updates to handle the new fallible signatures. Also fixes `Cargo.toml` section ordering.
2026-06-17 23:36:15 +02:00
Eric Coissac 1d38d87ff9 Add column group operations and mask_with trait
Introduce the `ColGroup` struct and `MatrixGroupOps` trait to manage named subsets of column indices and perform additive aggregations (count, sum, any). Implement these operations for `PersistentBitMatrix` and `PersistentCompactIntMatrix`, applying size-optimized branches for presence counts and direct accumulation for small groups. Additionally, add a `mask_with` trait method that efficiently zero-sets elements based on a mask, optimized for sparse masks with O(n_zeros) complexity. Include comprehensive tests covering overflow handling, slot masking, and result additivity across partitioned data.
2026-06-17 23:28:52 +02:00
Eric Coissac 93559c3294 feat: introduce unified column view types for bit and int matrices
This commit introduces `BitColView` and `IntColView` to abstract over Columnar and Packed storage formats, implementing `BitSlice` and `IntSlice` for uniform column access. It adds `col_view()` accessors to `PersistentBitMatrix` and `PackedCompactIntMatrix`, explicitly panicking on implicit variants. The new types are publicly re-exported, and unit tests are added to validate per-element retrieval, aggregation methods, and parity with the original columnar representation.
2026-06-17 23:24:11 +02:00
Eric Coissac 1f0d77d5bf docs: document compact vector implementation with Mermaid diagrams
Add Mermaid diagrams to visualize the trait hierarchy, compact int storage layout, and SIMD-vectorizable arithmetic operations for MemoryIntVec and PersistentCompactIntVec. Also document concrete type structures and planned layer/partition composition rules to improve documentation clarity.
2026-06-17 23:20:55 +02:00
Eric Coissac eeba43ac4f docs: add technical reference for obicompactvec module
Document the two-tier compact integer encoding, BitSlice/IntSlice trait hierarchy, and SIMD-friendly O(n+k) algorithms. Include details on concrete memory and persistent vector types, matrix aggregation traits, and planned group-filtering APIs.
2026-06-17 23:19:31 +02:00
Eric Coissac 7ed7b26039 perf: optimize vec arithmetic and add overflow tests
Refactor `cmp_scalar`, `min`, `max`, `add`, and `diff` to operate directly on the primary byte array, deferring overflow slot resolution to a secondary pass. This eliminates HashMap lookups in the hot path and enables SIMD vectorization. Add six unit tests to validate correct promotion and demotion between storage slots when values cross the 255 threshold.
2026-06-17 23:18:18 +02:00
Eric Coissac 26de90f18d feat: add iteration and aggregation to compact int vec
Implemented `sum()`, `count_nonzero()`, and `iter()` to complete the numeric vector interface. The builder now computes aggregate values across memory-mapped regions and overflow entries, while the reader delegates these operations to its inherent methods. The iterator provides zero-copy access to underlying `u32` elements.
2026-06-17 23:15:56 +02:00
Eric Coissac 497d250d8a refactor: replace byte-level bit iteration with 64-bit words
Refactor `BitIter` to process `u64` chunks using word-aligned shifts instead of byte-level operations. Introduce a dedicated `MemoryBitIter` for `MemoryBitVec`, updating its `iter()` and `IntoIterator` implementations accordingly. Hide `MemoryBitIter` from the public API to narrow the crate's interface, while leveraging explicit alignment guarantees for safer and more efficient bit extraction.
2026-06-17 23:11:12 +02:00
Eric Coissac aa98e82875 refactor: introduce PackedIntCol view and use iterators
Centralizes overflow handling and improves modularity by replacing manual mmap indexing and row loops with composable iterator patterns. This change leverages Rust's iterator traits for efficient, idiomatic column traversal while encapsulating data access in a dedicated view struct.
2026-06-17 23:09:18 +02:00
Eric Coissac 5ff5b04d2d refactor: replace manual bit ops with BitSlice traits
Refactors bit manipulation and distance calculations to leverage standardized `BitSlice` traits, replacing manual byte/word logic with safer, reusable methods. Extends `IntSlice` and `IntSliceMut` traits to expose direct memory-mapped access and overflow management, enabling efficient bulk data extraction and serialization. Replaces manual bit-shifting loops with optimized table-based unpacking and adds population count and distance metric methods for improved performance. Updates `PersistentBitVecBuilder` with file tracking and safe flushing, and aligns test imports with new trait bounds.
2026-06-17 15:28:44 +02:00
Eric Coissac df7b400fda perf: optimize aggregation with byte-level helpers and direct mmap
Introduce `byte_sum` and `byte_count_nonzero` to efficiently aggregate compact-int byte slices, bypassing per-element decoding and overflow map lookups. Refactor `sum()` and `count_nonzero()` across the matrix, reader, and traits modules to use direct memory-mapped slice iteration and idiomatic Rust iterators. Additionally, expose `MemoryIntIter` publicly and implement `IntoIterator` and `IntSlice` for `MemoryIntVec` to enable standard iteration and delegate aggregation to the new helpers.
2026-06-17 15:21:21 +02:00
Eric Coissac d1717688d2 refactor: extract matrix helpers and improve bit iteration ergonomics
Refactor parallel matrix construction by extracting reusable `pairwise_matrix` and `pairwise2_matrix` helpers, and consolidate binary record deserialization into dedicated parsing functions. Add `set` and `iter` methods to `BitSliceMut` and `MemoryBitVec` for ergonomic bit manipulation and iteration. Standardize JSON field extraction via `meta::field`, expose `MemoryBitIter`, and improve test reliability by automatically cleaning up temporary directories.
2026-06-17 15:15:39 +02:00
Eric Coissac cde6457eea feat: add memory vectors, slice traits, and column extraction methods
Introduce `MemoryBitVec` and `MemoryIntVec` for efficient in-memory storage with hybrid compression and overflow handling. Implement `BitSlice`, `BitSliceMut`, `IntSlice`, and `IntSliceMut` traits across persistent and memory-backed types to enable generic slice operations and bitwise/arithmetic overloads. Add `col_persist` and `col_as_memory` methods to `BitMatrix` and `IntMatrix` for efficient column extraction. Align with the new single-pass rebuild architecture by supporting fast kmer filtering and matrix rebuilding. Includes comprehensive tests and profiling instrumentation for the packing phase.
2026-06-17 15:03:18 +02:00
Eric Coissac b6fcbc545f refactor: replace rayon with NUMA-aware PartitionRunner
Replaces `rayon` parallel iteration across index, rebuild, reindex, and select modules with a custom `PartitionRunner`. This introduces NUMA-aware task distribution with CPU pinning and round-robin scheduling, eliminating `Arc`, `Mutex`, and atomic synchronization primitives in favor of a flat, pre-spawned worker architecture. Error handling is simplified via `.map_err()` and the `?` operator, while progress bar updates are decoupled into dedicated callbacks.
2026-06-15 18:53:31 +02:00
coissac 9578f991f4 Merge pull request 'Push pslsukyowzrp' (#32) from push-pslsukyowzrp into main
Reviewed-on: #32
2026-06-15 16:29:24 +00:00
Eric Coissac 1cd7916e06 refactor: replace rayon with NUMA-aware PartitionRunner
Replaces `rayon` parallel iteration across index, rebuild, reindex, and select modules with a custom `PartitionRunner`. This introduces NUMA-aware task distribution with CPU pinning and round-robin scheduling, eliminating `Arc`, `Mutex`, and atomic synchronization primitives in favor of a flat, pre-spawned worker architecture. Error handling is simplified via `.map_err()` and the `?` operator, while progress bar updates are decoupled into dedicated callbacks.
2026-06-15 18:29:04 +02:00