obikmer

Author	SHA1	Message	Date
Eric Coissac	13e69e23c9	feat: introduce trait-based distance aggregation and layered store Introduces ColumnWeights, CountPartials, and BitPartials traits to compute and finalize partial distance matrices. Implements these traits for PersistentBitMatrix, PersistentCompactIntMatrix, and a new LayeredStore<S> wrapper that aggregates metrics across layers via parallel reduction. Adds ndarray for numerical aggregation and updates architecture documentation to reflect the trait-driven design and pending refactoring roadmap.	2026-05-16 14:41:49 +08:00
Eric Coissac	8bee9f3017	feat: add parallel distance matrix computation for bit and int matrices Introduce parallel distance matrix generation using `ndarray` and `rayon` for both `BitMatrix` and `IntMatrix`. Adds full and additive-partial variants for Jaccard, Hamming, Bray-Curtis, Euclidean, and Hellinger metrics. Includes comprehensive unit tests verifying matrix symmetry, zero diagonals, and numerical correctness against pairwise calculations.	2026-05-15 17:23:12 +08:00
Eric Coissac	f48f7500cd	refactor(obilayeredmap): support generic payload types Replace the hardcoded `Counts` module with a generic `LayerData` trait, parameterizing `Layer` and `LayeredMap` over arbitrary payload types. This decouples read-path access from build-path logic, enabling both set membership and count-based indexing via `PersistentCompactIntVec`. Adds the `obicompactvec` dependency, implements streaming layer construction, and expands test coverage for persistence and multi-layer resolution.	2026-05-14 09:33:18 +08:00
Eric Coissac	f2de79acde	Add persistent compact integer vector and cache-line-optimized MPHF Introduce the `obicompactvec` crate, featuring a two-tier, memory-mapped integer vector that uses a primary `u8` array with a sentinel for overflow dispatch and a sparse L1-resident index for fast random access. Implement builder and reader modules with zero-copy serialization and comprehensive test coverage. Update `obilayeredmap` to replace the default hash function with a cache-line-optimized `Mphf`, adding explicit bounds checking and duplicate-slot detection. Add documentation for both modules and update project configuration files accordingly.	2026-05-13 10:09:46 +08:00
Eric Coissac	ff75c9198d	feat: add kmer iterators and optimize layered map performance Replace `ph` with `ptr_hash` and introduce `epserde` and `rayon` dependencies. Refactor MPHF construction to leverage parallel iteration, eliminating intermediate `Vec<u64>` allocations and reducing memory footprint. Add an `n_kmers` field to track and serialize total kmer counts, alongside three zero-allocation iterators for efficient chunk traversal. Include comprehensive unit tests for the new iterators and update CLAUDE.md to enforce explicit dependency validation policies.	2026-05-12 22:35:21 +08:00
Eric Coissac	9c41891cc8	feat: add obilayeredmap crate for disk-backed k-mer indexing Introduces the `obilayeredmap` crate (v0.1.0), implementing an append-only, disk-backed k-mer index using a minimal perfect hash function (MPHF). The module features memory-mapped reads, buffered writes, custom error handling, partition metadata persistence, and comprehensive unit tests. Also adds a reverse complement benchmark for `obikseq` and updates `Cargo.lock` with the new dependencies.	2026-05-12 15:26:39 +08:00
Eric Coissac	5169f65dc9	feat: implement persistent layered index and chunked binary format Introduce the `obilayeredmap` specification and persistent MPHF-based index architecture for incremental multi-dataset indexing. Implement chunked binary serialization with a fixed `u8` k-mer count limit (256) and overlapping super-kmer segments. Add memory-mapped I/O and a companion `.idx` index file for allocation-free, O(1) unitig access. Update MkDocs navigation, enhance the k-mer comparison script, and add comprehensive tests for serialization, partitioning, and file I/O pipelines.	2026-05-09 17:38:29 +08:00
Eric Coissac	8c17bf958b	refactor: centralize k-mer config and introduce packed sequences Centralize k-mer and minimizer configuration using a thread-safe global module, and replace manual bit-packing with a memory-efficient `PackedSeq` type. Refactor core sequence and k-mer types to use compile-time length enforcement and centralized hashing. Introduce a new De Bruijn graph implementation with compact node encoding and traversal iterators. Update I/O, partitioning, and builder modules to align with the new architecture, and add the `xxhash-rust` dependency.	2026-05-08 06:34:24 +08:00
Eric Coissac	27f5e88a7b	refactor: implement RoutableSuperKmer and update k-mer indexing pipeline Replace raw SuperKmer routing with a new RoutableSuperKmer type that embeds canonical sequences and precomputed minimizers, enabling direct partition routing via hash. Update the build pipeline (builder, scatterer) to yield RoutableSuperKmers throughout. Refactor the FASTA/unitig export commands to use the new type and compressed outputs (.fasta.gz, .unitigs.fasta.zst). Revise the SuperKmer header to store n_kmers instead of seql (avoiding the 256-byte wrap). Update documentation to reflect minimizer-based theory, two evidence-encoding strategies for unitig-MPHF indexing (global offset vs. ID+rank), and the new obipipeline library architecture with parallel workers, biased scheduling, and error handling.	2026-05-01 09:33:26 +02:00
Eric Coissac	4e26e3bd40	Refactor: Simplify user authentication flow - Remove redundant password validation logic - Integrate JWT-based session management for improved security and scalability	2026-04-30 07:04:03 +02:00
Eric Coissac	ebbfe35cbc	Refactor: Extract utility function for string reversal Extracted `inverser_chaine` into a reusable utility function with docstring and added unit test to ensure correctness.	2026-04-30 06:58:46 +02:00
Eric Coissac	e7fa60a3a2	Refactor: Simplify user authentication flow - Remove redundant validation logic in login handler - Consolidate session token generation into a single utility function - Update error handling to use consistent HTTP status codes	2026-04-28 08:38:26 +02:00
Eric Coissac	7efec54b27	.gitignore: ignore zstandard-compressed files - Add *.zst pattern to .gitignore - Prevents tracking of zstandard-compressed archives	2026-04-27 16:56:15 +02:00
Eric Coissac	eaf893174f	♻️ refactor(obikpartitionner): replace low-level I/O with obiskio::SKFileWriter - Replace `limits` module and raw binary I/O with a new high-level abstraction using obiskio::SKFileWriter - Remove `niffler` dependency and compression logic (Gzip/Zstd/Lz4/Bgzf) - Simplify PartitionManager to manage partitioned file writers based on kmer hashing * Uses `n_partition_bits` for bitmask-based partition selection (2^n partitions) - Add obiskio as a local dependency Note: This is likely part of aligning with unified I/O primitives in the obiskio crate.	2026-04-26 15:00:12 +02:00
Eric Coissac	c09d17401d	+ obiskio: add binary I/O with LRU pool and compression - Add new obiskio crate for high-performance SuperKmer serialization/deserialization - Implement binary codec with 2-bit packed sequence encoding and raw header format (32 bits) - Add transparent compression support via niffler: Zstd, Gzip/Bgzf/Lz4 - Implement SKFilePool with LRU-based fd management and a max-concurrent-fd limit (75% of ulimit) - Add SKFileWriter with batched writes, configurable flush threshold (8 KiB default), and two-phase locking - Add SKFileReader with sequential access and LRU recovery via reopen_and_seek() + New obikpartitionner crate: basic header/seq handling for binary super-kmer format - Bump niffler from 2.7 to 3, add dependencies: allocator-api2, bitflags (>=1), errno/fastrand/rustix/tempfile/lru/hashbrown/bzip2/thiserror - Update workspace members to include obikpartitionner and obiskio	2026-04-25 14:15:01 +02:00
Eric Coissac	d4e4289aff	(feat): refactor superkmer to use obipipeline with flat transforms - Replace crossbeam-channel-based threading model - Introduce obipipeline crate with Stage::Transform/Flat support - Replace single input + format detection with multiple inputs via PathIter - Implement pipeline stages: open_chunks → normalize → build_superkmers (flat) + write_batch - Add SharedFlatFn for 1→N transformations with delta tracking in scheduler loop	2026-04-24 21:08:09 +02:00
Eric Coissac	75bf980046	(deps) Add regex crate and improve MIME type detection - Added `regex` dependency to obiread crate - Replaced manual byte checks with regex-based detection for FASTA/FASTQ formats in mimetype.rs - Switched from `once_cell::sync::Lazy` to the standard library's `std::sync::LazyLock` - Added generic text/plain fallback detection for ASCII-compatible content - Updated `MimeTypeGuesser::new` constructor call syntax and simplified usage of PeekReader's header method - Implemented the `Read` trait for `MimeTypeGuesser` to allow transparent passthrough reading	2026-04-24 17:16:17 +02:00
Eric Coissac	22951fb0e8	🔖 Add obipipeline parallel pipeline library - Introduce `obipipeline` crate with multi-threaded data pipeline architecture - Implement core types: SourceFn, SharedFun (Arc), SinkFN with biased scheduler and crossbeam channels - Add macros: `make_source!`, `transform!`/`fallible`/`sink!`, and a high-level DSL macro - Replace old wrapper/error modules with unified scheduler module (renamed types, improved error variants) - Update workspace: add `obipipeline` member to Cargo.toml and lockfile - Document pipeline in docmd/implementation with full architecture, error handling & example - Refactor sandbox_pipeline.rs to use new DSL instead of manual channel wiring	2026-04-24 17:10:07 +02:00
Eric Coissac	3f8880a7e5	📦 Add infer and new pipeline infrastructure - Update Cargo.lock with dependency additions (bumpalo, byteorder, cfb, fnv, infer, js-sys, uuid, wasm-bindgen) - Refactor obikseq::superkmer: reorder imports and improve formatting - Add `obipipeline` crate with scheduler, error handling & macros (WIP) - Replace obiread::expand_paths logic with PathIter and path_iterator module - Add mimetype detection using `infer` crate via PeekReader wrapper	2026-04-23 21:06:11 +02:00
Eric Coissac	664d0216b5	📦 Add obipipeline crate and refactor path handling - Introduce new `obipipeline` library with pipeline stages, scheduler and worker pool - Refactor path expansion in `obiread`: replace old list_of_files with new PathIter iterator - Add MIME type detection using `infer` crate (fastq/fasta) - Update dependencies in Cargo.lock: add bumpalo, byteorder, cfb (with deps), fnv, infer, js-sys/uuid/wasm-bindgen ecosystem - Fix formatting and improve tests in SuperKmer (canonical, revcomp) * Note: obipipeline/Cargo.toml uses edition = "2024", which is stable since Rust 1.85	2026-04-23 21:06:11 +02:00
Eric Coissac	ae5e1152b9	(feat) Add entropy-based filtering and rolling statistics for k-mers - Introduce lazy_static dependency - Refactor encoding: rename encode_base → encode_nuc and make it pub(crate) - Add from_raw_right/raw_right methods to Kmer for right-aligned handling - Improve error message formatting and code readability in kmod.rs tests - Replace inline entropy computation with precomputed tables (entropy_table module), using LazyLock for static lookup arrays - Simplify EntropyFilter by removing redundant tables and delegating to new entropy_table API - Add RollingStat module for real-time kmer statistics and minimizer tracking - Reorganize modules: move iter, encoding to pub(crate), add entropy_table and rolling_stat - Update imports across obiskbuilder crate accordingly	2026-04-20 15:36:02 +02:00
Eric Coissac	41095a40d0	Refactor: simplify logic and fix edge case - Replaced redundant conditional checks with a single guard clause - Added unit test for edge case handling null input	2026-04-19 21:55:48 +02:00
Eric Coissac	0dcb5dd6c2	♻️ refactor rope implementation to use obikrope - rename `obirope` → `obikrope` - replace legacy rope with new in-place, Cell-based implementation - add ForwardCursor/BackwardCursor & SeekMode support (no more BytesMut) - update all dependents: - obiread: switch to Rope + cursors, remove tape.rs • chunk iterator yields `Rope` instead of Vec<Bytes> - obiskbuilder: use ForwardCursor over Rope - remove bytes dependency from affected crates	2026-04-19 21:23:10 +02:00
Eric Coissac	de3f9b16cf	first implementation, but far from optimal	2026-04-19 12:17:16 +02:00

24 Commits
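Commits 8bee9f3017 and 13e69e23c9 compute distance matrices from additive partials that are aggregated across layers and finalized at the end. The sketch below illustrates why this works for Jaccard: the distance itself is not additive, but its intersection and union counts are, while Hamming is additive as-is. It is a minimal sketch, not the project's CountPartials/BitPartials API; the JaccardPartial type, its fields and methods are hypothetical, and it assumes sample columns are bit vectors packed into u64 words and that each layer holds a disjoint set of k-mers.

```rust
/// Additive partial counts for the Jaccard distance between two sample columns.
/// Assumption (hypothetical design): each layer holds a disjoint set of k-mers
/// (rows), so counts from different layers can be summed before finalizing.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct JaccardPartial {
    pub both: u64, // k-mers present in both columns (|A ∩ B|)
    pub any: u64,  // k-mers present in at least one column (|A ∪ B|)
}

impl JaccardPartial {
    /// Counts over one layer; each column is a bit vector packed into u64 words.
    pub fn from_columns(a: &[u64], b: &[u64]) -> Self {
        assert_eq!(a.len(), b.len(), "columns must have the same length");
        let mut p = Self::default();
        for (x, y) in a.iter().zip(b) {
            p.both += u64::from((x & y).count_ones());
            p.any += u64::from((x | y).count_ones());
        }
        p
    }

    /// Partials are additive: merging two layers just sums their counts.
    pub fn merge(self, other: Self) -> Self {
        Self {
            both: self.both + other.both,
            any: self.any + other.any,
        }
    }

    /// Finalize into a distance: 1 - |A ∩ B| / |A ∪ B| (0.0 if both columns are empty).
    pub fn finalize(self) -> f64 {
        if self.any == 0 {
            0.0
        } else {
            1.0 - self.both as f64 / self.any as f64
        }
    }
}

/// Hamming distance on one layer: positions where exactly one column has a 1.
/// It is already additive, so per-layer values can be summed directly.
pub fn hamming(a: &[u64], b: &[u64]) -> u64 {
    assert_eq!(a.len(), b.len(), "columns must have the same length");
    a.iter()
        .zip(b)
        .map(|(x, y)| u64::from((x ^ y).count_ones()))
        .sum()
}
```

Per-layer partials can be folded with `merge` (for example inside a parallel reduction) and turned into a distance once, with `finalize`, after all layers have been combined.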
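Commit f2de79acde describes obicompactvec as a primary u8 array in which a sentinel value dispatches to an overflow tier located through a sparse index. The following in-memory sketch shows that lookup scheme only; CompactIntVec and its methods are hypothetical names, there is no memory mapping or serialization, and a sorted Vec searched with binary search stands in for the project's L1-resident index.

```rust
/// Sentinel in the primary array meaning "look this value up in the overflow tier".
const SENTINEL: u8 = u8::MAX;

/// Hypothetical in-memory sketch of a two-tier compact integer vector.
pub struct CompactIntVec {
    primary: Vec<u8>,         // one byte per slot; SENTINEL marks an overflow
    overflow_idx: Vec<usize>, // sorted slot indices of overflowing values
    overflow_val: Vec<u64>,   // values, parallel to `overflow_idx`
}

impl CompactIntVec {
    /// Build from a slice; values >= SENTINEL are moved to the overflow tier.
    pub fn from_values(values: &[u64]) -> Self {
        let mut v = Self {
            primary: Vec::with_capacity(values.len()),
            overflow_idx: Vec::new(),
            overflow_val: Vec::new(),
        };
        for (i, &x) in values.iter().enumerate() {
            if x < u64::from(SENTINEL) {
                v.primary.push(x as u8); // fits in one byte
            } else {
                v.primary.push(SENTINEL);
                v.overflow_idx.push(i); // pushed in increasing order, so it stays sorted
                v.overflow_val.push(x);
            }
        }
        v
    }

    pub fn len(&self) -> usize {
        self.primary.len()
    }

    /// Fast path: a single byte read. Slow path: binary search in the sparse index.
    pub fn get(&self, i: usize) -> Option<u64> {
        let b = *self.primary.get(i)?;
        if b != SENTINEL {
            return Some(u64::from(b));
        }
        let pos = self.overflow_idx.binary_search(&i).ok()?;
        Some(self.overflow_val[pos])
    }
}
```

When most counts are small, the common case costs one byte read, and only sentinel slots pay for the search in the sparse tier.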
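Commit eaf893174f selects among 2^n partitions with a bitmask over a k-mer hash (`n_partition_bits`), and 27f5e88a7b routes super-k-mers by hashing their precomputed minimizer. A minimal sketch of that routing step, assuming the minimizer is already encoded as a u64: partition_of and mix64 are hypothetical names, and the SplitMix64 finalizer is only a stand-in for whatever hash the project actually uses.

```rust
/// Stand-in 64-bit mixer (SplitMix64 finalizer); the project's real hash may differ.
fn mix64(mut x: u64) -> u64 {
    x = (x ^ (x >> 30)).wrapping_mul(0xbf58476d1ce4e5b9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94d049bb133111eb);
    x ^ (x >> 31)
}

/// Select one of 2^n_partition_bits partitions from the low bits of the hash.
pub fn partition_of(minimizer: u64, n_partition_bits: u32) -> usize {
    assert!(n_partition_bits < 64, "n_partition_bits must be < 64");
    let mask = (1u64 << n_partition_bits) - 1; // e.g. 4 bits -> 0b1111 -> 16 partitions
    (mix64(minimizer) & mask) as usize
}
```

Because the partition depends only on the minimizer, all super-k-mers sharing a minimizer land in the same partition file, and with a power-of-two partition count the modulo reduces to a single AND.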
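Commit c09d17401d stores sequences with a 2-bit packed encoding, and 8c17bf958b introduces a PackedSeq type. The sketch below shows one common layout, assuming the code A=0, C=1, G=2, T=3 and four bases per byte with the first base in the high bits; the actual bit order in obiskio and PackedSeq may differ, and code_of, pack, unpack and revcomp are hypothetical helpers. With this code, complementing a base is an XOR with 0b11, which keeps reverse complement cheap.

```rust
/// Decoding table for the assumed 2-bit code.
const DECODE: [u8; 4] = [b'A', b'C', b'G', b'T'];

/// Map a nucleotide to its 2-bit code (A=0, C=1, G=2, T=3); None for anything else.
fn code_of(b: u8) -> Option<u8> {
    match b.to_ascii_uppercase() {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Pack an ACGT sequence four bases per byte, first base in the two high bits.
/// Returns None if the sequence contains a non-ACGT byte.
pub fn pack(seq: &[u8]) -> Option<Vec<u8>> {
    let mut out = vec![0u8; seq.len().div_ceil(4)];
    for (i, &b) in seq.iter().enumerate() {
        let code = code_of(b)?;
        out[i / 4] |= code << (6 - 2 * (i % 4));
    }
    Some(out)
}

/// Unpack `len` bases from a packed buffer.
pub fn unpack(packed: &[u8], len: usize) -> Vec<u8> {
    (0..len)
        .map(|i| DECODE[((packed[i / 4] >> (6 - 2 * (i % 4))) & 0b11) as usize])
        .collect()
}

/// Reverse complement: reverse the bases and flip each 2-bit code with `^ 0b11`.
pub fn revcomp(seq: &[u8]) -> Option<Vec<u8>> {
    seq.iter()
        .rev()
        .map(|&b| code_of(b).map(|c| DECODE[(c ^ 0b11) as usize]))
        .collect()
}
```

For example, `pack(b"ACGT")` yields the single byte 0b00011011 under this layout.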