Added obisys as a local dependency and integrated its Reporter and Stage instrumentation into the partition command. Each major phase (scatter, dereplicate, and kmer-counting) is now wrapped in timing blocks, with aggregated execution times printed to stdout upon completion.
Replace in-memory hashing with a disk-backed external merge sort and `PersistentCompactIntVec` to drastically reduce peak RAM. Unify both phases using a custom `PtrHash` MPHF, eliminating `GOFunction` and `boomphf`. Introduce a concrete three-step `count_partition()` pipeline with adaptive chunk sizing based on available system memory. Update dependencies to `memmap2`, `ptr_hash`, and `obicompactvec`. Additionally, document strict genomics-only memory constraints and enforce an architectural feedback workflow requiring explicit user authorization before structural changes.
Introduces ColumnWeights, CountPartials, and BitPartials traits to compute and finalize partial distance matrices. Implements these traits for PersistentBitMatrix, PersistentCompactIntMatrix, and a new LayeredStore<S> wrapper that aggregates metrics across layers via parallel reduction. Adds ndarray for numerical aggregation and updates architecture documentation to reflect the trait-driven design and pending refactoring roadmap.
Introduce parallel distance matrix generation using `ndarray` and `rayon` for both `BitMatrix` and `IntMatrix`. Adds full and additive-partial variants for Jaccard, Hamming, Bray-Curtis, Euclidean, and Hellinger metrics. Includes comprehensive unit tests verifying matrix symmetry, zero diagonals, and numerical correctness against pairwise calculations.
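The additive-partial idea can be illustrated for the Jaccard metric. This is only a sketch under assumptions: `JaccardPartial`, its fields, and its methods are hypothetical names, and plain `bool` slices stand in for `BitMatrix` columns; it does not reproduce the crate's `ndarray`/`rayon` code. The point is that the per-partition counts can be summed before the final division.

```rust
/// Hypothetical additive partial for the Jaccard distance: two counters
/// that can be summed across partitions before the final division.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct JaccardPartial {
    /// Positions set in both columns.
    shared: u64,
    /// Positions set in at least one column.
    either: u64,
}

impl JaccardPartial {
    /// Count shared/either positions over one partition of two columns.
    fn from_columns(a: &[bool], b: &[bool]) -> Self {
        let mut p = Self::default();
        for (&x, &y) in a.iter().zip(b) {
            p.shared += (x && y) as u64;
            p.either += (x || y) as u64;
        }
        p
    }

    /// Partials are additive: merging is a field-wise sum.
    fn merge(self, other: Self) -> Self {
        Self {
            shared: self.shared + other.shared,
            either: self.either + other.either,
        }
    }

    /// Turn the accumulated counts into a distance in [0, 1].
    /// Two empty columns are treated as identical (distance 0).
    fn finalize(self) -> f64 {
        if self.either == 0 {
            0.0
        } else {
            1.0 - self.shared as f64 / self.either as f64
        }
    }
}
```

Merging the partials of two halves of a column pair gives the same counters as one pass over the whole pair, which is what makes a parallel reduction over partitions or layers possible.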
Replace the hardcoded `Counts` module with a generic `LayerData` trait, parameterizing `Layer` and `LayeredMap` over arbitrary payload types. This decouples read-path access from build-path logic, enabling both set membership and count-based indexing via `PersistentCompactIntVec`. Adds the `obicompactvec` dependency, implements streaming layer construction, and expands test coverage for persistence and multi-layer resolution.
Introduce the `obicompactvec` crate, featuring a two-tier, memory-mapped integer vector that uses a primary `u8` array with a sentinel for overflow dispatch and a sparse L1-resident index for fast random access. Implement builder and reader modules with zero-copy serialization and comprehensive test coverage. Update `obilayeredmap` to replace the default hash function with a cache-line-optimized `Mphf`, adding explicit bounds checking and duplicate-slot detection. Add documentation for both modules and update project configuration files accordingly.
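The sentinel-dispatch idea behind the two-tier vector can be sketched in memory as follows. This is an illustration under assumptions, not the `obicompactvec` layout: `CompactIntVecSketch` is a hypothetical name, the sentinel is assumed to be `u8::MAX`, and a `HashMap` stands in for the memory-mapped overflow storage and its sparse index.

```rust
use std::collections::HashMap;

/// Assumed sentinel: a primary byte of 255 means "look in the overflow tier".
const SENTINEL: u8 = u8::MAX;

/// Hypothetical in-memory model of a two-tier integer vector.
struct CompactIntVecSketch {
    /// One byte per slot; holds the value itself when it is below SENTINEL.
    primary: Vec<u8>,
    /// Values that do not fit in the primary tier, keyed by slot index.
    overflow: HashMap<usize, u32>,
}

impl CompactIntVecSketch {
    /// Build both tiers from a slice of values.
    fn from_values(values: &[u32]) -> Self {
        let mut primary = Vec::with_capacity(values.len());
        let mut overflow = HashMap::new();
        for (i, &v) in values.iter().enumerate() {
            if v < SENTINEL as u32 {
                primary.push(v as u8);
            } else {
                // Large value: mark the slot and store the real value aside.
                primary.push(SENTINEL);
                overflow.insert(i, v);
            }
        }
        Self { primary, overflow }
    }

    /// Read slot `i`; returns None when `i` is out of bounds.
    fn get(&self, i: usize) -> Option<u32> {
        let byte = *self.primary.get(i)?;
        if byte == SENTINEL {
            self.overflow.get(&i).copied()
        } else {
            Some(byte as u32)
        }
    }
}
```

When most values are small, nearly every read touches only the one-byte primary tier, and only slots marked with the sentinel pay for the second lookup.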
Replace `ph` with `ptr_hash` and introduce `epserde` and `rayon` dependencies. Refactor MPHF construction to leverage parallel iteration, eliminating intermediate `Vec<u64>` allocations and reducing memory footprint. Add a `n_kmers` field to track and serialize total kmer counts, alongside three zero-allocation iterators for efficient chunk traversal. Include comprehensive unit tests for the new iterators and update CLAUDE.md to enforce explicit dependency validation policies.
Introduces the `obilayeredmap` crate (v0.1.0), implementing an append-only, disk-backed k-mer index using a minimal perfect hash function (MPHF). The module features memory-mapped reads, buffered writes, custom error handling, partition metadata persistence, and comprehensive unit tests. Also adds a reverse complement benchmark for `obikseq` and updates `Cargo.lock` with the new dependencies.
Introduce the `obilayeredmap` specification and persistent MPHF-based index architecture for incremental multi-dataset indexing. Implement chunked binary serialization with a fixed `u8` k-mer count limit (256) and overlapping super-kmer segments. Add memory-mapped I/O and a companion `.idx` index file for allocation-free, O(1) unitig access. Update MkDocs navigation, enhance the k-mer comparison script, and add comprehensive tests for serialization, partitioning, and file I/O pipelines.
Centralize k-mer and minimizer configuration using a thread-safe global module, and replace manual bit-packing with a memory-efficient `PackedSeq` type. Refactor core sequence and k-mer types to use compile-time length enforcement and centralized hashing. Introduce a new De Bruijn graph implementation with compact node encoding and traversal iterators. Update I/O, partitioning, and builder modules to align with the new architecture, and add the `xxhash-rust` dependency.
Replace raw SuperKmer routing with a new RoutableSuperKmer type that embeds canonical sequences and precomputed minimizers, enabling direct partition routing via hash. Update the build pipeline (builder, scatterer) to yield RoutableSuperKmers throughout, and refactor the FASTA/unitig export commands to use the new type and compressed outputs (.fasta.gz, .unitigs.fasta.zst). Revise the SuperKmer header to store n_kmers instead of seql (avoiding 256-byte wrap). Update documentation to cover minimizer-based theory, two evidence-encoding strategies for unitig-MPHF indexing (global offset vs. ID+rank), and the new obipipeline library architecture with parallel workers, biased scheduling, and error handling.
- Remove redundant validation logic in login handler
- Consolidate session token generation into a single utility function
- Update error handling to use consistent HTTP status codes
- Replace `limits` module and raw binary I/O with a new high-level abstraction using obiskio::SKFileWriter
- Remove `niffler` dependency and compression logic (Gzip/Zstd/Lz4/Bgzf)
- Simplify PartitionManager to manage partitioned file writers based on kmer hashing
* Uses `n_partition_bits` for bitmask-based partition selection (2^n partitions)
- Add obiskio as a local dependency
Note: This is likely part of aligning with unified I/O primitives in the obiskio crate.
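A minimal sketch of bitmask-based partition selection follows. The function names are hypothetical, and using the low bits of the hash is an assumption; the commit only states that `n_partition_bits` yields 2^n partitions.

```rust
/// Number of partitions addressed by `n_partition_bits` (2^n).
fn partition_count(n_partition_bits: u32) -> u64 {
    assert!(n_partition_bits < 64, "shift would overflow u64");
    1u64 << n_partition_bits
}

/// Select a partition by keeping the low `n_partition_bits` bits of the hash.
/// Assumption: the low bits are used; the real code may pick other bits.
fn partition_index(hash: u64, n_partition_bits: u32) -> usize {
    let mask = partition_count(n_partition_bits) - 1;
    (hash & mask) as usize
}
```

Because the partition count is a power of two, the mask replaces a modulo operation, and every index is guaranteed to be below `partition_count(n_partition_bits)`.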
- Add new obiskio crate for high-performance SuperKmer serialization/deserialization
- Implement binary codec with 2-bit packed sequence encoding and raw header format (32 bits)
- Add transparent compression support via niffler: Zstd, Gzip/Bgzf/Lz4
- Implement SKFilePool with LRU-based fd management, max-concurrent-fd limiting (75% of ulimit)
- Add SKFileWriter with batched writes, configurable flush threshold (8 KiB default), and two-phase locking
- Add SKFileReader with sequential access, LRU recovery via reopen_and_seek()
- New obikpartitionner crate: basic header/seq handling for binary super-kmer format
- Bump niffler from 2.7 to v3, add dependencies: allocator-api2, bitflags(>=1), errno/fastrand/rustix/tempfile/lru/hashbrown/bzip2/thiserror
- Update workspace members to include obikpartitionner and obiskio
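The 2-bit packed sequence encoding mentioned above can be sketched as follows. The code assignment (A=0, C=1, G=2, T=3), the high-bits-first layout, and the function names are assumptions for illustration; the actual obiskio codec and its header format may differ.

```rust
/// Map a nucleotide to a 2-bit code. Assumed mapping: A=0, C=1, G=2, T=3.
fn nuc_code(base: u8) -> Option<u8> {
    match base.to_ascii_uppercase() {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Pack a sequence at four bases per byte, first base in the highest bits.
/// Returns None if the sequence contains a base outside ACGT.
fn pack_2bit(seq: &[u8]) -> Option<Vec<u8>> {
    let mut out = vec![0u8; (seq.len() + 3) / 4];
    for (i, &base) in seq.iter().enumerate() {
        let code = nuc_code(base)?;
        out[i / 4] |= code << (6 - 2 * (i % 4));
    }
    Some(out)
}

/// Unpack `len` bases; the length must be stored separately (for example in
/// a header) because the final byte may contain padding bits.
fn unpack_2bit(packed: &[u8], len: usize) -> Vec<u8> {
    (0..len)
        .map(|i| {
            let code = (packed[i / 4] >> (6 - 2 * (i % 4))) & 0b11;
            b"ACGT"[code as usize]
        })
        .collect()
}
```

Packing stores a sequence in a quarter of its ASCII size, which is why the sequence length has to travel in a header alongside the packed bytes.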
- Replace crossbeam-channel-based threading model
- Introduce obipipeline crate with Stage::Transform/Flat support
- Replace single input + format detection by multiple inputs via PathIter
- Implement pipeline stages: open_chunks → normalize → build_superkmers (flat) + write_batch
- Add SharedFlatFn for 1→N transformations with delta tracking in scheduler loop
- Added `regex` dependency to obiread crate
- Replaced manual byte checks with regex-based detection for FASTA/FASTQ formats in mimetype.rs
- Switched from `once_cell::sync::Lazy` to standard library's `std::sync::LazyLock`
- Added generic text/plain fallback detection for ASCII-compatible content
- Updated `MimeTypeGuesser::new` constructor call syntax and simplified API usage of PeekReader's header method
- Implemented `Read` trait for MimeTypeGuesser to allow transparent passthrough reading
- Introduce `obipipeline` crate with multi-threaded data pipeline architecture
- Implement core types: SourceFn, SharedFun (Arc), SinkFN with biased scheduler and crossbeam channels
- Add macros: `make_source!`, `transform!/fallible`/sink!, and high-level DSL macro
- Replace old wrapper/error modules with unified scheduler module (renamed types, improved error variants)
- Update workspace: add `obipipeline` member to Cargo.toml and lockfile
- Document pipeline in docmd/implementation with full architecture, error handling & example
- Refactor sandbox_pipeline.rs to use new DSL instead of manual channel wiring
- Introduce new `obipackage` library with pipeline stages, scheduler and worker pool
- Refactor path expansion in `obiread`: replace old list_of_files with new PathIter iterator
- Add MIME type detection using `infer` crate (fastq/fasta)
- Update dependencies in Cargo.lock: add bumpalo, byteorder, cfb (with deps), fnv, infer, js-sys/uuid/wasm-bindgen ecosystem
- Fix formatting and improve tests in SuperKmer (canonical, revcomp)
* Note: edition = "2024" in obipipeline/Cargo.toml requires Rust 1.85 or newer; older toolchains reject it and need edition 2021
- Introduce lazy_static dependency
- Refactor encoding: rename encode_base → encode_nuc and make it pub(crate)
- Add from_raw_right/raw_right methods to Kmer for right-aligned handling
- Improve error message formatting and code readability in kmod.rs tests
- Replace inline entropy computation with precomputed tables (entropy_table module), using LazyLock for static lookup arrays
- Simplify EntropyFilter by removing redundant tables and delegating to new entropy_table API
- Add RollingStat module for real-time kmer statistics and minimizer tracking
- Reorganize modules: move iter, encoding to pub(crate), add entropy_table and rolling_stat
- Update imports across obiskbuilder crate accordingly
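The precomputed-table approach can be sketched with `std::sync::LazyLock` (available since Rust 1.80). The table contents, the `MAX_COUNT` bound, and the function name below are assumptions for illustration; the actual entropy_table module may store different quantities.

```rust
use std::sync::LazyLock;

/// Assumed upper bound on any single count looked up in the table.
const MAX_COUNT: usize = 256;

/// Lazily built table of n * log2(n) for n in 0..=MAX_COUNT (entry 0 is 0).
static N_LOG2_N: LazyLock<[f64; MAX_COUNT + 1]> = LazyLock::new(|| {
    let mut table = [0.0; MAX_COUNT + 1];
    for (n, slot) in table.iter_mut().enumerate().skip(1) {
        let x = n as f64;
        *slot = x * x.log2();
    }
    table
});

/// Shannon entropy in bits of a count vector, using the identity
/// H = log2(N) - (1/N) * sum(n_i * log2(n_i)), where N = sum(n_i).
/// Panics if any count exceeds MAX_COUNT.
fn shannon_entropy(counts: &[usize]) -> f64 {
    let total: usize = counts.iter().sum();
    if total == 0 {
        return 0.0;
    }
    let sum_n_log_n: f64 = counts.iter().map(|&c| N_LOG2_N[c]).sum();
    (total as f64).log2() - sum_n_log_n / total as f64
}
```

The table is built once on first access and then shared read-only, so each entropy evaluation costs one lookup per count plus a single `log2` call.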