4 Commits

Author SHA1 Message Date
Eric Coissac 036d044291 refactor: update core types and add approximate evidence support
Refactor `Kmer`, `SuperKmer`, and chunk reader into optimized, generic representations with compile-time length parameters and bitwise operations. Update the pipeline and scheduler to support batch processing, 1→N flat transformations, and multi-source merging. Introduce an approximate evidence mode using b-bit fingerprints and `.idx` files, alongside existing exact mode. Update CLI documentation, minimizer selection, and query output schema accordingly.
2026-05-26 10:04:25 +02:00
Eric Coissac 27f5e88a7b refactor: implement RoutableSuperKmer and update k-mer indexing pipeline
Replace raw SuperkMer routing with a new RoutableSuperKimer type that embeds canonical sequences and precomputed minimizers, enabling direct partition routing via hash. Update the build pipeline to yield RoutableSuperKmers throughout (builder, scatterer), refactor FASTA/unitig export commands to use the new type and compressed outputs (.fasta.gz, .unitigs.fasta.zst), revise SuperKmer header to store n_kmers instead of seql (avoiding 256-byte wrap), and update documentation to reflect minimizer-based theory, two evidence-encoding strategies for unitig-MPHF indexing (global offset vs. ID+rank), and the new obipipeline library architecture with parallel workers, biased scheduling, and error handling.
2026-05-01 09:33:26 +02:00
Eric Coissac 58391886a3 🔧 Replace degenerate minimizer logic with hash-based random ordering
- Add `hash` field to MmerItem for stable, randomized minimizer ordering
- Introduce hash_mMER() using mix64 with XOR seed to avoid fixed points (e.g., poly-A/T)
- Remove is_degenerate() and minimizer_worse(), simplifying comparison to hash-only
- Update push logic: compare hashes instead of canonical values with degeneracy checks
2026-04-27 20:19:43 +02:00
Eric Coissac de3f9b16cf first implementation but far to be optimal 2026-04-19 12:17:16 +02:00