refactor: implement RoutableSuperKmer and update k-mer indexing pipeline

Replace raw SuperKmer routing with a new RoutableSuperKmer type that embeds the canonical sequence and its precomputed minimizer, so that super-k-mers can be routed to partitions directly via a hash. Update the build pipeline (builder, scatterer) to yield RoutableSuperKmers throughout. Refactor the FASTA and unitig export commands to use the new type and to write compressed outputs (.fasta.gz, .unitigs.fasta.zst). Revise the SuperKmer header to store n_kmers instead of seql, avoiding the wrap-around at 256 bytes. Update the documentation to cover the minimizer-based theory, the two evidence-encoding strategies for unitig-MPHF indexing (global offset vs. ID+rank), and the new obipipeline library architecture (parallel workers, biased scheduling, error handling).
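The routing idea and the header change described above can be illustrated with a minimal Rust sketch. It is not the obikmer code: the struct fields and the helpers `canonical_minimizer`, `mix64` and `n_kmers_field` are hypothetical. It also assumes a lexicographic (2-bit code) minimizer order, a splitmix64 finalizer as the hash, and an 8-bit header field for n_kmers; the real crate may differ on all three points.

```rust
// Minimal sketch, NOT the obikmer implementation. All names below are
// hypothetical illustrations of "canonical sequence + precomputed minimizer
// => partition routing via hash".

/// Hypothetical routable super-k-mer: canonical strand plus its minimizer,
/// computed once so that routing never rescans the sequence.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RoutableSuperKmer {
    pub sequence: Vec<u8>, // canonical orientation (ACGT input assumed)
    pub minimizer: u64,    // 2-bit packed canonical minimizer
}

/// Reverse complement of an ACGT sequence (other bytes are left unchanged).
fn revcomp(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .rev()
        .map(|&b| match b {
            b'A' => b'T',
            b'C' => b'G',
            b'G' => b'C',
            b'T' => b'A',
            other => other,
        })
        .collect()
}

/// 2-bit encoding A=0, C=1, G=2, T=3; m <= 32 so the code fits in a u64.
fn encode(mmer: &[u8]) -> u64 {
    mmer.iter().fold(0u64, |acc, &b| {
        let code = match b {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            _ => 3, // 'T'; non-ACGT input is out of scope for this sketch
        };
        (acc << 2) | code
    })
}

/// Canonical minimizer: smallest code over all m-mers, each m-mer taking the
/// smaller of its forward and reverse-complement codes, so both strands of a
/// sequence give the same value. (Assumption: lexicographic order; real
/// pipelines often order m-mers by a hash instead.)
fn canonical_minimizer(seq: &[u8], m: usize) -> u64 {
    assert!(m >= 1 && m <= 32 && seq.len() >= m);
    seq.windows(m)
        .map(|w| encode(w).min(encode(&revcomp(w))))
        .min()
        .expect("at least one window")
}

/// splitmix64 finalizer used as a cheap integer hash (assumed, not confirmed).
fn mix64(mut x: u64) -> u64 {
    x = (x ^ (x >> 30)).wrapping_mul(0xbf58476d1ce4e5b9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94d049bb133111eb);
    x ^ (x >> 31)
}

impl RoutableSuperKmer {
    /// Keeps the lexicographically smaller strand and caches its minimizer.
    pub fn new(seq: &[u8], m: usize) -> Self {
        let rc = revcomp(seq);
        let sequence = if rc.as_slice() < seq { rc } else { seq.to_vec() };
        let minimizer = canonical_minimizer(&sequence, m);
        RoutableSuperKmer { sequence, minimizer }
    }

    /// Partition index computed from the cached minimizer alone.
    pub fn partition(&self, n_partitions: u64) -> u64 {
        mix64(self.minimizer) % n_partitions
    }
}

/// Header value when storing n_kmers = seq_len - k + 1 instead of seq_len.
/// With an (assumed) 8-bit field this covers up to 254 + k nucleotides,
/// whereas storing seq_len directly would wrap at 256.
fn n_kmers_field(seq_len: usize, k: usize) -> Option<u8> {
    if seq_len < k {
        return None;
    }
    let n = seq_len - k + 1;
    if n <= u8::MAX as usize { Some(n as u8) } else { None }
}
```

Because the minimizer is strand-independent, a super-k-mer and its reverse complement always land in the same partition, and the scatterer only needs the cached `minimizer` field to route a record.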
Eric Coissac
2026-04-29 22:52:42 +02:00
parent 4e26e3bd40
commit 27f5e88a7b
72 changed files with 10093 additions and 1626 deletions
+8
@@ -1590,6 +1590,9 @@ name = "obikmer"
version = "0.1.0"
dependencies = [
"clap",
"memmap2",
"niffler 3.0.0",
"obidebruinj",
"obifastwrite",
"obikpartitionner",
"obikrope",
@@ -1597,7 +1600,10 @@ dependencies = [
"obipipeline",
"obiread",
"obiskbuilder",
"obiskio",
"ph",
"pprof",
"rayon",
"tracing",
"tracing-subscriber",
]
@@ -1633,6 +1639,8 @@ version = "0.1.0"
dependencies = [
"bitvec",
"criterion2",
"serde",
"serde_json",
"xxhash-rust",
]