feat: enhance memory budgeting and add rebuild diagnostics
This commit improves memory management by respecting Linux cgroup v1/v2 limits and introduces a configurable memory budget for the new `rebuild` subcommand to prevent OOM during index reconstruction. The rebuild process now supports filtering, compaction, and parallelization. Diagnostic capabilities are expanded with debug-level tracing for partition merges, k-mer expansion tracking, and utility flags for label renaming, matrix size breakdowns, per-genome counts, and partition distribution reporting. Accessor methods for active and remaining memory have also been added to the stats struct.
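As an illustration of what respecting cgroup limits can involve, here is a minimal Python sketch, not the code from this commit. It assumes the conventional mount root `/sys/fs/cgroup`, the v2 file `memory.max`, and the v1 file `memory/memory.limit_in_bytes`; the function names and the "unlimited" threshold are hypothetical, and nested cgroup paths are ignored for brevity.

```python
from pathlib import Path
from typing import Optional

# Assumption: cgroup v1 reports "no limit" as a huge value near 2**63,
# so anything at or above this threshold is treated as unlimited.
V1_UNLIMITED_THRESHOLD = 1 << 60


def parse_cgroup_limit(text: str) -> Optional[int]:
    """Return the limit in bytes, or None when the file means 'unlimited'."""
    value = text.strip()
    if value == "max" or not value.isdigit():
        return None
    limit = int(value)
    return None if limit >= V1_UNLIMITED_THRESHOLD else limit


def cgroup_memory_limit(root: Path = Path("/sys/fs/cgroup")) -> Optional[int]:
    """Check the v2 file first, then the v1 file, under the given mount root."""
    candidates = (root / "memory.max", root / "memory" / "memory.limit_in_bytes")
    for candidate in candidates:
        try:
            return parse_cgroup_limit(candidate.read_text())
        except OSError:
            continue  # file missing or unreadable: try the next layout
    return None


def effective_memory(host_bytes: int, limit: Optional[int]) -> int:
    """The usable memory is the smaller of host RAM and the cgroup limit."""
    return host_bytes if limit is None else min(host_bytes, limit)
```

The point of the last helper is that a container limit below host RAM should win, while an absent or unlimited cgroup leaves the host figure unchanged.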
@@ -51,7 +51,13 @@ Non-ACGT characters act as hard breaks between k-mer segments in all formats.
Runs scatter → dereplicate → count → layered MPHF.
Resumes automatically if interrupted.
merge Merge multiple independently built indexes into one.
rebuild Filter and compact an existing index: apply count thresholds,
Schedules partitions largest-first under a memory budget semaphore
to avoid OOM on machines with many cores. The worst partition runs
alone first to calibrate the expansion estimator; subsequent
partitions run in parallel within the budget.
--budget-fraction F fraction of available RAM to use as budget
(default 0.5; reduce if OOM persists).
filter Filter and compact an existing index: apply count thresholds,
drop layers, rewrite as a single-layer index.
reindex Convert evidence in-place across all layers:
exact (evidence.bin) ↔ approximate (fingerprint.bin).
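The scheduling described in the help text above (largest-first ordering, a memory budget semaphore, and a solo calibration run for the worst partition) can be sketched roughly as follows. This is a hedged Python illustration and not the project's implementation: `MemoryBudget`, `run_partitions`, the `work` callback returning an observed peak in bytes, and the peak/size expansion ratio are all assumptions made for the example. Only the 0.5 default for the budget fraction comes from the help text.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


class MemoryBudget:
    """Weighted semaphore: acquire(n) blocks until n bytes fit in the budget."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.used = 0
        self._cond = threading.Condition()

    def acquire(self, amount: int) -> int:
        # Clamp so an oversized job can still run (alone) instead of deadlocking.
        amount = min(amount, self.total)
        with self._cond:
            self._cond.wait_for(lambda: self.used + amount <= self.total)
            self.used += amount
        return amount

    def release(self, amount: int) -> None:
        with self._cond:
            self.used -= amount
            self._cond.notify_all()


def run_partitions(sizes: Dict[str, int], available_ram: int,
                   work: Callable[[str], int],
                   budget_fraction: float = 0.5,
                   workers: int = 4) -> List[str]:
    """Process all partitions; work(name) returns the observed peak bytes."""
    budget = MemoryBudget(int(available_ram * budget_fraction))
    order = sorted(sizes, key=sizes.get, reverse=True)  # largest first
    if not order:
        return order

    # Calibration: the largest partition runs alone, and its peak/size
    # ratio becomes the expansion estimate for the remaining partitions.
    worst = order[0]
    expansion = work(worst) / max(sizes[worst], 1)

    def guarded(name: str) -> None:
        held = budget.acquire(int(sizes[name] * expansion))
        try:
            work(name)
        finally:
            budget.release(held)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(guarded, order[1:]))  # list() surfaces worker exceptions
    return order
```

Waiting workers hold no budget and every request is clamped to the total, so some partition can always proceed once the running ones release their share. Lowering `budget_fraction` simply shrinks the total and reduces how many partitions overlap.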
@@ -74,7 +80,14 @@ Non-ACGT characters act as hard breaks between k-mer segments in all formats.
Diagnostic / pipeline use.
unitig Dump the unitig sequences stored in a built index. Debug use.
utils Miscellaneous utilities.
--new-label NEW=OLD renames a genome label in-place.
--new-label NEW=OLD rename a genome label in-place.
--bits-per-kmer print MPHF / evidence / matrix size breakdown.
--stats per-genome k-mer counts as CSV.
--partition-stats partition size distribution across one or more
indexes (markdown report to stdout). Useful to
diagnose minimizer imbalance before a large merge.
--csv FILE write per-(partition, source) raw data to FILE
(used with --partition-stats).

## Quick start