refactor: replace single batch size with min/max bounds and memory limits

Introduce separate _BatchSize (min) and _BatchSizeMax (max) package-level variables to replace the single _BatchSize variable. Update RebatchBySize to accept both maxBytes and maxCount parameters, flushing when either limit is exceeded. Set the default minimum batch size to 1, the maximum to 2000, and the memory limit to 128 MB. Update the CLI options and sequence_reader.go accordingly.
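The flushing rule described above can be sketched in Go. This is a minimal illustration under stated assumptions, not the code from this commit: `rebatchBySize` and `Seq` are hypothetical names, a sequence's memory footprint is approximated by `len(Data)`, a limit is treated as hit once it is reached (`>=`), and the minimum count is assumed to act as a floor that must be met before any flush.

```go
package main

import "fmt"

// Seq is a hypothetical stand-in for a sequence record; only its payload size is used.
type Seq struct {
	ID   string
	Data []byte
}

// rebatchBySize (hypothetical) groups seqs into batches. The current batch is
// flushed once it holds maxCount sequences or its cumulative payload reaches
// maxBytes, but never before it holds at least minSeqs sequences.
// maxBytes <= 0 disables the memory limit, mirroring _BatchMem = 0.
func rebatchBySize(seqs []Seq, minSeqs, maxCount, maxBytes int) [][]Seq {
	var batches [][]Seq
	var cur []Seq
	curBytes := 0
	for _, s := range seqs {
		cur = append(cur, s)
		curBytes += len(s.Data)
		countFull := len(cur) >= maxCount
		memFull := maxBytes > 0 && curBytes >= maxBytes
		if (countFull || memFull) && len(cur) >= minSeqs {
			batches = append(batches, cur)
			cur = nil
			curBytes = 0
		}
	}
	if len(cur) > 0 {
		batches = append(batches, cur) // flush the final partial batch
	}
	return batches
}

func main() {
	seqs := make([]Seq, 5)
	for i := range seqs {
		seqs[i] = Seq{ID: fmt.Sprint("s", i), Data: make([]byte, 100)}
	}
	// 100-byte sequences with a 250-byte budget: the third sequence pushes the
	// batch to 300 bytes, so batches of 3 and then 2 are produced.
	for _, b := range rebatchBySize(seqs, 1, 2000, 250) {
		fmt.Println(len(b))
	}
}
```

Run as-is, this prints 3 and then 2. Checking both limits on every append keeps the count ceiling (2000) as a hard cap even when sequences are tiny, while the byte budget bounds memory when sequences are large.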
Eric Coissac
2026-03-13 15:07:31 +01:00
parent 40769bf827
commit 1e1f575d1c
4 changed files with 38 additions and 16 deletions

@@ -1,6 +1,12 @@
 package obidefault
 
-var _BatchSize = 2000
+// _BatchSize is the minimum number of sequences per batch (floor).
+// Used as the minSeqs argument to RebatchBySize.
+var _BatchSize = 1
+
+// _BatchSizeMax is the maximum number of sequences per batch (ceiling).
+// A batch is flushed when this count is reached regardless of memory usage.
+var _BatchSizeMax = 2000
 
 // SetBatchSize sets the size of the sequence batches.
 //
@@ -25,10 +31,19 @@ func BatchSizePtr() *int {
 	return &_BatchSize
 }
 
+// BatchSizeMax returns the maximum number of sequences per batch.
+func BatchSizeMax() int {
+	return _BatchSizeMax
+}
+
+func BatchSizeMaxPtr() *int {
+	return &_BatchSizeMax
+}
+
 // _BatchMem holds the maximum cumulative memory (in bytes) per batch when
 // memory-based batching is requested. A value of 0 disables memory-based
 // batching and falls back to count-based batching.
-var _BatchMem = 0
+var _BatchMem = 128 * 1024 * 1024 // 128 MB default; set to 0 to disable
 var _BatchMemStr = ""
 
 // SetBatchMem sets the memory budget per batch in bytes.