Compare commits

...

12 Commits

Author SHA1 Message Date
Eric Coissac
a786b58ed3 Dynamic Batch Flushing and Build Improvements
This release introduces dynamic batch flushing in the Distribute component, replacing the previous fixed-size batching with a memory- and count-aware strategy. Batches now flush automatically when either the maximum sequence count (BatchSizeMax()) or memory threshold (BatchMem()) per key is reached, ensuring more efficient resource usage and consistent behavior with the RebatchBySize strategy. The optional sizes parameter has been removed, and related code—including the Lua wrapper and worker buffer handling—has been updated for correctness and simplicity. Unused BatchSize() references have been eliminated from obidistribute.

Additionally, this release includes improvements to static Linux builds and overall build stability, enhancing reliability across deployment environments.
2026-03-16 22:06:51 +01:00
Eric Coissac
a2b26712b2 refactor: replace fixed batch size with dynamic flushing based on count and memory
Replace the old fixed batch-size mechanism in Distribute with a dynamic strategy that flushes batches when either BatchSizeMax() sequences or BatchMem() bytes are reached per key. This aligns with the RebatchBySize strategy and removes the optional sizes parameter. Also update related code: simplify Lua wrapper to accept optional capacity, and fix buffer growth logic in worker.go using slices.Grow correctly. Remove unused BatchSize() usage from obidistribute.
2026-03-16 22:06:44 +01:00
coissac
1599abc9ad Merge pull request #99 from metabarcoding/push-urlyqwkrqypt
4.4.28: Static Linux Builds, Memory-Aware Batching, and Build Stability
2026-03-14 12:21:34 +01:00
Eric Coissac
af213ab446 4.4.28: Static Linux Builds, Memory-Aware Batching, and Build Stability
This release focuses on improving build reliability, memory efficiency for large datasets, and portability of Linux binaries.

### Static Linux Binaries
- Linux binaries are now built with static linking using musl, eliminating external runtime dependencies and ensuring portability across distributions.

### Memory-Aware Batching
- Users can now control memory usage during processing with the new `--batch-mem` option, specifying limits such as 128K, 64M, or 1G.
- Batching logic now respects both size and memory constraints: batches are flushed when either threshold is exceeded.
- Conservative memory estimation for sequences helps avoid over-allocation, and explicit garbage collection after large batch discards reduces memory spikes.

### Build System Improvements
- Upgraded to Go 1.26 for improved performance and toolchain stability.
- Fixed cross-compilation issues by replacing generic include paths with architecture-specific ones (x86_64-linux-gnu and aarch64-linux-gnu).
- Streamlined macOS builds by removing special flags, using standard `make` targets.
- Enhanced error reporting during build failures: logs are now shown before cleanup and exit.
- Updated install script to correctly configure GOROOT, GOPATH, and GOTOOLCHAIN, with visual progress feedback for downloads.

All batching behavior is non-breaking and maintains backward compatibility while offering more predictable resource usage on large datasets.
2026-03-14 11:59:15 +01:00
Eric Coissac
a60184c115 chore: bump version to 4.4.27 and add zlib-static dependency
Update version to 4.4.27 in version.txt and pkg/obioptions/version.go.

Add zlib-static package to release workflow to ensure static linking of zlib, resolving potential runtime dependency issues with the external link mode.
2026-03-14 11:59:04 +01:00
Eric Coissac
585b024bf0 chore: update to Go 1.26 and refactor release workflow
- Upgrade Go version from 1.23 to 1.26 in release.yml
- Remove CGO_CFLAGS from cross-compilation matrix entries
- Replace Linux build tools installation with Docker-based static build using golang:1.26-alpine
- Simplify macOS build to use standard make without special flags
- Increment version to 4.4.26
2026-03-14 11:43:31 +01:00
Eric Coissac
afc9ffda85 chore: bump version to 4.4.25 and fix CGO_CFLAGS for cross-compilation
Update version to 4.4.25 in version.txt and pkg/obioptions/version.go.

Fix CGO_CFLAGS in release.yml by replacing generic '-I/usr/include' with architecture-specific paths (x86_64-linux-gnu and aarch64-linux-gnu) to ensure correct header inclusion during cross-compilation on Linux.
2026-03-13 19:30:29 +01:00
Eric Coissac
fdd972bbd2 fix: add CGO_CFLAGS for static Linux builds and update go.work.sum
- Add CGO_CFLAGS environment variable to release workflow for Linux builds
- Update go.work.sum with new golang.org/x/net v0.38.0 entry
- Remove obsolete logs archive file
2026-03-13 19:24:18 +01:00
coissac
76f595e1fe Merge pull request #95 from metabarcoding/push-kzmrqmplznrn
Version 4.4.24
2026-03-13 19:13:02 +01:00
coissac
1e1e5443e3 Merge branch 'master' into push-kzmrqmplznrn 2026-03-13 19:12:49 +01:00
coissac
58d685926b Merge pull request #94 from metabarcoding/push-lxxxlurqmqrt
4.4.23: Memory-aware batching, static Linux builds, and build improvements
2026-03-13 19:04:15 +01:00
Eric Coissac
e9f24426df 4.4.23: Memory-aware batching, static Linux builds, and build improvements
### Memory-Aware Batching
- Introduced configurable min/max batch size bounds and memory limits for precise resource control.
- Added `--batch-mem` CLI option to enable adaptive batching based on estimated sequence memory footprint (e.g., 128K, 64M, 1G).
- Implemented `RebatchBySize()` to handle both byte and count limits, flushing when either threshold is exceeded.
- Added conservative memory estimation via `BioSequence.MemorySize()` and enhanced garbage collection for explicit cleanup after large batch discards.
- Updated internal batching logic across core modules to consistently apply default memory (128 MB) and size (min: 1, max: 2000) bounds.

### Linux Build Enhancements
- Enabled static linking for Linux binaries using musl, producing portable, self-contained executables without external dependencies.

### Build System & Toolchain Improvements
- Updated Go toolchain to 1.26.1 with corresponding dependency bumps (e.g., go-getoptions, gval, regexp2, go-json, progressbar, logrus, testify).
- Fixed Makefile to safely quote LDFLAGS for paths with spaces.
- Improved build error handling: on failure, logs are displayed before cleanup and exit.
- Updated install script to correctly set GOROOT, GOPATH, and GOTOOLCHAIN, ensuring GOPATH directory creation.
- Added progress bar to curl downloads in the install script for visual feedback during Go and OBITools4 downloads.

All batching behavior remains non-breaking, with consistent constraints improving predictability during large dataset processing.
2026-03-13 19:03:50 +01:00
10 changed files with 48 additions and 49 deletions

View File

@@ -16,7 +16,7 @@ jobs:
- name: Setup Go - name: Setup Go
uses: actions/setup-go@v5 uses: actions/setup-go@v5
with: with:
go-version: "1.23" go-version: "1.26"
- name: Checkout obitools4 project - name: Checkout obitools4 project
uses: actions/checkout@v4 uses: actions/checkout@v4
- name: Run tests - name: Run tests
@@ -54,7 +54,7 @@ jobs:
- name: Setup Go - name: Setup Go
uses: actions/setup-go@v5 uses: actions/setup-go@v5
with: with:
go-version: "1.23" go-version: "1.26"
- name: Extract version from tag - name: Extract version from tag
id: get_version id: get_version
@@ -62,12 +62,6 @@ jobs:
TAG=${GITHUB_REF#refs/tags/Release_} TAG=${GITHUB_REF#refs/tags/Release_}
echo "version=$TAG" >> $GITHUB_OUTPUT echo "version=$TAG" >> $GITHUB_OUTPUT
- name: Install build tools (Linux)
if: runner.os == 'Linux'
run: |
sudo apt-get update -q
sudo apt-get install -y musl-tools zlib1g-dev
- name: Install build tools (macOS) - name: Install build tools (macOS)
if: runner.os == 'macOS' if: runner.os == 'macOS'
run: | run: |
@@ -75,20 +69,30 @@ jobs:
xcode-select --install 2>/dev/null || true xcode-select --install 2>/dev/null || true
xcode-select -p xcode-select -p
- name: Build binaries - name: Build binaries (Linux)
if: runner.os == 'Linux'
env:
VERSION: ${{ steps.get_version.outputs.version }}
run: |
docker run --rm \
-v "$(pwd):/src" \
-w /src \
-e VERSION="${VERSION}" \
golang:1.26-alpine \
sh -c "apk add --no-cache gcc musl-dev zlib-dev zlib-static make && \
make LDFLAGS='-linkmode=external -extldflags=-static' obitools"
mkdir -p artifacts
tar -czf artifacts/obitools4_${VERSION}_${{ matrix.output_name }}.tar.gz -C build .
- name: Build binaries (macOS)
if: runner.os == 'macOS'
env: env:
GOOS: ${{ matrix.goos }} GOOS: ${{ matrix.goos }}
GOARCH: ${{ matrix.goarch }} GOARCH: ${{ matrix.goarch }}
VERSION: ${{ steps.get_version.outputs.version }} VERSION: ${{ steps.get_version.outputs.version }}
CC: ${{ matrix.goos == 'linux' && 'musl-gcc' || '' }}
run: | run: |
if [ "$GOOS" = "linux" ]; then make obitools
make LDFLAGS='-linkmode=external -extldflags=-static' obitools
else
make obitools
fi
mkdir -p artifacts mkdir -p artifacts
# Create a single tar.gz with all binaries for this platform
tar -czf artifacts/obitools4_${VERSION}_${{ matrix.output_name }}.tar.gz -C build . tar -czf artifacts/obitools4_${VERSION}_${{ matrix.output_name }}.tar.gz -C build .
- name: Upload artifacts - name: Upload artifacts

View File

@@ -52,6 +52,7 @@ golang.org/x/image v0.6.0/go.mod h1:MXLdDR43H7cDJq5GEGXEVeeNhPgi+YYEQ2pC1byI1x0=
golang.org/x/mod v0.13.0 h1:I/DsJXRlw/8l/0c24sM9yb0T4z9liZTduXvdAWYiysY= golang.org/x/mod v0.13.0 h1:I/DsJXRlw/8l/0c24sM9yb0T4z9liZTduXvdAWYiysY=
golang.org/x/mod v0.13.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c= golang.org/x/mod v0.13.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/mod v0.14.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c= golang.org/x/mod v0.14.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/net v0.38.0/go.mod h1:ivrbrMbzFq5J41QOQh0siUuly180yBYtLp+CKbEaFx8=
golang.org/x/net v0.52.0/go.mod h1:R1MAz7uMZxVMualyPXb+VaqGSa3LIaUqk0eEt3w36Sw= golang.org/x/net v0.52.0/go.mod h1:R1MAz7uMZxVMualyPXb+VaqGSa3LIaUqk0eEt3w36Sw=
golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4 h1:uVc8UZUe6tr40fFVnUP5Oj+veunVezqYl9z7DYw9xzw= golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4 h1:uVc8UZUe6tr40fFVnUP5Oj+veunVezqYl9z7DYw9xzw=
golang.org/x/text v0.13.0 h1:ablQoSUd0tRdKxZewP80B+BaqeKJuVhuRxj/dkrun3k= golang.org/x/text v0.13.0 h1:ablQoSUd0tRdKxZewP80B+BaqeKJuVhuRxj/dkrun3k=

Binary file not shown.

BIN
logs_60535302930.zip Normal file

Binary file not shown.

View File

@@ -57,34 +57,21 @@ func (dist *IDistribute) Classifier() *obiseq.BioSequenceClassifier {
} }
// Distribute organizes the biosequences from the iterator into batches // Distribute organizes the biosequences from the iterator into batches
// based on the provided classifier and batch sizes. It returns an // based on the provided classifier. It returns an IDistribute instance
// IDistribute instance that manages the distribution of the sequences. // that manages the distribution of the sequences.
// //
// Parameters: // Batches are flushed when either BatchSizeMax() sequences or BatchMem()
// - class: A pointer to a BioSequenceClassifier used to classify // bytes are accumulated per key, mirroring the RebatchBySize strategy.
// the biosequences during distribution. func (iterator IBioSequence) Distribute(class *obiseq.BioSequenceClassifier) IDistribute {
// - sizes: Optional integer values specifying the batch size. If maxCount := obidefault.BatchSizeMax()
// no sizes are provided, a default batch size of 5000 is used. maxBytes := obidefault.BatchMem()
//
// Returns:
// An IDistribute instance that contains the outputs of the
// classified biosequences, a channel for new data notifications,
// and the classifier used for distribution. The method operates
// asynchronously, processing the sequences in separate goroutines.
// It ensures that the outputs are closed and cleaned up once
// processing is complete.
func (iterator IBioSequence) Distribute(class *obiseq.BioSequenceClassifier, sizes ...int) IDistribute {
batchsize := obidefault.BatchSize()
outputs := make(map[int]IBioSequence, 100) outputs := make(map[int]IBioSequence, 100)
slices := make(map[int]*obiseq.BioSequenceSlice, 100) slices := make(map[int]*obiseq.BioSequenceSlice, 100)
bufBytes := make(map[int]int, 100)
orders := make(map[int]int, 100) orders := make(map[int]int, 100)
news := make(chan int) news := make(chan int)
if len(sizes) > 0 {
batchsize = sizes[0]
}
jobDone := sync.WaitGroup{} jobDone := sync.WaitGroup{}
lock := sync.Mutex{} lock := sync.Mutex{}
@@ -115,6 +102,7 @@ func (iterator IBioSequence) Distribute(class *obiseq.BioSequenceClassifier, siz
slice = &s slice = &s
slices[key] = slice slices[key] = slice
orders[key] = 0 orders[key] = 0
bufBytes[key] = 0
lock.Lock() lock.Lock()
outputs[key] = MakeIBioSequence() outputs[key] = MakeIBioSequence()
@@ -123,14 +111,20 @@ func (iterator IBioSequence) Distribute(class *obiseq.BioSequenceClassifier, siz
news <- key news <- key
} }
*slice = append(*slice, s) sz := s.MemorySize()
countFull := maxCount > 0 && len(*slice) >= maxCount
if len(*slice) == batchsize { memFull := maxBytes > 0 && bufBytes[key]+sz > maxBytes && len(*slice) > 0
if countFull || memFull {
outputs[key].Push(MakeBioSequenceBatch(source, orders[key], *slice)) outputs[key].Push(MakeBioSequenceBatch(source, orders[key], *slice))
orders[key]++ orders[key]++
s := obiseq.MakeBioSequenceSlice() s := obiseq.MakeBioSequenceSlice()
slices[key] = &s slices[key] = &s
slice = &s
bufBytes[key] = 0
} }
*slice = append(*slice, s)
bufBytes[key] += sz
} }
} }

View File

@@ -31,7 +31,8 @@ func obiseqslice2Lua(interpreter *lua.LState,
} }
func newObiSeqSlice(luaState *lua.LState) int { func newObiSeqSlice(luaState *lua.LState) int {
seqslice := obiseq.NewBioSequenceSlice() capacity := luaState.OptInt(1, 0)
seqslice := obiseq.NewBioSequenceSlice(capacity)
luaState.Push(obiseqslice2Lua(luaState, seqslice)) luaState.Push(obiseqslice2Lua(luaState, seqslice))
return 1 return 1
} }

View File

@@ -3,7 +3,7 @@ package obioptions
// Version is automatically updated by the Makefile from version.txt // Version is automatically updated by the Makefile from version.txt
// The patch number (third digit) is incremented on each push to the repository // The patch number (third digit) is incremented on each push to the repository
var _Version = "Release 4.4.24" var _Version = "Release 4.4.29"
// Version returns the version of the obitools package. // Version returns the version of the obitools package.
// //

View File

@@ -104,11 +104,11 @@ func SeqToSliceWorker(worker SeqWorker,
for _, s := range input { for _, s := range input {
r, err := worker(s) r, err := worker(s)
if err == nil { if err == nil {
if i+len(r) > cap(output) {
output = slices.Grow(output[:i], len(r))
output = output[:cap(output)]
}
for _, rs := range r { for _, rs := range r {
if i == len(output) {
output = slices.Grow(output, cap(output))
output = output[:cap(output)]
}
output[i] = rs output[i] = rs
i++ i++
} }

View File

@@ -46,8 +46,7 @@ func CLIDistributeSequence(sequences obiiter.IBioSequence) {
formater = obiformats.WriteSequencesToFile formater = obiformats.WriteSequencesToFile
} }
dispatcher := sequences.Distribute(CLISequenceClassifier(), dispatcher := sequences.Distribute(CLISequenceClassifier())
obidefault.BatchSize())
obiformats.WriterDispatcher(CLIFileNamePattern(), obiformats.WriterDispatcher(CLIFileNamePattern(),
dispatcher, formater, opts..., dispatcher, formater, opts...,

View File

@@ -1 +1 @@
4.4.24 4.4.29