obitools4

mirror of https://github.com/metabarcoding/obitools4.git synced 2026-06-24 01:31:00 +00:00

Author	SHA1	Message	Date
Eric Coissac	585b024bf0	chore: update to Go 1.26 and refactor release workflow - Upgrade Go version from 1.23 to 1.26 in release.yml - Remove CGO_CFLAGS from cross-compilation matrix entries - Replace Linux build tools installation with Docker-based static build using golang:1.26-alpine - Simplify macOS build to use standard make without special flags - Increment version to 4.4.26	2026-03-14 11:43:31 +01:00
Eric Coissac	afc9ffda85	chore: bump version to 4.4.25 and fix CGO_CFLAGS for cross-compilation Update version to 4.4.25 in version.txt and pkg/obioptions/version.go. Fix CGO_CFLAGS in release.yml by replacing generic '-I/usr/include' with architecture-specific paths (x86_64-linux-gnu and aarch64-linux-gnu) to ensure correct header inclusion during cross-compilation on Linux.	2026-03-13 19:30:29 +01:00
Eric Coissac	fdd972bbd2	fix: add CGO_CFLAGS for static Linux builds and update go.work.sum - Add CGO_CFLAGS environment variable to release workflow for Linux builds - Update go.work.sum with new golang.org/x/net v0.38.0 entry - Remove obsolete logs archive file	2026-03-13 19:24:18 +01:00
coissac	76f595e1fe	Merge pull request #95 from metabarcoding/push-kzmrqmplznrn Version 4.4.24	2026-03-13 19:13:02 +01:00
coissac	1e1e5443e3	Merge branch 'master' into push-kzmrqmplznrn	2026-03-13 19:12:49 +01:00
Eric Coissac	15d1f1fd80	Version 4.4.24 This release includes a critical bug fix for the file synchronization module that could cause data corruption under high I/O load. Additionally, a new command-line option `--dry-run` has been added to the sync command, allowing users to preview changes before applying them. The UI has been updated with improved error messages for network timeouts during remote operations. Release_4.4.24	2026-03-13 19:11:58 +01:00
Eric Coissac	8df2cbe22f	Bump version to 4.4.23 and update release workflow - Update version from 4.4.22 to 4.4.23 in version.txt and pkg/obioptions/version.go - Add zlib1g-dev dependency to Linux release workflow for potential linking requirements - Improve tag creation in Makefile by resolving commit hash with `jj log` for better CI/CD integration	2026-03-13 19:11:55 +01:00
coissac	58d685926b	Merge pull request #94 from metabarcoding/push-lxxxlurqmqrt 4.4.23: Memory-aware batching, static Linux builds, and build improvements	2026-03-13 19:04:15 +01:00
Eric Coissac	e9f24426df	4.4.23: Memory-aware batching, static Linux builds, and build improvements ### Memory-Aware Batching - Introduced configurable min/max batch size bounds and memory limits for precise resource control. - Added `--batch-mem` CLI option to enable adaptive batching based on estimated sequence memory footprint (e.g., 128K, 64M, 1G). - Implemented `RebatchBySize()` to handle both byte and count limits, flushing when either threshold is exceeded. - Added conservative memory estimation via `BioSequence.MemorySize()` and enhanced garbage collection for explicit cleanup after large batch discards. - Updated internal batching logic across core modules to consistently apply default memory (128 MB) and size (min: 1, max: 2000) bounds. ### Linux Build Enhancements - Enabled static linking for Linux binaries using musl, producing portable, self-contained executables without external dependencies. ### Build System & Toolchain Improvements - Updated Go toolchain to 1.26.1 with corresponding dependency bumps (e.g., go-getoptions, gval, regexp2, go-json, progressbar, logrus, testify). - Fixed Makefile to safely quote LDFLAGS for paths with spaces. - Improved build error handling: on failure, logs are displayed before cleanup and exit. - Updated install script to correctly set GOROOT, GOPATH, and GOTOOLCHAIN, ensuring GOPATH directory creation. - Added progress bar to curl downloads in the install script for visual feedback during Go and OBITools4 downloads. All batching behavior remains non-breaking, with consistent constraints improving predictability during large dataset processing.	2026-03-13 19:03:50 +01:00
Eric Coissac	2f7be10b5d	Build improvements and Go version update - Update Go version from 1.25.0 to 1.26.1 in go.mod and go.work - Fix Makefile: quote LDFLAGS to handle spaces safely in -ldflags - Improve build error handling: on failure, cat log then cleanup and exit with error code - Update install_obitools.sh: properly set GOROOT, GOPATH, and GOTOOLCHAIN; ensure GOPATH directory is created Release_4.4.23	2026-03-13 19:03:42 +01:00
Eric Coissac	43125f9f5e	feat: add progress bar to curl downloads in install script Replace silent curl commands with --progress-bar option to provide visual feedback during Go and OBITools4 downloads, improving user experience without changing download logic.	2026-03-13 16:40:55 +01:00
Eric Coissac	c23368e929	update dependencies and Go toolchain to 1.25.0 Update go.mod and go.work to Go 1.25.0, bump several direct dependencies (e.g., go-getoptions, gval, regexp2, go-json, progressbar, logrus, testify), update indirect dependencies accordingly, and remove obsolete toolchain directive.	2026-03-13 16:09:34 +01:00
coissac	6cb5a81685	Merge pull request #93 from metabarcoding/push-snmwxkwkqxrm Memory-aware Batching and Static Linux Builds	2026-03-13 15:18:29 +01:00
Eric Coissac	94b0887069	Memory-aware Batching and Static Linux Builds ### Memory-Aware Batching - Replaced single batch size limits with configurable min/max bounds and memory limits for more precise control over resource usage. - Added `--batch-mem` CLI option to enable adaptive batching based on estimated sequence memory footprint (e.g., 128K, 64M, 1G). - Introduced `RebatchBySize()` with explicit support for both byte and count limits, flushing when either threshold is exceeded. - Implemented conservative memory estimation via `BioSequence.MemorySize()` and enhanced garbage collection to trigger explicit cleanup after large batch discards. - Updated internal batching logic across `batchiterator.go`, `fragment.go`, and `obirefidx.go` to consistently use default memory (128 MB) and size (min: 1, max: 2000) bounds. ### Linux Build Enhancements - Enabled static linking for Linux binaries using musl, producing portable, self-contained executables without external dependencies. ### Notes - This release consolidates and improves batching behavior introduced in 4.4.20, with no breaking changes to the public API. - All user-facing batching behavior is now governed by consistent memory and count constraints, improving predictability and stability during large dataset processing.	2026-03-13 15:16:41 +01:00
Eric Coissac	c188580aac	Replace Rebatch with RebatchBySize using default batch parameters Replace calls to Rebatch(size) with RebatchBySize(obidefault.BatchMem(), obidefault.BatchSizeMax()) in batchiterator.go, fragment.go, and obirefidx.go to ensure consistent use of default memory and size limits for batch rebatching. Release_4.4.22	2026-03-13 15:16:33 +01:00
Eric Coissac	1e1f575d1c	refactor: replace single batch size with min/max bounds and memory limits Introduce separate _BatchSize (min) and _BatchSizeMax (max) constants to replace the single _BatchSize variable. Update RebatchBySize to accept both maxBytes and maxCount parameters, flushing when either limit is exceeded. Set default batch size min to 1, max to 2000, and memory limit to 128 MB. Update CLI options and sequence_reader.go accordingly.	2026-03-13 15:07:35 +01:00
Eric Coissac	40769bf827	Add memory-based batching support Implement memory-aware batch sizing with --batch-mem CLI option, enabling adaptive batching based on estimated sequence memory footprint. Key changes: - Added _BatchMem and related getters/setters in pkg/obidefault - Implemented RebatchBySize() in pkg/obiter for memory-constrained batching - Added BioSequence.MemorySize() for conservative memory estimation - Integrated batch-mem option in pkg/obioptions with human-readable size parsing (e.g., 128K, 64M, 1G) - Added obiutils.ParseMemSize/FormatMemSize for unit conversion - Enhanced pool GC in pkg/obiseq/pool.go to trigger explicit GC for large slice discards - Updated sequence_reader.go to apply memory-based rebatching when enabled	2026-03-13 14:54:21 +01:00
Eric Coissac	74e6fcaf83	feat: add static linking for Linux builds using musl Enable static linking for Linux binaries by installing musl-tools and passing appropriate LDFLAGS during build. This ensures portable, self-contained executables for Linux targets.	2026-03-13 14:26:31 +01:00
coissac	30ec8b1b63	Merge pull request #92 from metabarcoding/push-mvpuxnxoyypu 4.4.21: Parallel builds, robust installation, and rope-based parsing enhancements	2026-03-13 12:00:32 +01:00
Eric Coissac	cdc72c5346	4.4.21: Parallel builds, robust installation, and rope-based parsing enhancements This release introduces significant improvements to build reliability and performance, alongside key parsing enhancements for sequence data. ### Build & Installation Improvements - Added support for parallel compilation via `-j/--jobs` option in both the Makefile and install script, enabling faster builds on multi-core systems. The default remains single-threaded for safety. - Enhanced Makefile with `.DEFAULT_GOAL := all` for consistent behavior and a documented `help` target. - Replaced fragile file operations with robust error handling, clear diagnostics, and automatic preservation of the build directory on copy failures to aid recovery. ### Rope-Based Parsing Enhancements (from 4.4.20) - Introduced direct rope-based parsers for FASTA, EMBL, and FASTQ formats, improving memory efficiency for large files. - Added U→T conversion support during sequence extraction and more reliable line ending detection. - Unified rope scanning logic under a new `ropeScanner` for better maintainability. - Added `TakeQualities()` method to BioSequence for more efficient handling of quality data. ### Bug Fixes (from 4.4.20) - Fixed `CompressStream` to correctly respect the `compressed` variable. - Replaced ambiguous string splitting utilities with precise left/right split variants (`LeftSplitInTwo`, `RightSplitInTwo`). ### Release Tooling (from 4.4.20) - Streamlined release process with modular targets (`jjpush-notes`, `jjpush-push`, `jjpush-tag`) and AI-assisted note generation via `aichat`. - Improved versioning support via the `VERSION` environment variable in `bump-version`. - Switched PR submission from raw `jj git push` to `stakk` for consistency and reliability. Note: This release incorporates key enhancements from 4.4.20 that impact end users, while focusing on build robustness and performance gains.	2026-03-13 11:59:32 +01:00
Eric Coissac	82a9972be7	Add parallel compilation support and improve Makefile/install script robustness - Add .DEFAULT_GOAL := all to Makefile for consistent default target - Document -j/--jobs option in README.md to allow parallel compilation - Add JOBS variable and -j/--jobs argument to install script (default: 1) - Replace fragile mkdir/cp commands with robust error handling and clear diagnostics - Add build directory preservation on copy failure for manual recovery - Pass -j option to make during compilation to enable parallel builds Release_4.4.21	2026-03-13 11:59:20 +01:00
coissac	ff6e515b2a	Merge pull request #91 from metabarcoding/push-uotrstkymowq 4.4.20: Rope-based parsing, improved release tooling, and bug fixes	2026-03-12 20:15:33 +01:00
Eric Coissac	cd0c525f50	4.4.20: Rope-based parsing, improved release tooling, and bug fixes ### Enhancements - Rope-based parsing: Added direct rope parsing for FASTA, EMBL, and FASTQ formats via `FastaChunkParserRope`, `EmblChunkParserRope`, and `FastqChunkParserRope`. Sequence extraction now supports U→T conversion and improved line ending detection. - Rope scanner refactoring: Unified rope scanning logic under a new `ropeScanner`, improving maintainability and consistency. - Sequence handling: Added `TakeQualities()` method to BioSequence for more efficient quality data handling. ### Bug Fixes - Compression behavior: Fixed `CompressStream` to correctly use the `compressed` variable instead of a hardcoded boolean. - String splitting: Replaced ambiguous `SplitInTwo` calls with precise `LeftSplitInTwo` or `RightSplitInTwo`, and added dedicated right-split utility. ### Tooling & Workflow Improvements - Makefile enhancements: Added colored terminal output, a `help` target for documenting all targets, and improved release workflow automation. - Release process: Refactored `jjpush` into modular targets (`jjpush-notes`, `jjpush-push`, `jjpush-tag`), replaced `orla` with `aichat` for AI-assisted release notes, and introduced robust JSON parsing using Python. Release notes are now generated and stored in temp files for tag creation. - Versioning: `bump-version` now supports the VERSION environment variable for manual version setting. - Submission: Switched from raw `jj git push` to `stakk` for PR submission. ### Internal Notes - Installation instructions are now included in release tags. - Fixed-size carry buffer replaced with dynamic slice for arbitrarily long line support without extra allocations.	2026-03-12 20:14:11 +01:00
Eric Coissac	abe935aa18	Add help target, colorize output, and improve release workflow - Add colored terminal output support (GREEN, YELLOW, BLUE, NC) - Introduce `help` target to document all Makefile targets - Enhance `bump-version` to accept VERSION env var for manual version setting - Refactor jjpush: split into modular targets (jjpush-notes, jjpush-push, jjpush-tag) - Replace orla with aichat for AI-powered release notes generation - Add robust JSON parsing using Python for release notes extraction - Use stakk for PR submission (replacing raw `jj git push`) - Generate and store release notes in temp files for tag creation - Add installation instructions to release tags - Update .PHONY with new targets 4.4.20: Rope-based parsing, improved release tooling, and bug fixes ### Enhancements - Rope-based parsing: Added direct rope parsing for FASTA, EMBL, and FASTQ formats via `FastaChunkParserRope`, `EmblChunkParserRope`, and `FastqChunkParserRope` functions, eliminating unnecessary memory allocation via Pack(). Sequence extraction now supports U→T conversion and improved line ending detection. - Rope scanner refactoring: Unified rope scanning logic under a new `ropeScanner`, improving maintainability and consistency across parsers. - Sequence handling: Added `TakeQualities()` method to BioSequence for more efficient quality data handling. ### Bug Fixes - Compression behavior: Fixed CompressStream to correctly use the `compressed` variable instead of a hardcoded boolean. - String splitting: Replaced ambiguous `SplitInTwo` calls with precise `LeftSplitInTwo` or `RightSplitInTwo`, and added dedicated right-split utility. ### Tooling & Workflow Improvements - Makefile enhancements: Added colored terminal output, a `help` target for documenting all targets, and improved release workflow automation. - Release process: Refactored `jjpush` into modular targets (`jjpush-notes`, `jjpush-push`, `jjpush-tag`), replaced `orla` with `aichat` for AI-assisted release notes, and introduced robust JSON parsing using Python. Release notes are now generated and stored in temp files for tag creation. - Versioning: `bump-version` now supports the VERSION environment variable for manual version setting. - Submission: Switched from raw `jj git push` to `stakk` for PR submission. ### Internal Notes - Installation instructions are now included in release tags. - Fixed-size carry buffer replaced with dynamic slice for arbitrarily long line support without extra allocations. Release_4.4.20	2026-03-12 20:14:11 +01:00
Eric Coissac	8dd32dc1bf	Fix CompressStream call to use compressed variable Replace hardcoded boolean with the `compressed` variable in CompressStream call to ensure correct compression behavior.	2026-03-12 18:48:22 +01:00
Eric Coissac	6ee8750635	Replace SplitInTwo with LeftSplitInTwo/RightSplitInTwo for precise splitting Replace SplitInTwo calls with LeftSplitInTwo or RightSplitInTwo depending on the intended split direction. In fastseq_json_header.go, extract rank from suffix without splitting; in biosequenceslice.go and taxid.go, use LeftSplitInTwo to split from the left; add RightSplitInTwo utility function for splitting from the right.	2026-03-12 18:41:28 +01:00
Eric Coissac	8c318c480e	replace fixed-size carry buffer with dynamic slice Replace the fixed [256]byte carry buffer with a dynamic []byte slice to support arbitrarily long lines without heap allocation during accumulation. Update all carry buffer handling logic to use len(s.carry) and append instead of fixed-size copy operations.	2026-03-11 20:44:45 +01:00
Eric Coissac	09fbc217d3	Add EMBL rope parsing support and improve sequence extraction Introduce EmblChunkParserRope function to parse EMBL chunks directly from a rope without using Pack(). Add extractEmblSeq helper to scan sequence sections and handle U to T conversion. Update parser logic to use rope-based parsing when available, and fix feature table handling for WGS entries.	2026-03-10 17:02:14 +01:00
Eric Coissac	3d2e205722	Refactor rope scanner and add FASTQ rope parser This commit refactors the rope scanner implementation by renaming gbRopeScanner to ropeScanner and extracting the common functionality into a new file. It also introduces a new FastqChunkParserRope function that parses FASTQ chunks directly from a rope without Pack(), enabling more efficient memory usage. The existing parsers are updated to use the new rope-based parser when available. The BioSequence type is enhanced with a TakeQualities method for more efficient quality data handling.	2026-03-10 16:47:03 +01:00
Eric Coissac	623116ab13	Add rope-based FASTA parsing and improve sequence handling Introduce FastaChunkParserRope for direct rope-based FASTA parsing, enhance sequence extraction with whitespace skipping and U->T conversion, and update parser logic to support both rope and raw data sources. - Added extractFastaSeq function to scan sequence bytes directly from rope - Implemented FastaChunkParserRope for rope-based parsing - Modified _ParseFastaFile to use rope when available - Updated sequence handling to support U->T conversion - Fixed line ending detection for FASTA parsing	2026-03-10 16:34:33 +01:00
coissac	1e4509cb63	Merge pull request #90 from metabarcoding/push-uzpqqoqvpnxw Push uzpqqoqvpnxw	2026-03-10 15:53:08 +01:00
Eric Coissac	b33d7705a8	Bump version to 4.4.19 Update version from 4.4.18 to 4.4.19 in both version.txt and pkg/obioptions/version.go	2026-03-10 15:51:36 +01:00
Eric Coissac	1342c83db6	Use NewBioSequenceOwning to avoid unnecessary sequence copying Replace NewBioSequence with NewBioSequenceOwning in genbank_read.go to take ownership of sequence slices without copying, improving performance. Update biosequence.go to add the new TakeSequence method and NewBioSequenceOwning constructor. Release_4.4.19	2026-03-10 15:51:35 +01:00
Eric Coissac	b246025907	Optimize Fasta batch formatting Optimize FormatFastaBatch to pre-allocate buffer and write sequences directly without intermediate strings, improving performance and memory usage.	2026-03-10 15:43:59 +01:00
Eric Coissac	761e0dbed3	Implement a rope-based GenBank parser to reduce memory usage Add a rope-based GenBank parser to reduce memory usage (RSS) and heap allocations. - Add `gbRopeScanner` to read lines without heap allocation - Implement `GenbankChunkParserRope`, which uses the rope instead of `Pack()` - Modify `_ParseGenbankFile` and `ReadGenbank` to use the new parser - Expected RSS reduction from 57 GB to ~128 MB × workers - Keep the old parser for compatibility and tests Significant reduction in allocations (~50M) and sys time, with comparable or better user time.	2026-03-10 15:35:36 +01:00
Eric Coissac	a7ea47624b	Optimize parsing of large sequences Implement an optimization of large-sequence parsing that avoids unnecessary memory allocation when merging chunks. Add support for parsing the rope structure directly, which reduces allocations and improves performance when processing GenBank/EMBL and FASTA/FASTQ files of several Gbp. The parsers are updated to use the unpacked rope and the new in-place write mechanism for GenBank sequences.	2026-03-10 14:20:21 +01:00
Eric Coissac	61e346658e	Refactor jjpush workflow and enhance release notes generation Split the jjpush target into multiple sub-targets (jjpush-describe, jjpush-bump, jjpush-push, jjpush-tag) for better modularity and control. Enhance release notes generation by: - Using git log with full commit messages instead of GitHub API for pre-release mode - Adding robust JSON parsing with fallbacks for release notes - Including detailed installation instructions in release notes - Supporting both pre-release and published release modes Update release_notes.sh to handle pre-release mode, improve commit message fetching, and add installation section to release notes. Add .PHONY declarations for new sub-targets.	2026-03-10 11:09:19 +01:00
coissac	1ba1294b11	Merge pull request #89 from metabarcoding/push-uoqxkozlonwx Push uoqxkozlonwx	2026-02-20 11:42:40 +01:00
Eric Coissac	b2476fffcb	Bump version to 4.4.18 Update version from 4.4.17 to 4.4.18 in version.txt and corresponding Go variable _Version.	2026-02-20 11:40:43 +01:00
Eric Coissac	b05404721e	Bump version to 4.4.16 Update version from 4.4.15 to 4.4.16 in version.go and version.txt files. Release_4.4.18	2026-02-20 11:40:40 +01:00
Eric Coissac	c57e788459	Fix GenBank parsing and add release notes script This commit fixes an issue in the GenBank parser where empty parts were being included in the parsed data. It also introduces a new script `release_notes.sh` to automate the generation of GitHub-compatible release notes for OBITools4 versions, including support for LLM summarization and various output modes.	2026-02-20 11:37:51 +01:00
coissac	1cecf23978	Merge pull request #86 from metabarcoding/push-oulwykrpwxuz Push oulwykrpwxuz	2026-02-11 06:34:05 +01:00
Eric Coissac	4c824ef9b7	Bump version to 4.4.15 Update version from 4.4.14 to 4.4.15 in version.txt and pkg/obioptions/version.go	2026-02-11 06:31:11 +01:00
Eric Coissac	1ce5da9bee	Support new sequence file formats and improve error handling Add support for .gbff and .gbff.gz file extensions in sequence reader. Update the logic to return an error instead of using NilIBioSequence when no sequence files are found, improving the error handling and user feedback. Release_4.4.15	2026-02-11 06:31:10 +01:00
coissac	dc23d9de9a	Merge pull request #85 from metabarcoding/push-smturnsrozkp Push smturnsrozkp	2026-02-10 22:19:22 +01:00
Eric Coissac	aa9d7bbf72	Bump version to 4.4.14 Update version number from 4.4.13 to 4.4.14 in both version.go and version.txt files.	2026-02-10 22:17:23 +01:00
Eric Coissac	db22d20d0a	Rename obisuperkmer test script to obik-super and update command references Update test script name from obisuperkmer to obik-super and adjust all command references accordingly. - Changed TEST_NAME from 'obisuperkmer' to 'obik-super' - Changed CMD from 'obisuperkmer' to 'obik' - Updated MCMD to 'OBIk-super' - Modified command calls to use '$CMD super' instead of direct command names - Updated help test to use '$CMD super -h' - Updated all test cases to use the new command format Release_4.4.14	2026-02-10 22:17:22 +01:00
coissac	7c05bdb01c	Merge pull request #84 from metabarcoding/push-uxvowwlxkrlq Push uxvowwlxkrlq	2026-02-10 22:12:18 +01:00
Eric Coissac	b6542c4523	Bump version to 4.4.13 Update version from 4.4.12 to 4.4.13 in version.txt and pkg/obioptions/version.go	2026-02-10 22:10:38 +01:00
Eric Coissac	ac41dd8a22	Refactor k-mer matching pipeline with improved concurrency and memory management Refactor k-mer matching to use a pipeline architecture with improved concurrency and memory management: - Replace sort.Slice with slices.SortFunc and cmp.Compare for better performance - Introduce PreparedQueries struct to encapsulate query buckets with metadata - Implement MergeQueries function to merge query buckets from multiple batches - Rewrite MatchBatch to use pre-allocated results and mutexes instead of map-based accumulation - Add seek optimization in matchPartition to reduce linear scanning - Refactor match command to use a multi-stage pipeline with proper batching and merging - Add index directory option for match command - Improve parallel processing of sequence batches This refactoring improves performance by reducing memory allocations, optimizing k-mer lookup, and implementing a more efficient pipeline for large-scale k-mer matching operations. Release_4.4.13	2026-02-10 22:10:36 +01:00

742 Commits