obitools4

mirror of https://github.com/metabarcoding/obitools4.git synced 2026-03-25 21:40:52 +00:00

Author	SHA1	Message	Date
Eric Coissac	78e7933c6f	4.4.26: Static Linux Builds, Memory-Aware Batching & Cross-Compilation Fixes This release includes critical build system improvements and enhanced batching capabilities for more predictable resource usage. ### Cross-Compilation & Static Builds - Fixed cross-compilation for Linux by introducing architecture-specific `CGO_CFLAGS` (x86_64-linux-gnu and aarch64-linux-gnu), ensuring correct header resolution during static linking. - Enabled fully static Linux binaries using musl, producing self-contained executables with no external runtime dependencies. ### Memory-Aware Batching (New Feature) - Added `--batch-mem` CLI option to control batching based on estimated memory usage (e.g., 128K, 64M, 1G), in addition to size-based limits. - Introduced configurable min/max batch sizes and memory thresholds, with conservative memory estimation per sequence to avoid over-allocation. - Implemented intelligent flushing logic that triggers when either byte or record count limits are exceeded, ensuring predictable memory behavior. - Improved garbage collection after large batch discards to reduce memory pressure during large-scale processing. ### Build & Toolchain Improvements - Updated Go toolchain to 1.26.1 and bumped key dependencies (e.g., golang.org/x/net v0.38.0). - Enhanced build error reporting: logs are now displayed before cleanup on failure. - Fixed Makefile quoting for `LDFLAGS` containing spaces. - Updated install script to properly configure GOROOT, GOPATH, and GOTOOLCHAIN, with added progress feedback for downloads. All batching behavior is backward-compatible and uses sensible defaults (128 MB memory, min: 1 record, max: 2000 records) to ensure smooth upgrades.	2026-03-13 19:30:37 +01:00
Eric Coissac	afc9ffda85	chore: bump version to 4.4.25 and fix CGO_CFLAGS for cross-compilation Update version to 4.4.25 in version.txt and pkg/obioptions/version.go. Fix CGO_CFLAGS in release.yml by replacing generic '-I/usr/include' with architecture-specific paths (x86_64-linux-gnu and aarch64-linux-gnu) to ensure correct header inclusion during cross-compilation on Linux.	2026-03-13 19:30:29 +01:00
Eric Coissac	15d1f1fd80	Version 4.4.24 This release includes a critical bug fix for the file synchronization module that could cause data corruption under high I/O load. Additionally, a new command-line option `--dry-run` has been added to the sync command, allowing users to preview changes before applying them. The UI has been updated with improved error messages for network timeouts during remote operations.	2026-03-13 19:11:58 +01:00
Eric Coissac	8df2cbe22f	Bump version to 4.4.23 and update release workflow - Update version from 4.4.22 to 4.4.23 in version.txt and pkg/obioptions/version.go - Add zlib1g-dev dependency to Linux release workflow for potential linking requirements - Improve tag creation in Makefile by resolving commit hash with `jj log` for better CI/CD integration	2026-03-13 19:11:55 +01:00
Eric Coissac	94b0887069	Memory-aware Batching and Static Linux Builds ### Memory-Aware Batching - Replaced single batch size limits with configurable min/max bounds and memory limits for more precise control over resource usage. - Added `--batch-mem` CLI option to enable adaptive batching based on estimated sequence memory footprint (e.g., 128K, 64M, 1G). - Introduced `RebatchBySize()` with explicit support for both byte and count limits, flushing when either threshold is exceeded. - Implemented conservative memory estimation via `BioSequence.MemorySize()` and enhanced garbage collection to trigger explicit cleanup after large batch discards. - Updated internal batching logic across `batchiterator.go`, `fragment.go`, and `obirefidx.go` to consistently use default memory (128 MB) and size (min: 1, max: 2000) bounds. ### Linux Build Enhancements - Enabled static linking for Linux binaries using musl, producing portable, self-contained executables without external dependencies. ### Notes - This release consolidates and improves batching behavior introduced in 4.4.20, with no breaking changes to the public API. - All user-facing batching behavior is now governed by consistent memory and count constraints, improving predictability and stability during large dataset processing.	2026-03-13 15:16:41 +01:00
Eric Coissac	1e1f575d1c	refactor: replace single batch size with min/max bounds and memory limits Introduce separate _BatchSize (min) and _BatchSizeMax (max) constants to replace the single _BatchSize variable. Update RebatchBySize to accept both maxBytes and maxCount parameters, flushing when either limit is exceeded. Set default batch size min to 1, max to 2000, and memory limit to 128 MB. Update CLI options and sequence_reader.go accordingly.	2026-03-13 15:07:35 +01:00
Eric Coissac	40769bf827	Add memory-based batching support Implement memory-aware batch sizing with --batch-mem CLI option, enabling adaptive batching based on estimated sequence memory footprint. Key changes: - Added _BatchMem and related getters/setters in pkg/obidefault - Implemented RebatchBySize() in pkg/obiter for memory-constrained batching - Added BioSequence.MemorySize() for conservative memory estimation - Integrated batch-mem option in pkg/obioptions with human-readable size parsing (e.g., 128K, 64M, 1G) - Added obiutils.ParseMemSize/FormatMemSize for unit conversion - Enhanced pool GC in pkg/obiseq/pool.go to trigger explicit GC for large slice discards - Updated sequence_reader.go to apply memory-based rebatching when enabled	2026-03-13 14:54:21 +01:00
Eric Coissac	cdc72c5346	4.4.21: Parallel builds, robust installation, and rope-based parsing enhancements This release introduces significant improvements to build reliability and performance, alongside key parsing enhancements for sequence data. ### Build & Installation Improvements - Added support for parallel compilation via `-j/--jobs` option in both the Makefile and install script, enabling faster builds on multi-core systems. The default remains single-threaded for safety. - Enhanced Makefile with `.DEFAULT_GOAL := all` for consistent behavior and a documented `help` target. - Replaced fragile file operations with robust error handling, clear diagnostics, and automatic preservation of the build directory on copy failures to aid recovery. ### Rope-Based Parsing Enhancements (from 4.4.20) - Introduced direct rope-based parsers for FASTA, EMBL, and FASTQ formats, improving memory efficiency for large files. - Added U→T conversion support during sequence extraction and more reliable line ending detection. - Unified rope scanning logic under a new `ropeScanner` for better maintainability. - Added `TakeQualities()` method to BioSequence for more efficient handling of quality data. ### Bug Fixes (from 4.4.20) - Fixed `CompressStream` to correctly respect the `compressed` variable. - Replaced ambiguous string splitting utilities with precise left/right split variants (`LeftSplitInTwo`, `RightSplitInTwo`). ### Release Tooling (from 4.4.20) - Streamlined release process with modular targets (`jjpush-notes`, `jjpush-push`, `jjpush-tag`) and AI-assisted note generation via `aichat`. - Improved versioning support via the `VERSION` environment variable in `bump-version`. - Switched PR submission from raw `jj git push` to `stakk` for consistency and reliability. Note: This release incorporates key enhancements from 4.4.20 that impact end users, while focusing on build robustness and performance gains.	2026-03-13 11:59:32 +01:00
Eric Coissac	cd0c525f50	4.4.20: Rope-based parsing, improved release tooling, and bug fixes ### Enhancements - Rope-based parsing: Added direct rope parsing for FASTA, EMBL, and FASTQ formats via `FastaChunkParserRope`, `EmblChunkParserRope`, and `FastqChunkParserRope`. Sequence extraction now supports U→T conversion and improved line ending detection. - Rope scanner refactoring: Unified rope scanning logic under a new `ropeScanner`, improving maintainability and consistency. - Sequence handling: Added `TakeQualities()` method to BioSequence for more efficient quality data handling. ### Bug Fixes - Compression behavior: Fixed `CompressStream` to correctly use the `compressed` variable instead of a hardcoded boolean. - String splitting: Replaced ambiguous `SplitInTwo` calls with precise `LeftSplitInTwo` or `RightSplitInTwo`, and added dedicated right-split utility. ### Tooling & Workflow Improvements - Makefile enhancements: Added colored terminal output, a `help` target for documenting all targets, and improved release workflow automation. - Release process: Refactored `jjpush` into modular targets (`jjpush-notes`, `jjpush-push`, `jjpush-tag`), replaced `orla` with `aichat` for AI-assisted release notes, and introduced robust JSON parsing using Python. Release notes are now generated and stored in temp files for tag creation. - Versioning: `bump-version` now supports the VERSION environment variable for manual version setting. - Submission: Switched from raw `jj git push` to `stakk` for PR submission. ### Internal Notes - Installation instructions are now included in release tags. - Fixed-size carry buffer replaced with dynamic slice for arbitrarily long line support without extra allocations.	2026-03-12 20:14:11 +01:00
Eric Coissac	b33d7705a8	Bump version to 4.4.19 Update version from 4.4.18 to 4.4.19 in both version.txt and pkg/obioptions/version.go	2026-03-10 15:51:36 +01:00
Eric Coissac	b2476fffcb	Bump version to 4.4.18 Update version from 4.4.17 to 4.4.18 in version.txt and corresponding Go variable _Version.	2026-02-20 11:40:43 +01:00
Eric Coissac	b05404721e	Bump version to 4.4.16 Update version from 4.4.15 to 4.4.16 in version.go and version.txt files.	2026-02-20 11:40:40 +01:00
Eric Coissac	4c824ef9b7	Bump version to 4.4.15 Update version from 4.4.14 to 4.4.15 in version.txt and pkg/obioptions/version.go	2026-02-11 06:31:11 +01:00
Eric Coissac	aa9d7bbf72	Bump version to 4.4.14 Update version number from 4.4.13 to 4.4.14 in both version.go and version.txt files.	2026-02-10 22:17:23 +01:00
Eric Coissac	b6542c4523	Bump version to 4.4.13 Update version from 4.4.12 to 4.4.13 in version.txt and pkg/obioptions/version.go	2026-02-10 22:10:38 +01:00
Eric Coissac	56c1f4180c	Refactor k-mer index management with subcommands and enhanced metadata support This commit refactors the k-mer index management tools to use a unified subcommand structure with obik, adds support for per-set metadata and ID management, enhances the k-mer set group builder to support appending to existing groups, and improves command-line option handling with a new global options registration system. Key changes: - Introduce obik command with subcommands (index, ls, summary, cp, mv, rm, super, lowmask) - Add support for per-set metadata and ID management in kmer set groups - Implement ability to append to existing kmer index groups - Refactor option parsing to use a global options registration system - Add new commands for listing, copying, moving, and removing sets - Enhance low-complexity masking with new options and output formats - Improve kmer index summary with Jaccard distance matrix support - Remove deprecated obikindex and obisuperkmer commands - Update build process to use the new subcommand structure	2026-02-10 06:49:31 +01:00
Eric Coissac	6dadee9371	Bump version to 4.4.12 Update version from 4.4.11 to 4.4.12 in version.txt and pkg/obioptions/version.go	2026-02-09 09:05:49 +01:00
Eric Coissac	f79b018430	Bump version to 4.4.11 Update version from 4.4.10 to 4.4.11 in version.txt and pkg/obioptions/version.go	2026-02-06 10:09:56 +01:00
Eric Coissac	a2106e4e82	Bump version to 4.4.10 Update version from 4.4.9 to 4.4.10 in version.txt and pkg/obioptions/version.go	2026-02-06 09:48:27 +01:00
Eric Coissac	68d723ecba	Bump version to 4.4.9 Update version from 4.4.8 to 4.4.9 in version.txt and corresponding Go file.	2026-02-06 09:34:43 +01:00
Eric Coissac	7f0133a196	Bump version to 4.4.8 Update version from 4.4.7 to 4.4.8 in version.txt and _Version variable.	2026-02-06 09:08:35 +01:00
Eric Coissac	7a7db703f1	Bump version to 4.4.7 Update version from 4.4.6 to 4.4.7 in version.txt and pkg/obioptions/version.go	2026-02-05 18:10:45 +01:00
Eric Coissac	d7f615108f	Bump version to 4.4.6 Update version from 4.4.5 to 4.4.6 in version.txt and pkg/obioptions/version.go	2026-02-05 18:02:30 +01:00
Eric Coissac	71574f240b	Update version and add CI tests Update version to 4.4.5 and add a test job in the release workflow to ensure tests pass before creating a release.	2026-02-05 18:02:28 +01:00
Eric Coissac	02ab683fa0	Bump version to 4.4.4 Update version from 4.4.3 to 4.4.4 in version.txt and pkg/obioptions/version.go	2026-02-05 17:42:01 +01:00
Eric Coissac	e3c41fc11b	Add Jaccard distance and similarity computations for KmerSet and KmerSetGroup This commit introduces Jaccard distance and similarity methods for KmerSet and KmerSetGroup. For KmerSet: - Added JaccardDistance method to compute the Jaccard distance between two KmerSets - Added JaccardSimilarity method to compute the Jaccard similarity between two KmerSets For KmerSetGroup: - Added JaccardDistanceMatrix method to compute a pairwise Jaccard distance matrix - Added JaccardSimilarityMatrix method to compute a pairwise Jaccard similarity matrix Also includes: - New DistMatrix implementation in pkg/obidist for storing and computing distance/similarity matrices - Updated version handling with bump-version target in Makefile - Added tests for all new methods	2026-02-05 17:39:23 +01:00
Eric Coissac	aa2e94dd6f	Refactor k-mer normalization functions and add quorum operations This commit refactors the k-mer normalization functions, renaming them from 'NormalizeKmer' to 'CanonicalKmer' to better reflect their purpose of returning canonical k-mers. It also introduces new quorum operations (AtLeast, AtMost, Exactly) for k-mer set groups, along with comprehensive tests and benchmarks. The version commit hash has also been updated.	2026-02-05 17:11:34 +01:00
Eric Coissac	12ca62b06a	Complete implementation of persistence for FrequencyFilter Add save and load functionality for FrequencyFilter using the underlying KmerSetGroup. - New Save() method to write the filter to a directory with metadata formatting - New LoadFrequencyFilter() method to load a filter from a directory - Initialize metadata when the filter is created - Optimize the Union() and Intersect() methods of KmerSetGroup - Update the commit hash	2026-02-05 16:26:10 +01:00
Eric Coissac	09ac15a76b	Refactor k-mer encoding functions to use 'canonical' terminology This commit refactors all k-mer encoding and normalization functions to consistently use 'canonical' instead of 'normalized' terminology. This includes renaming functions like EncodeNormalizedKmer to EncodeCanonicalKmer, IterNormalizedKmers to IterCanonicalKmers, and NormalizeKmer to CanonicalKmer. The change aligns the API with biological conventions where 'canonical' refers to the lexicographically smallest representation of a k-mer and its reverse complement. All related documentation and examples have been updated accordingly. The commit also updates the version file with a new commit hash.	2026-02-05 16:14:35 +01:00
Eric Coissac	16f72e6305	refactoring of obikmer	2026-02-05 16:05:48 +01:00
Eric Coissac	6c6c369ee2	Add k-mer encoding and decoding functions with normalized k-mer support This commit introduces new functions for encoding and decoding k-mers, including support for normalized k-mers. It also updates the frequency filter and k-mer set implementations to use the new encoding functions, providing zero-allocation encoding for better performance. The commit hash has been updated to reflect the latest changes.	2026-02-05 15:51:52 +01:00
Eric Coissac	c5dd477675	Refactor KmerSet and FrequencyFilter to use immutable K parameter and consistent Copy/Clone methods This commit refactors the KmerSet and related structures to use an immutable K parameter and introduces consistent Copy methods instead of Clone. It also adds attribute API support for KmerSet and KmerSetGroup, and updates persistence logic to handle IDs and metadata correctly.	2026-02-05 15:32:36 +01:00
Eric Coissac	afcb43b352	Add user metadata management to KmerSet and KmerSetGroup This change adds the ability to store and persist user metadata in the KmerSet and KmerSetGroup structures. The changes include adding a Metadata field to KmerSet and KmerSetGroup, and updating the cloning and persistence methods to handle this metadata. This makes it possible to keep additional information attached to k-mer sets while remaining compatible with existing operations.	2026-02-05 15:02:36 +01:00
Eric Coissac	b26b76cbf8	Add TOML persistence support for KmerSet and KmerSetGroup This commit adds support for saving and loading KmerSet and KmerSetGroup structures using TOML, YAML, and JSON formats for metadata. It includes: - Added github.com/pelletier/go-toml/v2 dependency - Implemented Save and Load methods for KmerSet and KmerSetGroup - Added metadata persistence with support for multiple formats (TOML, YAML, JSON) - Added helper functions for format detection and metadata handling - Updated version commit hash	2026-02-05 14:57:22 +01:00
Eric Coissac	00dcd78e84	Refactor k-mer encoding and frequency filtering with KmerSet This commit refactors the k-mer encoding logic to handle ambiguous bases more consistently and introduces a KmerSet type for better management of k-mer collections. The frequency filter now works with KmerSet instead of roaring bitmaps directly, and the API has been updated to support level-based frequency queries. Additionally, the commit updates the version and commit hash.	2026-02-05 14:41:59 +01:00
Eric Coissac	60f27c1dc8	Add error handling for ambiguous bases in k-mer encoding This commit introduces error handling for ambiguous DNA bases (N, R, Y, W, S, K, M, B, D, H, V) in k-mer encoding. It adds new functions IterNormalizedKmersWithErrors and EncodeNormalizedKmersWithErrors that track and encode the number of ambiguous bases in each k-mer using error markers in the top 2 bits. The commit also updates the version string to reflect the latest changes.	2026-02-04 21:45:08 +01:00
Eric Coissac	b49aba9c09	Implement unique filtering based on sequence and categories Add unique filtering that takes both the sequence and the categories into account. - Modify the ISequenceChunk function to accept an optional unique classifier - Implement on-disk unique processing using a composite classifier - Update the classifier used for on-disk sorting - Fix the handling of uniqueness keys by using the classifier code and value - Update the commit number	2026-01-14 19:18:17 +01:00
Eric Coissac	0678181023	Refactor chunk processing and update version commit Optimize chunk processing by moving variable declarations inside the loop and update the commit hash in version.go to reflect the latest changes.	2026-01-14 18:46:04 +01:00
Eric Coissac	ac0d3f3fe4	Update obiuniq for very large dataset	2025-12-18 14:11:11 +01:00
Eric Coissac	547135c747	End of obilowmask	2025-12-03 11:49:07 +01:00
Eric Coissac	86e60aedd0	obicsv bug with stat on value map fields	2025-11-21 14:03:31 +01:00
Eric Coissac	e65b2a5efe	obimatrix bugs	2025-11-21 13:24:06 +01:00
Eric Coissac	ccc827afd3	finalise obilowmask	2025-11-18 15:33:08 +01:00
Eric Coissac	4603d7973e	implementation of obilowmask	2025-11-18 15:30:20 +01:00
Eric Coissac	2d7dc7d09d	debug taxonomy core dump	2025-11-05 19:01:15 +01:00
Eric Coissac	0844dcc607	bug obimatrix	2025-10-28 13:57:31 +01:00
Eric Coissac	7f4ebe757e	Bug obiuniq - don't clean the chunks	2025-10-28 13:50:22 +01:00
Eric Coissac	d17a9520b9	work on obiclean chimera detection	2025-10-20 17:29:47 +02:00
Eric Coissac	29bf4ce871	add a feature to obimatrix adding obicsv option to obimatrix	2025-10-20 16:34:58 +02:00
Eric Coissac	82b6bb1ab6	correct a bug in func (worker SeqWorker) ChainWorkers(next SeqWorker) SeqWorker	2025-08-11 15:09:49 +02:00
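
The memory-aware batching entries above (40769bf827, 1e1f575d1c, 94b0887069) describe regrouping sequences so that a batch is flushed when either a byte limit or a record-count limit is hit. The following is a minimal sketch of that idea only, not the code in pkg/obiter: the `seq` type, the `memorySize` estimate with its 64-byte overhead, and the `rebatchBySize` signature are all hypothetical stand-ins chosen for illustration. This sketch also flushes just before the byte budget would be exceeded, which is an assumption about where the check sits.

```go
package main

import "fmt"

// seq is a hypothetical stand-in for a sequence record.
type seq struct {
	id    string
	bases []byte
}

// memorySize returns a rough footprint estimate: payload bytes plus a
// fixed per-record overhead. The overhead value is an assumption.
func (s seq) memorySize() int {
	const overhead = 64
	return len(s.id) + len(s.bases) + overhead
}

// rebatchBySize regroups records into batches bounded by both an
// estimated byte budget and a record count. A batch is closed when adding
// the next record would exceed maxBytes, or when it reaches maxCount.
// A single record larger than maxBytes still forms its own batch.
func rebatchBySize(in []seq, maxBytes, maxCount int) [][]seq {
	var out [][]seq
	var cur []seq
	curBytes := 0
	for _, s := range in {
		sz := s.memorySize()
		// Byte limit: flush first so the batch stays within budget.
		if len(cur) > 0 && curBytes+sz > maxBytes {
			out = append(out, cur)
			cur, curBytes = nil, 0
		}
		cur = append(cur, s)
		curBytes += sz
		// Count limit: flush as soon as the batch is full.
		if len(cur) >= maxCount {
			out = append(out, cur)
			cur, curBytes = nil, 0
		}
	}
	// Emit whatever is left over.
	if len(cur) > 0 {
		out = append(out, cur)
	}
	return out
}

func main() {
	in := make([]seq, 5)
	for i := range in {
		in[i] = seq{bases: make([]byte, 36)} // estimated at 100 bytes each
	}
	for i, b := range rebatchBySize(in, 250, 10) {
		fmt.Printf("batch %d: %d records\n", i, len(b))
	}
}
```

With a 250-byte budget and records estimated at 100 bytes each, the byte limit is the one that triggers; raising the budget and lowering `maxCount` makes the count limit take over instead.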

228 Commits
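
Commit 09ac15a76b defines a canonical k-mer as the lexicographically smallest of a k-mer and its reverse complement. As a rough illustration of that definition, here is a string-based sketch. It is not the encoding used in obikmer (which the log describes as a zero-allocation encoding); the function names `reverseComplement` and `canonicalKmer` are hypothetical, and the input is assumed to contain only uppercase A, C, G, T.

```go
package main

import "fmt"

// reverseComplement returns the reverse complement of a DNA k-mer.
// Assumes uppercase A, C, G, T only; other bytes are not handled.
func reverseComplement(kmer string) string {
	comp := map[byte]byte{'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
	out := make([]byte, len(kmer))
	for i := 0; i < len(kmer); i++ {
		// Complement each base and write it at the mirrored position.
		out[len(kmer)-1-i] = comp[kmer[i]]
	}
	return string(out)
}

// canonicalKmer returns whichever of the k-mer and its reverse
// complement sorts first lexicographically.
func canonicalKmer(kmer string) string {
	rc := reverseComplement(kmer)
	if rc < kmer {
		return rc
	}
	return kmer
}

func main() {
	for _, k := range []string{"TTGA", "ACGT", "AAAC"} {
		fmt.Println(k, "->", canonicalKmer(k))
	}
}
```

Because a k-mer and its reverse complement map to the same canonical form, both strands of a sequence contribute identical entries to a k-mer set.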