obitools4

mirror of https://github.com/metabarcoding/obitools4.git synced 2026-06-24 09:41:00 +00:00

Author	SHA1	Message	Date
Eric Coissac	2edb33ad08	fix: correct duplicated typos in GVal function names Corrects duplicated typos in the registered GVal function names, changing "which_maxwhichmax" and "which_minwhichmin" to "which_max" and "which_min". The underlying obiutils.WhichMax/WhichMin logic and its int-to-float64 index conversion remain unchanged.	2026-06-02 15:00:48 +02:00
Eric Coissac	e9210e28a3	Release 4.4.44	2026-06-02 14:40:09 +02:00
Eric Coissac	13a93fce11	feat: add which_max and which_min to retrieve extreme element indices Implement reflection-based WhichMax and WhichMin to dynamically find the index or key of the maximum/minimum element in slices, arrays, or maps. Functions validate orderability, handle empty collections, and dispatch via reflect.Kind. Expose as which_max and which_min GVal functions, with float64 type assertions for compatibility and preserved error handling.	2026-06-02 14:39:31 +02:00
Eric Coissac	14064c919e	fix(obiutils): correctly unwrap interface values in min/max Introduces an `unwrapInterface` reflection helper to dereference `interface{}`-wrapped values before type validation. Updates slice and map iteration loops in min/max functions to apply this helper, ensuring `isOrderedKind` accurately identifies underlying concrete types instead of incorrectly rejecting `reflect.Interface` elements.	2026-06-02 14:34:00 +02:00
Eric Coissac	930fe5f1ba	Release 4.4.43	2026-06-01 13:22:58 +02:00
Eric Coissac	dcdaf9e372	feat: support map and slice types in OBI attributes Extends OBI header parsing to recognize and deserialize JSON-like arrays and objects. Introduces safe conversion utilities in `obiutils` to cast generic interface values into typed maps, and exposes them via new `BioSequence` methods. Header values are now marshaled, quote-normalized, and formatted for map and slice types.	2026-06-01 13:21:11 +02:00
Eric Coissac	af7ae3d60c	Correct Shannon entropy bias for canonical k-mers Multiple raw k-mers collapsing into identical circular canonical forms introduce bias into complexity estimates. This change pre-computes `log(class_size)` tables and per-word-size maximum entropy bounds. The `KmerEntropy` function and `KmerEntropyFilter` are updated to apply the corrected formula `(log(N) + Σf·log(s) - Σf·log(f))/N / emax`, ensuring accurate sequence complexity estimation.	2026-05-17 14:54:57 +08:00
Eric Coissac	cecf90fa40	feat: add min/max filtering and saturating subtraction utilities Introduce generic and reflection-based utilities for filtering slices and maps by minimum/maximum thresholds, along with saturating subtraction. The `obiutils` package provides type-safe generic implementations alongside dynamic reflection dispatchers to handle arbitrary ordered and numeric types. These are exposed as GVAL expression functions in `obiseq`, extending the language's built-in filtering and numeric capabilities.	2026-05-14 20:58:24 +08:00
Eric Coissac	a186bd1c92	fix: validate non-empty sequence IDs in FASTA and FASTQ writers Adds a pre-processing guard that checks for empty sequence identifiers before formatting. This prevents malformed FASTA output and stops downstream processing of invalid FASTQ data by terminating early. The check is placed before existing sequence-length validations to enforce non-empty IDs during batch processing.	2026-05-05 18:07:58 +02:00
Eric Coissac	6c4a6c697c	[4.4.2] Enhanced taxonomy handling, input robustness & PCR tag validation - obiconvert: Added `--raw-taxid` mode to output numeric taxIDs without formatting (e.g., "12345" instead of ":tax:NCBI_0987@species"). Introduced `TaxNode.FullString()` to reliably return full formatted strings regardless of global settings, and improved fallback behavior when taxonomy DB is unavailable. - ngsfilter: Input fields (primers, sample tags/IDs) are now automatically trimmed of leading/trailing whitespace to prevent parsing failures from inconsistent formatting. - obitools (pcrtag): Mismatch-related fields (`forward_mismatches`, `reverse_mismatches`) renamed to "error" for consistency across annotation dictionaries. - obipairing & obitagpcr: Enforced mandatory paired-end file input (`--forward` and `reverse`) in obipairing; added CLI support for generating config templates via CLIAskConfigTemplate(); removed redundant `Required()` constraints and introduced helper function CLIHasPairedFiles().	2026-04-30 16:57:45 +02:00
Eric Coissac	60b3753673	feat(obiconvert): add --raw-taxid option and refactor taxID formatting - Add new `--raw-taxid` mode (`obiconvert --raw-taxid`) to output bare numeric taxIDs instead of full-format strings. - Introduce `TaxNode.FullString()` to always return the complete "code:id [name]@rank" format, regardless of global `UseRawTaxids()` setting. - Update `.String(taxonomyCode)` to respect the global flag, returning bare ID when `--raw-taxid` is active. - Extract raw taxID from full-format strings in taxonomy methods when needed (e.g., fallback without loaded DB). - Add comprehensive test suite covering: a) `--raw-taxid` execution and idempotency b) full-format taxID output with `--taxonomy` c) interaction of both flags d) format validation - Add test data: new reference files `out_ecotag.fasta`, taxonomy.csv, and updated shell script.	2026-04-30 16:57:38 +02:00
Eric Coissac	14e2840a2d	[ngsfilter] Trim whitespace from primer and sample fields Trim leading/trailing whitespace in forward/reverse primers, tags (via sample_tag), experiment and sample fields to prevent parsing errors due to formatting inconsistencies in input data.	2026-04-30 08:14:39 +02:00
Eric Coissac	42910c7db9	🔧 Rename mismatch fields to error in pcrtag.go - Renamed `obimultiplex_forward_mismatches` to "error" for consistency - Similarly renamed `obimultiplex_reverse_mismatches` to "error" - Applied changes in both annotation dictionaries (aanot, banot)	2026-04-29 15:29:25 +02:00
Eric Coissac	8b4cf677c6	[obitools] Add validation for paired files and config template support - Enforce requirement of both forward (-F) and reverse files in obipairing/main.go - Add config template support to obitagpcr via CLIAskConfigTemplate() - Remove redundant Required() constraints in options.go - Introduce new helper CLIHasPairedFiles()	2026-04-29 15:01:37 +02:00
Eric Coissac	449544bd63	[obiseq] Quality validation and new map_summaries aggregation - Added strict length matching between sequences and quality scores in `SetQualities`, `TakeQualities`, and `Subsequence` operations; an error is now raised if lengths do not match. - Introduced a new `map_summaries` aggregation feature in obisummary to merge map summary data across datasets, supporting safe concurrent access and inclusion of non-empty results in the final output. - Centralized string reversal logic via a new `inverser_chaine()` utility function, replacing duplicated inline implementations throughout the codebase.	2026-04-16 14:58:23 +02:00
Eric Coissac	434d2e5930	feat: add support for map_summaries aggregation in obisummary - Implement merging logic of `map_summaries` across datasets - Ensure proper initialization and population in multi-threaded context - Add `map_summaries` to final output dictionary when non-empty	2026-04-16 14:58:18 +02:00
Eric Coissac	7cb02ded69	Refactor: Extract utility function for string reversal - Introduce `inverser_chaine()` helper to centralize logic - Replace inline reverse implementations across modules	2026-04-16 13:42:51 +02:00
Eric Coissac	6d469bd711	[obiseq] Add length validation for qualities in SetQualities, TakeQualities and Subsequence - Panic if sequence/qualities length mismatch when setting or taking qualities in BioSequence. - Add same check before slicing Qualities() for Subsequence to ensure consistency.	2026-04-15 18:20:53 +02:00
Eric Coissac	07d04a6967	Release 4.4.40	2026-04-14 14:48:41 +02:00
Eric Coissac	03f251c365	[release] bump version to v4.4.39 - Update `version.txt` from "v" to v4.4.39 - Auto-synced `pkg/obioptions/version.go` via Makefile	2026-04-14 14:48:38 +02:00
Eric Coissac	5714fa6cd3	chore: bump version to 4.4.39 Update package and file versions from 4.4.38 to 4.4.39.	2026-04-14 14:48:29 +02:00
Eric Coissac	4359b52eaf	Release 4.4.38	2026-04-13 17:57:00 +02:00
Eric Coissac	da0c8b6f28	♻️ refactor lua_push_interface and add json module Refactor pushInterfaceToLua to delegate unsupported types (nil, bool/int/float/string/map/slice) recursively via new lvalueFromInterface helper. Simplify typed slice and map handlers, remove explicit nil case (now handled by lvalueFromInterface), eliminate redundant type switches in pushMapStringIntToLua and similar functions. Add new luajson.go with RegisterJSON, lua.JSONEncode/Decode bindings using lvalueFromInterface and Table2Interface for bidirectional round-trips. Include comprehensive tests covering scalars, nested structures (e.g., kmindex response), arrays and error cases.	2026-04-13 17:56:58 +02:00
Eric Coissac	e298daeef9	[v4.5] Bugfix for 3-base sequence handling and utility refactoring - Bug fix: Corrected logic in 4-mer calculation to properly handle sequences of length exactly three. Previously, such cases could produce invalid or unexpected results due to an incomplete guard condition (`length < 0`) which failed for `length == 3` (where computed step size was zero). The fix ensures all sequences shorter than four bases are safely excluded. - Refactor: Introduced a new internal utility function (`inverser_chaine`) to centralize string reversal logic, improving code maintainability and test coverage without affecting user-facing behavior.	2026-04-13 17:18:53 +02:00
Eric Coissac	d9e6f67a6e	chore: bump version to 4.4.36 Update package and file versions from v4.4.35 to 4.4.36.	2026-04-13 17:18:48 +02:00
Eric Coissac	f036c7fa96	⬆️ version bump to v4.5 - Update `version.txt` from "v3" to v4.5 - Bump Go constant `_Version = 'Release 4.x.y'` accordingly	2026-04-13 17:18:34 +02:00
Eric Coissac	e33665e716	Refactor: Extract utility function for string reversal - Introduce `inverser_chaine()` helper to centralize logic - Update tests and documentation accordingly	2026-04-13 17:18:34 +02:00
Eric Coissac	f19065261e	We kept	2026-04-13 17:18:34 +02:00
coissac	3e349e92e1	Merge pull request #104 from theo-krueger/master Bugfix: result of 0 4mers not caught if sequence length == 3	2026-04-13 16:39:08 +02:00
Eric Coissac	960ad1531d	[4.4.34] HTTP client thread-safety and CI infrastructure updates - Improved concurrency safety by replacing the global HTTP client with a thread-safe, lazy-initialized instance using `sync.Once`. The new implementation enables connection pooling (`MaxIdleConnsPerHost`, connections per host) and dynamically configures pool size based on `obidefault.ParallelWorkers()`, ensuring robust behavior in multi-threaded Lua environments. - Updated GitHub Actions workflows to the latest stable versions of `actions/setup-go` and `actions/checkout`, improving build reliability. - Removed outdated Go dependency checksums for buger/jsonparser v1.1.x to keep the build clean and consistent.	2026-04-13 16:27:14 +02:00
Eric Coissac	137f49d1d1	🔧 refactor(http): use thread-safe lazy-initialized HTTP client with connection pooling - Replace global _httpClient variable by a sync.Once-based lazy initialization - Add getHTTPClient() function to safely initialize client with connection pooling settings (MaxIdleConnsPerHost, MaxConnsPerHost) - Set connection pool size based on obidefault.ParallelWorkers() This ensures safe concurrent access and better resource management in multi-threaded Lua environments.	2026-04-13 16:27:09 +02:00
Eric Coissac	f32b29db4f	Release 4.4.33	2026-04-13 14:29:18 +02:00
Eric Coissac	10f49fe64b	📝 Clarify RegisterHTTP global registration intent Registers the http module in Lua state as a global, aligning with obicontext and BioSequence conventions. The change ensures consistent module exposure across Lua environments.	2026-04-13 14:29:16 +02:00
Eric Coissac	fec078c04c	Release 4.4.32	2026-04-13 14:08:16 +02:00
Eric Coissac	a92393dd51	⬆️ update go.mod dependencies and improve error messages - Bump github.com/buger/jsonparser from v1.1.1 to v1.2 - Add error details in log.Fatalf calls for better debugging	2026-04-13 14:08:13 +02:00
Eric Coissac	64b0b32f61	Release 4.4.31	2026-04-13 13:35:39 +02:00
Eric Coissac	c8e6a218cb	[release] bump version to v4.5 - Update obioptions/version.go and version.txt from Release v4.5 to 68302a1 - Increment patch version: from `Release v4.5` → 68302a1 - Align version.txt with current release tag	2026-04-13 13:35:33 +02:00
Eric Coissac	8c7017a99d	⬆️ version bump to v4.5 - Update obioptions.Version from "Release 4.4.29" to "/v/ Release v5" - Update version.txt from 4.29 → .30 (automated by Makefile)	2026-04-13 13:34:53 +02:00
theo-krueger	c7816973a6	Bugfix: result of 0 4mers not caught if sequence length == 3 In the 4mer calculation: length := slength - 3 - for sequences with <4 bases, length is <=0 The check to stop did only catch <0, so sequences lengths 2 or less, leaving sequence lengths of 3 unguarded if length < 0 { return nil }	2026-04-10 14:05:30 +02:00
Eric Coissac	a786b58ed3	Dynamic Batch Flushing and Build Improvements This release introduces dynamic batch flushing in the Distribute component, replacing the previous fixed-size batching with a memory- and count-aware strategy. Batches now flush automatically when either the maximum sequence count (BatchSizeMax()) or memory threshold (BatchMem()) per key is reached, ensuring more efficient resource usage and consistent behavior with the RebatchBySize strategy. The optional sizes parameter has been removed, and related code—including the Lua wrapper and worker buffer handling—has been updated for correctness and simplicity. Unused BatchSize() references have been eliminated from obidistribute. Additionally, this release includes improvements to static Linux builds and overall build stability, enhancing reliability across deployment environments.	2026-03-16 22:06:51 +01:00
Eric Coissac	a2b26712b2	refactor: replace fixed batch size with dynamic flushing based on count and memory Replace the old fixed batch-size mechanism in Distribute with a dynamic strategy that flushes batches when either BatchSizeMax() sequences or BatchMem() bytes are reached per key. This aligns with the RebatchBySize strategy and removes the optional sizes parameter. Also update related code: simplify Lua wrapper to accept optional capacity, and fix buffer growth logic in worker.go using slices.Grow correctly. Remove unused BatchSize() usage from obidistribute.	2026-03-16 22:06:44 +01:00
Eric Coissac	af213ab446	4.4.28: Static Linux Builds, Memory-Aware Batching, and Build Stability This release focuses on improving build reliability, memory efficiency for large datasets, and portability of Linux binaries. ### Static Linux Binaries - Linux binaries are now built with static linking using musl, eliminating external runtime dependencies and ensuring portability across distributions. ### Memory-Aware Batching - Users can now control memory usage during processing with the new `--batch-mem` option, specifying limits such as 128K, 64M, or 1G. - Batching logic now respects both size and memory constraints: batches are flushed when either threshold is exceeded. - Conservative memory estimation for sequences helps avoid over-allocation, and explicit garbage collection after large batch discards reduces memory spikes. ### Build System Improvements - Upgraded to Go 1.26 for improved performance and toolchain stability. - Fixed cross-compilation issues by replacing generic include paths with architecture-specific ones (x86_64-linux-gnu and aarch64-linux-gnu). - Streamlined macOS builds by removing special flags, using standard `make` targets. - Enhanced error reporting during build failures: logs are now shown before cleanup and exit. - Updated install script to correctly configure GOROOT, GOPATH, and GOTOOLCHAIN, with visual progress feedback for downloads. All batching behavior is non-breaking and maintains backward compatibility while offering more predictable resource usage on large datasets.	2026-03-14 11:59:15 +01:00
Eric Coissac	a60184c115	chore: bump version to 4.4.27 and add zlib-static dependency Update version to 4.4.27 in version.txt and pkg/obioptions/version.go. Add zlib-static package to release workflow to ensure static linking of zlib, resolving potential runtime dependency issues with the external link mode.	2026-03-14 11:59:04 +01:00
Eric Coissac	585b024bf0	chore: update to Go 1.26 and refactor release workflow - Upgrade Go version from 1.23 to 1.26 in release.yml - Remove CGO_CFLAGS from cross-compilation matrix entries - Replace Linux build tools installation with Docker-based static build using golang:1.26-alpine - Simplify macOS build to use standard make without special flags - Increment version to 4.4.26	2026-03-14 11:43:31 +01:00
Eric Coissac	afc9ffda85	chore: bump version to 4.4.25 and fix CGO_CFLAGS for cross-compilation Update version to 4.4.25 in version.txt and pkg/obioptions/version.go. Fix CGO_CFLAGS in release.yml by replacing generic '-I/usr/include' with architecture-specific paths (x86_64-linux-gnu and aarch64-linux-gnu) to ensure correct header inclusion during cross-compilation on Linux.	2026-03-13 19:30:29 +01:00
Eric Coissac	15d1f1fd80	Version 4.4.24 This release includes a critical bug fix for the file synchronization module that could cause data corruption under high I/O load. Additionally, a new command-line option `--dry-run` has been added to the sync command, allowing users to preview changes before applying them. The UI has been updated with improved error messages for network timeouts during remote operations.	2026-03-13 19:11:58 +01:00
Eric Coissac	8df2cbe22f	Bump version to 4.4.23 and update release workflow - Update version from 4.4.22 to 4.4.23 in version.txt and pkg/obioptions/version.go - Add zlib1g-dev dependency to Linux release workflow for potential linking requirements - Improve tag creation in Makefile by resolving commit hash with `jj log` for better CI/CD integration	2026-03-13 19:11:55 +01:00
Eric Coissac	94b0887069	Memory-aware Batching and Static Linux Builds ### Memory-Aware Batching - Replaced single batch size limits with configurable min/max bounds and memory limits for more precise control over resource usage. - Added `--batch-mem` CLI option to enable adaptive batching based on estimated sequence memory footprint (e.g., 128K, 64M, 1G). - Introduced `RebatchBySize()` with explicit support for both byte and count limits, flushing when either threshold is exceeded. - Implemented conservative memory estimation via `BioSequence.MemorySize()` and enhanced garbage collection to trigger explicit cleanup after large batch discards. - Updated internal batching logic across `batchiterator.go`, `fragment.go`, and `obirefidx.go` to consistently use default memory (128 MB) and size (min: 1, max: 2000) bounds. ### Linux Build Enhancements - Enabled static linking for Linux binaries using musl, producing portable, self-contained executables without external dependencies. ### Notes - This release consolidates and improves batching behavior introduced in 4.4.20, with no breaking changes to the public API. - All user-facing batching behavior is now governed by consistent memory and count constraints, improving predictability and stability during large dataset processing.	2026-03-13 15:16:41 +01:00
Eric Coissac	c188580aac	Replace Rebatch with RebatchBySize using default batch parameters Replace calls to Rebatch(size) with RebatchBySize(obidefault.BatchMem(), obidefault.BatchSizeMax()) in batchiterator.go, fragment.go, and obirefidx.go to ensure consistent use of default memory and size limits for batch rebatching.	2026-03-13 15:16:33 +01:00
Eric Coissac	1e1f575d1c	refactor: replace single batch size with min/max bounds and memory limits Introduce separate _BatchSize (min) and _BatchSizeMax (max) constants to replace the single _BatchSize variable. Update RebatchBySize to accept both maxBytes and maxCount parameters, flushing when either limit is exceeded. Set default batch size min to 1, max to 2000, and memory limit to 128 MB. Update CLI options and sequence_reader.go accordingly.	2026-03-13 15:07:35 +01:00
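
The flushing rule described in the batching commits above (close a batch once either the sequence count or the estimated memory limit is reached) can be illustrated with a minimal sketch. This is not the obitools4 implementation: the `seq` type, its `memSize` field, and the `rebatch` function are hypothetical names chosen for illustration, and the real `RebatchBySize` works on sequence iterators rather than plain slices.

```go
package main

import "fmt"

// seq is a hypothetical stand-in for a sequence record; memSize plays the
// role of an estimated memory footprint in bytes (an assumption, not the
// actual BioSequence.MemorySize() API).
type seq struct {
	id      string
	memSize int
}

// rebatch groups seqs into batches. The current batch is flushed as soon as
// it holds maxCount sequences or its summed memSize reaches maxBytes,
// whichever happens first. Any remainder is emitted as a final batch.
func rebatch(seqs []seq, maxBytes, maxCount int) [][]seq {
	var batches [][]seq
	var current []seq
	size := 0
	for _, s := range seqs {
		current = append(current, s)
		size += s.memSize
		if len(current) >= maxCount || size >= maxBytes {
			batches = append(batches, current)
			current = nil
			size = 0
		}
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}

func main() {
	seqs := []seq{
		{"a", 60}, {"b", 60}, // memory limit reached after two sequences
		{"c", 10}, {"d", 10}, {"e", 10}, // count limit reached after three
		{"f", 10}, // leftover, flushed at the end
	}
	for i, b := range rebatch(seqs, 100, 3) {
		fmt.Printf("batch %d: %d sequences\n", i, len(b))
	}
}
```

With a 100-byte memory limit and a count limit of 3, the six records above end up in batches of 2, 3, and 1 sequences: the first flush is triggered by memory, the second by count, and the last by end of input.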


586 Commits