Commit Graph

694 Commits

Author SHA1 Message Date
Eric Coissac
b6542c4523 Bump version to 4.4.13
Update version from 4.4.12 to 4.4.13 in version.txt and pkg/obioptions/version.go
2026-02-10 22:10:38 +01:00
Eric Coissac
ac41dd8a22 Refactor k-mer matching pipeline with improved concurrency and memory management
Refactor k-mer matching to use a pipeline architecture with improved concurrency and memory management:

- Replace sort.Slice with slices.SortFunc and cmp.Compare for better performance
- Introduce PreparedQueries struct to encapsulate query buckets with metadata
- Implement MergeQueries function to merge query buckets from multiple batches
- Rewrite MatchBatch to use pre-allocated results and mutexes instead of map-based accumulation
- Add seek optimization in matchPartition to reduce linear scanning
- Refactor match command to use a multi-stage pipeline with proper batching and merging
- Add index directory option for match command
- Improve parallel processing of sequence batches

This refactoring improves performance by reducing memory allocations, optimizing k-mer lookup, and implementing a more efficient pipeline for large-scale k-mer matching operations.
Release_4.4.13
2026-02-10 22:10:36 +01:00
Eric Coissac
bebbbbfe7d Add entropy-based filtering for k-mers
This commit introduces entropy-based filtering for k-mers to remove low-complexity sequences. It adds:

- New KmerEntropy and KmerEntropyFilter functions in pkg/obikmer/entropy.go for computing and filtering k-mer entropy
- Integration of entropy filtering in the k-mer set builder (pkg/obikmer/kmer_set_builder.go)
- A new 'filter' command in obik tool (pkg/obitools/obik/filter.go) to apply entropy filtering on existing indices
- CLI options for configuring entropy filtering during index building and filtering

The entropy filter helps improve the quality of k-mer sets by removing repetitive sequences that may interfere with downstream analyses.
2026-02-10 18:20:35 +01:00
Eric Coissac
c6e04265f1 Add sparse index support for KDI files with fast seeking
This commit introduces sparse index support for KDI files to enable fast random access during k-mer matching. It adds a new .kdx index file format and updates the KDI reader and writer to handle index creation and seeking. The changes include:

- New KdxIndex struct and related functions for loading, searching, and writing .kdx files
- Modified KdiReader to support seeking with the new index
- Updated KdiWriter to create .kdx index files during writing
- Enhanced KmerSetGroup.Contains to use the new index for faster lookups
- Added a new 'match' command to annotate sequences with k-mer match positions

The index is created automatically during KDI file creation and allows for O(log N / stride) binary search followed by at most stride linear scan steps, significantly improving performance for large datasets.
2026-02-10 13:24:24 +01:00
Eric Coissac
9babcc0fae Refactor lowmask options and shared kmer options
Refactor lowmask options to use shared kmer options and CLI getters

This commit refactors the lowmask subcommand to use shared kmer options and CLI getters instead of local variables. It also moves the kmer size and minimizer size options to a shared location and adds new CLI getters for the lowmask options.

- Move kmer size and minimizer size options to shared location
- Add CLI getters for lowmask options
- Refactor lowmask to use CLI getters
- Remove unused strings import
- Add MaskingMode type and related functions
2026-02-10 09:52:38 +01:00
Eric Coissac
e775f7e256 Add option to keep shorter fragments in lowmask
Add a new boolean option 'keep-shorter' to preserve fragments shorter than kmer-size during split/extract mode.

This change introduces a new flag _lowmaskKeepShorter that controls whether fragments
shorter than the kmer size should be kept during split/extract operations.

The implementation:
1. Adds the new boolean variable _lowmaskKeepShorter
2. Registers the command-line option "keep-shorter"
3. Updates the lowMaskWorker function signature to accept the keepShorter parameter
4. Modifies the fragment selection logic to check the keepShorter flag
5. Updates the worker creation to pass the global flag value

This allows users to control the behavior when dealing with short sequences in
split/extract modes, providing more flexibility in low-complexity masking.
2026-02-10 09:36:42 +01:00
Eric Coissac
f2937af1ad Add max frequency filtering and top-kmer saving capabilities
This commit introduces max frequency filtering to limit k-mer occurrences and adds functionality to save the N most frequent k-mers per set to CSV files. It also includes the ability to output k-mer frequency spectra as CSV and updates the CLI options accordingly.
2026-02-10 09:27:04 +01:00
Eric Coissac
56c1f4180c Refactor k-mer index management with subcommands and enhanced metadata support
This commit refactors the k-mer index management tools to use a unified subcommand structure with obik, adds support for per-set metadata and ID management, enhances the k-mer set group builder to support appending to existing groups, and improves command-line option handling with a new global options registration system.

Key changes:
- Introduce obik command with subcommands (index, ls, summary, cp, mv, rm, super, lowmask)
- Add support for per-set metadata and ID management in kmer set groups
- Implement ability to append to existing kmer index groups
- Refactor option parsing to use a global options registration system
- Add new commands for listing, copying, moving, and removing sets
- Enhance low-complexity masking with new options and output formats
- Improve kmer index summary with Jaccard distance matrix support
- Remove deprecated obikindex and obisuperkmer commands
- Update build process to use the new subcommand structure
2026-02-10 06:49:31 +01:00
Eric Coissac
f78543ee75 Refactor k-mer index building to use disk-based KmerSetGroupBuilder
Refactor k-mer index building to use the new disk-based KmerSetGroupBuilder instead of the old KmerSet and FrequencyFilter approaches. This change introduces a more efficient and scalable approach to building k-mer indices by using partitioned disk storage with streaming operations.

- Replace BuildKmerIndex and BuildFrequencyFilterIndex with KmerSetGroupBuilder
- Add support for frequency filtering via WithMinFrequency option
- Remove deprecated k-mer set persistence methods
- Update CLI to use new builder approach
- Add new disk-based k-mer operations (union, intersect, difference, quorum)
- Introduce KDI (K-mer Delta Index) file format for efficient storage
- Add K-way merge operations for combining sorted k-mer streams
- Update documentation and examples to reflect new API

This refactoring provides better memory usage, faster operations on large datasets, and more flexible k-mer set operations.
2026-02-10 06:49:31 +01:00
Eric Coissac
a016ad5b8a Refactor kmer index to disk-based partitioning with minimizer
Refactor kmer index package to use disk-based partitioning with minimizer

- Replace roaring64 bitmaps with disk-based kmer index
- Implement partitioned kmer sets with delta-varint encoding
- Add support for frequency filtering during construction
- Introduce new builder pattern for index construction
- Add streaming operations for set operations (union, intersect, etc.)
- Add support for super-kmer encoding during construction
- Update command line tool to use new index format
- Remove dependency on roaring bitmap library

This change introduces a new architecture for kmer indexing that is more memory efficient and scalable for large datasets.
2026-02-09 17:52:37 +01:00
coissac
09d437d10f Merge pull request #83 from metabarcoding/push-xssnppvunmlq
Push xssnppvunmlq
Release_4.4.12
2026-02-09 09:58:06 +01:00
Eric Coissac
d00ab6f83a Bump version from 4.4.11 to 4.4.12
Update version number in version.txt from 4.4.11 to 4.4.12
2026-02-09 09:46:12 +01:00
Eric Coissac
8037860518 Update version and improve release note generation
Update version from 4.4.11 to 4.4.12

- Bump version in version.go
- Enhance release note generation in Makefile to use JSON output from orla and fallback to raw output if JSON parsing fails
- Improve test script to verify minimum super k-mer length is >= k (default k=31)
2026-02-09 09:46:10 +01:00
coissac
43d6cbe56a Merge pull request #82 from metabarcoding/push-vkprtnlyxmkl
Push vkprtnlyxmkl
2026-02-09 09:16:20 +01:00
Eric Coissac
6dadee9371 Bump version to 4.4.12
Update version from 4.4.11 to 4.4.12 in version.txt and pkg/obioptions/version.go
2026-02-09 09:05:49 +01:00
Eric Coissac
99a8e69d10 Optimize low-complexity masking algorithm
This commit optimizes the low-complexity masking algorithm by:

1. Precomputing logarithm values and normalization tables to avoid repeated calculations
2. Replacing the MinMultiset-based sliding minimum with a more efficient deque-based implementation
3. Improving entropy calculation by using precomputed n*log(n) values
4. Simplifying the circular normalization process with precomputed tables
5. Removing unused imports and log statements

The changes significantly improve performance while maintaining the same masking behavior.
2026-02-09 09:05:46 +01:00
Eric Coissac
c0ae49ef92 Ajout d'obilowmask_ref au fichier .gitignore
Ajout du fichier obilowmask_ref dans le fichier .gitignore pour éviter qu'il ne soit suivi par Git.
2026-02-08 19:31:12 +01:00
Eric Coissac
08490420a2 Fix whitespace in test script and add merge consistency tests
This commit fixes minor whitespace issues in the test script and adds new tests to ensure merge attribute consistency between in-memory and on-disk paths.

- Removed trailing spaces in log messages
- Added tests for merge consistency between in-memory and on-disk paths
- These tests catch a bug where shared classifier in on-disk dereplication path caused incorrect merged attributes
2026-02-08 18:08:29 +01:00
Eric Coissac
1a28d5ed64 Add progress bar configuration and conditional display
This commit introduces a new configuration module `obidefault` to manage progress bar settings, allowing users to disable progress bars via a `--no-progressbar` option. It updates various packages to conditionally display progress bars based on this new configuration, improving user experience by providing control over progress bar output. The changes also include improvements to progress bar handling in several packages, ensuring they are only displayed when appropriate (e.g., when stderr is a terminal and stdout is not piped).
2026-02-08 16:14:02 +01:00
Eric Coissac
b2d16721f0 Fix classifier cloning and reset in chunk processing
This commit fixes an issue in the chunk processing logic where the wrong classifier instance was being reset and used for code generation. A local clone of the classifier is now created and used to ensure correct behavior during dereplication.
2026-02-08 15:52:25 +01:00
Eric Coissac
7c12b1ee83 Disable progress bar when output is piped
Modify CLIProgressBar function to check if stdout is a named pipe and disable the progress bar accordingly. This prevents the progress bar from being displayed when the output is redirected or piped to another command.
2026-02-08 14:48:13 +01:00
Eric Coissac
db98ddb241 Fix super k-mer minimizer bijection and add validation test
This commit addresses a bug in the super k-mer implementation where the minimizer bijection property was not properly enforced. The fix ensures that:

1. All k-mers within a super k-mer share the same minimizer
2. Identical super k-mer sequences have the same minimizer

The changes include:

- Fixing the super k-mer iteration logic to properly validate the minimizer bijection property
- Adding a comprehensive test suite (TestSuperKmerMinimizerBijection) that validates the intrinsic property of super k-mers
- Updating the .gitignore file to properly track relevant files

This resolves issues where the same sequence could be associated with different minimizers, violating the super k-mer definition.
2026-02-08 13:47:33 +01:00
Eric Coissac
7a979ba77f Add obisuperkmer command implementation and tests
This commit adds the implementation of the obisuperkmer command, including:

- The main command in cmd/obitools/obisuperkmer/
- The package implementation in pkg/obitools/obisuperkmer/
- Automated tests in obitests/obitools/obisuperkmer/
- Documentation for the implementation and tests

The obisuperkmer command extracts super k-mers from DNA sequences, following the standard OBITools architecture. It includes proper CLI option handling, validation of parameters, and integration with the OBITools pipeline system.

Tests cover basic functionality, parameter validation, output format, metadata preservation, and file I/O operations.
2026-02-07 13:54:02 +01:00
Eric Coissac
00c8be6b48 docs: add architecture documentation for OBITools commands
Ajout d'une documentation détaillée sur l'architecture des commandes OBITools, incluant la structure modulaire, les patterns architecturaux et les bonnes pratiques pour la création de nouvelles commandes.
2026-02-07 12:26:35 +01:00
Eric Coissac
4ae331db36 Refactor SuperKmer extraction to use iterator pattern
This commit refactors the SuperKmer extraction functionality to use Go's new iterator pattern. The ExtractSuperKmers function is now implemented as a wrapper around a new IterSuperKmers iterator function, which yields results one at a time instead of building a complete slice. This change provides better memory efficiency and more flexible consumption of super k-mers. The functionality remains the same, but the interface is now more idiomatic and efficient for large datasets.
2026-02-07 12:23:12 +01:00
Eric Coissac
f1e2846d2d Amélioration du processus de release avec génération automatique des notes de version
Mise à jour du Makefile pour améliorer le processus de version bump et de création de tag.

- Utilisation de variables pour stocker les versions précédente et actuelle
- Ajout de la génération automatique des notes de version à partir des commits entre les tags
- Intégration d'une logique de fallback si orla n'est pas disponible
- Amélioration de la documentation des étapes du processus de release
- Mise à jour de la commande de création du tag avec le message généré
2026-02-07 11:48:26 +01:00
coissac
cd5562fb30 Merge pull request #81 from metabarcoding/push-nrylumyxtxnr
Push nrylumyxtxnr
2026-02-06 10:10:22 +01:00
Eric Coissac
f79b018430 Bump version to 4.4.11
Update version from 4.4.10 to 4.4.11 in version.txt and pkg/obioptions/version.go
2026-02-06 10:09:56 +01:00
Eric Coissac
aa819618c2 Enhance OBITools4 installation script with version control and documentation
Update installation script to support specific version installation, list available versions, and improve documentation.

- Add support for installing specific versions with -v/--version flag
- Add -l/--list flag to list all available versions
- Improve help message with examples
- Update README.md to reflect new installation options and examples
- Add note on version compatibility between OBITools2 and OBITools4
- Remove ecoprimers directory
- Improve error handling and user feedback during installation
- Add version detection and download logic from GitHub releases
- Update installation process to use tagged releases instead of master branch
Release_4.4.11
2026-02-06 10:09:54 +01:00
coissac
da8d851d4d Merge pull request #80 from metabarcoding/push-vvonlpwlnwxy
Remove ecoprimers submodule
2026-02-06 09:53:29 +01:00
Eric Coissac
9823bcb41b Remove ecoprimers submodule 2026-02-06 09:52:54 +01:00
coissac
9c162459b0 Merge pull request #79 from metabarcoding/push-tpytwyyyostt
Remove ecoprimers submodule
2026-02-06 09:51:42 +01:00
Eric Coissac
25b494e562 Remove ecoprimers submodule 2026-02-06 09:50:45 +01:00
coissac
0b5cadd104 Merge pull request #78 from metabarcoding/push-pwvvkzxzmlux
Push pwvvkzxzmlux
2026-02-06 09:48:47 +01:00
Eric Coissac
a2106e4e82 Bump version to 4.4.10
Update version from 4.4.9 to 4.4.10 in version.txt and pkg/obioptions/version.go
2026-02-06 09:48:27 +01:00
Eric Coissac
a8a00ba0f7 Simplify artifact packaging and update release notes
This commit simplifies the artifact packaging process by creating a single tar.gz file containing all binaries for each platform, instead of individual files. It also updates the release notes to reflect the new packaging approach and corrects the documentation to use the new naming convention 'obitools4' instead of '<tool>'.
Release_4.4.10
2026-02-06 09:48:25 +01:00
coissac
1595a74ada Merge pull request #77 from metabarcoding/push-lwtnswxmorrq
Push lwtnswxmorrq
2026-02-06 09:35:05 +01:00
Eric Coissac
68d723ecba Bump version to 4.4.9
Update version from 4.4.8 to 4.4.9 in version.txt and corresponding Go file.
2026-02-06 09:34:43 +01:00
Eric Coissac
250d616129 Mise à jour des workflows de release pour les nouvelles versions d'OS
Mise à jour du workflow de release pour utiliser ubuntu-24.04-arm au lieu de ubuntu-latest pour ARM64, et macos-15-intel au lieu de macos-latest pour macOS. Suppression de la compilation croisée pour ARM64 et ajustement de l'installation des outils de build pour macOS.
Release_4.4.9
2026-02-06 09:34:41 +01:00
coissac
fbf816d219 Merge pull request #76 from metabarcoding/push-tzpmmnnxkvxx
Push tzpmmnnxkvxx
2026-02-06 09:09:05 +01:00
Eric Coissac
7f0133a196 Bump version to 4.4.8
Update version from 4.4.7 to 4.4.8 in version.txt and _Version variable.
2026-02-06 09:08:35 +01:00
Eric Coissac
f798f22434 Add cross-platform binary builds and release workflow improvements
This commit introduces a new build job that compiles binaries for multiple platforms (Linux, macOS) and architectures (amd64, arm64). It also refactors the release process to download pre-built artifacts and simplify the release directory preparation. The workflow now uses matrix strategy for building binaries and downloads all artifacts for the final release, removing the previous manual build steps for each platform.
Release_4.4.8
2026-02-06 09:08:33 +01:00
coissac
248bc9f672 Merge pull request #75 from metabarcoding/push-mxxuykppzlpw
Push mxxuykppzlpw
2026-02-05 18:11:12 +01:00
Eric Coissac
7a7db703f1 Bump version to 4.4.7
Update version from 4.4.6 to 4.4.7 in version.txt and pkg/obioptions/version.go
2026-02-05 18:10:45 +01:00
Eric Coissac
da195ac5cb Optimisation de la construction des binaires
Modification du fichier de workflow de release pour compiler uniquement les outils obitools lors de la construction des binaires pour chaque plateforme (Linux AMD64, Linux ARM64, macOS AMD64, macOS ARM64, Windows AMD64). Cela permet d'optimiser le processus de build en ne générant que les binaires nécessaires.
Release_4.4.7
2026-02-05 18:10:43 +01:00
coissac
20a0a09f5f Merge pull request #74 from metabarcoding/push-yqrwnpmoqllk
Push yqrwnpmoqllk
2026-02-05 18:03:28 +01:00
coissac
7d8c578c57 Merge branch 'master' into push-yqrwnpmoqllk 2026-02-05 18:03:18 +01:00
Eric Coissac
d7f615108f Bump version to 4.4.6
Update version from 4.4.5 to 4.4.6 in version.txt and pkg/obioptions/version.go
2026-02-05 18:02:30 +01:00
Eric Coissac
71574f240b Update version and add CI tests
Update version to 4.4.5 and add a test job in the release workflow to ensure tests pass before creating a release.
Release_4.4.6
2026-02-05 18:02:28 +01:00
coissac
c98501a898 Merge pull request #73 from metabarcoding/push-pklkwsssrkuv
Push pklkwsssrkuv
2026-02-05 17:54:39 +01:00