Add a new boolean option 'keep-shorter' to preserve fragments shorter than kmer-size during split/extract mode.
This change introduces a new flag _lowmaskKeepShorter that controls whether fragments
shorter than the kmer size should be kept during split/extract operations.
The implementation:
1. Adds the new boolean variable _lowmaskKeepShorter
2. Registers the command-line option "keep-shorter"
3. Updates the lowMaskWorker function signature to accept the keepShorter parameter
4. Modifies the fragment selection logic to check the keepShorter flag
5. Updates the worker creation to pass the global flag value
This allows users to control the behavior when dealing with short sequences in
split/extract modes, providing more flexibility in low-complexity masking.
This commit introduces max frequency filtering to limit k-mer occurrences and adds functionality to save the N most frequent k-mers per set to CSV files. It also includes the ability to output k-mer frequency spectra as CSV and updates the CLI options accordingly.
This commit refactors the k-mer index management tools to use a unified subcommand structure with obik, adds support for per-set metadata and ID management, enhances the k-mer set group builder to support appending to existing groups, and improves command-line option handling with a new global options registration system.
Key changes:
- Introduce obik command with subcommands (index, ls, summary, cp, mv, rm, super, lowmask)
- Add support for per-set metadata and ID management in kmer set groups
- Implement ability to append to existing kmer index groups
- Refactor option parsing to use a global options registration system
- Add new commands for listing, copying, moving, and removing sets
- Enhance low-complexity masking with new options and output formats
- Improve kmer index summary with Jaccard distance matrix support
- Remove deprecated obikindex and obisuperkmer commands
- Update build process to use the new subcommand structure
Refactor k-mer index building to use the new disk-based KmerSetGroupBuilder instead of the old KmerSet and FrequencyFilter approaches. This change introduces a more efficient and scalable approach to building k-mer indices by using partitioned disk storage with streaming operations.
- Replace BuildKmerIndex and BuildFrequencyFilterIndex with KmerSetGroupBuilder
- Add support for frequency filtering via WithMinFrequency option
- Remove deprecated k-mer set persistence methods
- Update CLI to use new builder approach
- Add new disk-based k-mer operations (union, intersect, difference, quorum)
- Introduce KDI (K-mer Delta Index) file format for efficient storage
- Add K-way merge operations for combining sorted k-mer streams
- Update documentation and examples to reflect new API
This refactoring provides better memory usage, faster operations on large datasets, and more flexible k-mer set operations.
Refactor kmer index package to use disk-based partitioning with minimizer
- Replace roaring64 bitmaps with disk-based kmer index
- Implement partitioned kmer sets with delta-varint encoding
- Add support for frequency filtering during construction
- Introduce new builder pattern for index construction
- Add streaming operations for set operations (union, intersect, etc.)
- Add support for super-kmer encoding during construction
- Update command line tool to use new index format
- Remove dependency on roaring bitmap library
This change introduces a new architecture for kmer indexing that is more memory efficient and scalable for large datasets.
Update version from 4.4.11 to 4.4.12
- Bump version in version.go
- Enhance release note generation in Makefile to use JSON output from orla and fallback to raw output if JSON parsing fails
- Improve test script to verify minimum super k-mer length is >= k (default k=31)
This commit optimizes the low-complexity masking algorithm by:
1. Precomputing logarithm values and normalization tables to avoid repeated calculations
2. Replacing the MinMultiset-based sliding minimum with a more efficient deque-based implementation
3. Improving entropy calculation by using precomputed n*log(n) values
4. Simplifying the circular normalization process with precomputed tables
5. Removing unused imports and log statements
The changes significantly improve performance while maintaining the same masking behavior.
This commit fixes minor whitespace issues in the test script and adds new tests to ensure merge attribute consistency between in-memory and on-disk paths.
- Removed trailing spaces in log messages
- Added tests for merge consistency between in-memory and on-disk paths
- These tests catch a bug where shared classifier in on-disk dereplication path caused incorrect merged attributes
This commit introduces a new configuration module `obidefault` to manage progress bar settings, allowing users to disable progress bars via a `--no-progressbar` option. It updates various packages to conditionally display progress bars based on this new configuration, improving user experience by providing control over progress bar output. The changes also include improvements to progress bar handling in several packages, ensuring they are only displayed when appropriate (e.g., when stderr is a terminal and stdout is not piped).
This commit fixes an issue in the chunk processing logic where the wrong classifier instance was being reset and used for code generation. A local clone of the classifier is now created and used to ensure correct behavior during dereplication.
Modify CLIProgressBar function to check if stdout is a named pipe and disable the progress bar accordingly. This prevents the progress bar from being displayed when the output is redirected or piped to another command.
This commit addresses a bug in the super k-mer implementation where the minimizer bijection property was not properly enforced. The fix ensures that:
1. All k-mers within a super k-mer share the same minimizer
2. Identical super k-mer sequences have the same minimizer
The changes include:
- Fixing the super k-mer iteration logic to properly validate the minimizer bijection property
- Adding a comprehensive test suite (TestSuperKmerMinimizerBijection) that validates the intrinsic property of super k-mers
- Updating the .gitignore file to properly track relevant files
This resolves issues where the same sequence could be associated with different minimizers, violating the super k-mer definition.
This commit adds the implementation of the obisuperkmer command, including:
- The main command in cmd/obitools/obisuperkmer/
- The package implementation in pkg/obitools/obisuperkmer/
- Automated tests in obitests/obitools/obisuperkmer/
- Documentation for the implementation and tests
The obisuperkmer command extracts super k-mers from DNA sequences, following the standard OBITools architecture. It includes proper CLI option handling, validation of parameters, and integration with the OBITools pipeline system.
Tests cover basic functionality, parameter validation, output format, metadata preservation, and file I/O operations.
Ajout d'une documentation détaillée sur l'architecture des commandes OBITools, incluant la structure modulaire, les patterns architecturaux et les bonnes pratiques pour la création de nouvelles commandes.
This commit refactors the SuperKmer extraction functionality to use Go's new iterator pattern. The ExtractSuperKmers function is now implemented as a wrapper around a new IterSuperKmers iterator function, which yields results one at a time instead of building a complete slice. This change provides better memory efficiency and more flexible consumption of super k-mers. The functionality remains the same, but the interface is now more idiomatic and efficient for large datasets.
Mise à jour du Makefile pour améliorer le processus de version bump et de création de tag.
- Utilisation de variables pour stocker les versions précédente et actuelle
- Ajout de la génération automatique des notes de version à partir des commits entre les tags
- Intégration d'une logique de fallback si orla n'est pas disponible
- Amélioration de la documentation des étapes du processus de release
- Mise à jour de la commande de création du tag avec le message généré
Update installation script to support specific version installation, list available versions, and improve documentation.
- Add support for installing specific versions with -v/--version flag
- Add -l/--list flag to list all available versions
- Improve help message with examples
- Update README.md to reflect new installation options and examples
- Add note on version compatibility between OBITools2 and OBITools4
- Remove ecoprimers directory
- Improve error handling and user feedback during installation
- Add version detection and download logic from GitHub releases
- Update installation process to use tagged releases instead of master branch
This commit simplifies the artifact packaging process by creating a single tar.gz file containing all binaries for each platform, instead of individual files. It also updates the release notes to reflect the new packaging approach and corrects the documentation to use the new naming convention 'obitools4' instead of '<tool>'.
Mise à jour du workflow de release pour utiliser ubuntu-24.04-arm au lieu de ubuntu-latest pour ARM64, et macos-15-intel au lieu de macos-latest pour macOS. Suppression de la compilation croisée pour ARM64 et ajustement de l'installation des outils de build pour macOS.
This commit introduces a new build job that compiles binaries for multiple platforms (Linux, macOS) and architectures (amd64, arm64). It also refactors the release process to download pre-built artifacts and simplify the release directory preparation. The workflow now uses matrix strategy for building binaries and downloads all artifacts for the final release, removing the previous manual build steps for each platform.
Modification du fichier de workflow de release pour compiler uniquement les outils obitools lors de la construction des binaires pour chaque plateforme (Linux AMD64, Linux ARM64, macOS AMD64, macOS ARM64, Windows AMD64). Cela permet d'optimiser le processus de build en ne générant que les binaires nécessaires.
This commit introduces a new GitHub Actions workflow to automatically create releases when tags matching the pattern 'Release_*' are pushed. It also updates the Makefile to use the new tag format 'Release_<version>' for tagging commits, ensuring consistency with the new release automation.