Update GitHub Actions workflow to use setup-go v5 and align with latest tooling practices.
Update version to 4.4.15 in version.txt and pkg/obioptions/version.go.
Add comprehensive documentation for the canonical super-kmer strategy, including:
- Analysis of index v1 limitations
- Experimental observations on super-kmer efficiency
- Detailed pipeline for building v3 index
- Explanation of minimizer canonicalization
- Description of unitig construction and frequency filtering
- Storage format specifications for v3
- Aho-Corasick matching implementation
This change introduces a major improvement in index compactness and performance through the use of canonical super-kmers, unitigs, and efficient storage formats.
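
As an illustration of the minimizer canonicalization mentioned above, here is a minimal sketch of one common definition: the canonical form of a word is the lexicographically smaller of the word and its reverse complement, and the minimizer of a k-mer is its smallest canonical m-mer. The function names (`reverseComplement`, `canonical`, `canonicalMinimizer`) and the string-based representation are assumptions made for readability; they are not taken from the documented v3 pipeline, which may order or encode minimizers differently.

```go
package main

// complement maps each uppercase nucleotide to its Watson-Crick partner.
var complement = map[byte]byte{'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

// reverseComplement returns the reverse complement of an uppercase ACGT string.
func reverseComplement(s string) string {
	out := make([]byte, len(s))
	for i := 0; i < len(s); i++ {
		out[len(s)-1-i] = complement[s[i]]
	}
	return string(out)
}

// canonical returns the lexicographically smaller of s and its reverse complement.
func canonical(s string) string {
	if rc := reverseComplement(s); rc < s {
		return rc
	}
	return s
}

// canonicalMinimizer returns the smallest canonical m-mer found in kmer.
// It assumes 0 < m <= len(kmer) (hypothetical helper, illustration only).
func canonicalMinimizer(kmer string, m int) string {
	best := canonical(kmer[:m])
	for i := 1; i+m <= len(kmer); i++ {
		if c := canonical(kmer[i : i+m]); c < best {
			best = c
		}
	}
	return best
}
```

With this definition a k-mer and its reverse complement always yield the same minimizer, so both strands of a sequence fall into the same partition.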
Refactor kmer index package to use minimizer-based disk partitioning
- Replace roaring64 bitmaps with disk-based kmer index
- Implement partitioned kmer sets with delta-varint encoding
- Add support for frequency filtering during construction
- Introduce new builder pattern for index construction
- Add streaming set operations (union, intersection, etc.)
- Add support for super-kmer encoding during construction
- Update command line tool to use new index format
- Remove dependency on roaring bitmap library
This change introduces a new architecture for kmer indexing that is more memory efficient and scalable for large datasets.
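
The delta-varint encoding listed above can be sketched as follows, assuming k-mers are stored as ascending `uint64` values within a partition. Each value is replaced by its difference from the previous one, and that difference is written as a variable-length integer using the standard `encoding/binary` package. The function names `encodeDeltaVarint` and `decodeDeltaVarint` are hypothetical, and the actual on-disk layout of the index may differ.

```go
package main

import "encoding/binary"

// encodeDeltaVarint encodes an ascending slice of k-mer codes as
// varint-encoded differences between consecutive values.
// The first value is encoded as its difference from zero.
func encodeDeltaVarint(sorted []uint64) []byte {
	buf := make([]byte, 0, len(sorted)*2)
	tmp := make([]byte, binary.MaxVarintLen64)
	var prev uint64
	for _, v := range sorted {
		n := binary.PutUvarint(tmp, v-prev)
		buf = append(buf, tmp[:n]...)
		prev = v
	}
	return buf
}

// decodeDeltaVarint reverses encodeDeltaVarint by accumulating the deltas.
// Decoding stops at the first truncated or malformed varint.
func decodeDeltaVarint(data []byte) []uint64 {
	var out []uint64
	var prev uint64
	for len(data) > 0 {
		delta, n := binary.Uvarint(data)
		if n <= 0 {
			break
		}
		prev += delta
		out = append(out, prev)
		data = data[n:]
	}
	return out
}
```

Because sorted k-mers within a partition tend to be close to each other, most deltas fit in one or two bytes instead of the eight bytes of a raw `uint64`.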
This commit addresses a bug in the super k-mer implementation where the minimizer bijection property was not properly enforced. The fix ensures that:
1. All k-mers within a super k-mer share the same minimizer
2. Identical super k-mer sequences have the same minimizer
The changes include:
- Fixing the super k-mer iteration logic to properly validate the minimizer bijection property
- Adding a comprehensive test suite (TestSuperKmerMinimizerBijection) that validates the intrinsic property of super k-mers
- Updating the .gitignore file to properly track relevant files
This resolves issues where the same sequence could be associated with different minimizers, violating the super k-mer definition.
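
The two properties above can be checked with a sketch like the following. It is not the code of TestSuperKmerMinimizerBijection: the `SuperKmer` type, the function names, and the use of a plain lexicographic (non-canonical) minimizer are simplifying assumptions made for illustration.

```go
package main

// SuperKmer is a hypothetical record pairing a sequence with its minimizer.
type SuperKmer struct {
	Sequence  string
	Minimizer string
}

// minimizer returns the lexicographically smallest m-mer of kmer.
// Assumption: a plain lexicographic order, without canonicalization.
func minimizer(kmer string, m int) string {
	best := kmer[:m]
	for i := 1; i+m <= len(kmer); i++ {
		if s := kmer[i : i+m]; s < best {
			best = s
		}
	}
	return best
}

// splitSuperKmers cuts seq into maximal runs of consecutive k-mers
// that share the same minimizer. Adjacent runs overlap by k-1 bases.
func splitSuperKmers(seq string, k, m int) []SuperKmer {
	var out []SuperKmer
	if len(seq) < k {
		return out
	}
	start := 0
	cur := minimizer(seq[:k], m)
	for i := 1; i+k <= len(seq); i++ {
		mz := minimizer(seq[i:i+k], m)
		if mz != cur {
			// The k-mers at positions start..i-1 span seq[start : i+k-1].
			out = append(out, SuperKmer{Sequence: seq[start : i+k-1], Minimizer: cur})
			start, cur = i, mz
		}
	}
	return append(out, SuperKmer{Sequence: seq[start:], Minimizer: cur})
}

// checkSuperKmers reports whether (1) every k-mer of each super k-mer has
// the recorded minimizer and (2) identical sequences carry the same minimizer.
func checkSuperKmers(sks []SuperKmer, k, m int) bool {
	seen := make(map[string]string)
	for _, sk := range sks {
		for i := 0; i+k <= len(sk.Sequence); i++ {
			if minimizer(sk.Sequence[i:i+k], m) != sk.Minimizer {
				return false
			}
		}
		if prev, ok := seen[sk.Sequence]; ok && prev != sk.Minimizer {
			return false
		}
		seen[sk.Sequence] = sk.Minimizer
	}
	return true
}
```

A test in this spirit would call `splitSuperKmers` on a set of sequences and fail as soon as `checkSuperKmers` returns false.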
This commit adds the implementation of the obisuperkmer command, including:
- The main command in cmd/obitools/obisuperkmer/
- The package implementation in pkg/obitools/obisuperkmer/
- Automated tests in obitests/obitools/obisuperkmer/
- Documentation for the implementation and tests
The obisuperkmer command extracts super k-mers from DNA sequences, following the standard OBITools architecture. It includes proper CLI option handling, validation of parameters, and integration with the OBITools pipeline system.
Tests cover basic functionality, parameter validation, output format, metadata preservation, and file I/O operations.
Add detailed documentation on the architecture of OBITools commands, including the modular structure, architectural patterns, and best practices for creating new commands.