obitools4

mirror of https://github.com/metabarcoding/obitools4.git synced 2026-06-24 09:41:00 +00:00

Author	SHA1	Message	Date
Eric Coissac	a43e6258be	docs: translate comments to English This commit translates all French comments in the kmer filtering and set management code to English, improving code readability and maintainability for international collaborators.	2026-02-05 16:35:55 +01:00
Eric Coissac	12ca62b06a	Complete implementation of persistence for FrequencyFilter Adds save and load functionality for FrequencyFilter using the underlying KmerSetGroup. - New Save() method to write the filter to a directory with metadata formatting - New LoadFrequencyFilter() method to load a filter from a directory - Metadata initialization when the filter is created - Optimization of the Union() and Intersect() methods of KmerSetGroup - Commit hash update	2026-02-05 16:26:10 +01:00
Eric Coissac	09ac15a76b	Refactor k-mer encoding functions to use 'canonical' terminology This commit refactors all k-mer encoding and normalization functions to consistently use 'canonical' instead of 'normalized' terminology. This includes renaming functions like EncodeNormalizedKmer to EncodeCanonicalKmer, IterNormalizedKmers to IterCanonicalKmers, and NormalizeKmer to CanonicalKmer. The change aligns the API with biological conventions where 'canonical' refers to the lexicographically smallest representation of a k-mer and its reverse complement. All related documentation and examples have been updated accordingly. The commit also updates the version file with a new commit hash.	2026-02-05 16:14:35 +01:00
Eric Coissac	16f72e6305	refactoring of obikmer	2026-02-05 16:05:48 +01:00
Eric Coissac	6c6c369ee2	Add k-mer encoding and decoding functions with normalized k-mer support This commit introduces new functions for encoding and decoding k-mers, including support for normalized k-mers. It also updates the frequency filter and k-mer set implementations to use the new encoding functions, providing zero-allocation encoding for better performance. The commit hash has been updated to reflect the latest changes.	2026-02-05 15:51:52 +01:00
Eric Coissac	c5dd477675	Refactor KmerSet and FrequencyFilter to use immutable K parameter and consistent Copy/Clone methods This commit refactors the KmerSet and related structures to use an immutable K parameter and introduces consistent Copy methods instead of Clone. It also adds attribute API support for KmerSet and KmerSetGroup, and updates persistence logic to handle IDs and metadata correctly.	2026-02-05 15:32:36 +01:00
Eric Coissac	afcb43b352	Add user metadata management to KmerSet and KmerSetGroup This change adds the ability to store and persist user metadata in the KmerSet and KmerSetGroup structures. The changes include adding a Metadata field to KmerSet and KmerSetGroup, and updating the cloning and persistence methods to handle this metadata. This makes it possible to keep additional information attached to k-mer sets while maintaining compatibility with existing operations.	2026-02-05 15:02:36 +01:00
Eric Coissac	b26b76cbf8	Add TOML persistence support for KmerSet and KmerSetGroup This commit adds support for saving and loading KmerSet and KmerSetGroup structures using TOML, YAML, and JSON formats for metadata. It includes: - Added github.com/pelletier/go-toml/v2 dependency - Implemented Save and Load methods for KmerSet and KmerSetGroup - Added metadata persistence with support for multiple formats (TOML, YAML, JSON) - Added helper functions for format detection and metadata handling - Updated version commit hash	2026-02-05 14:57:22 +01:00
Eric Coissac	aa468ec462	Refactor FrequencyFilter to use KmerSetGroup Refactor FrequencyFilter to inherit from KmerSetGroup for better code organization and maintainability. This change replaces the direct bitmap management with a group-based approach, simplifying the implementation and improving readability.	2026-02-05 14:46:57 +01:00
Eric Coissac	00dcd78e84	Refactor k-mer encoding and frequency filtering with KmerSet This commit refactors the k-mer encoding logic to handle ambiguous bases more consistently and introduces a KmerSet type for better management of k-mer collections. The frequency filter now works with KmerSet instead of roaring bitmaps directly, and the API has been updated to support level-based frequency queries. Additionally, the commit updates the version and commit hash.	2026-02-05 14:41:59 +01:00
Eric Coissac	60f27c1dc8	Add error handling for ambiguous bases in k-mer encoding This commit introduces error handling for ambiguous DNA bases (N, R, Y, W, S, K, M, B, D, H, V) in k-mer encoding. It adds new functions IterNormalizedKmersWithErrors and EncodeNormalizedKmersWithErrors that track and encode the number of ambiguous bases in each k-mer using error markers in the top 2 bits. The commit also updates the version string to reflect the latest changes.	2026-02-04 21:45:08 +01:00
Eric Coissac	28162ac36f	Add frequency filter with v levels of Roaring Bitmaps Complete implementation of the frequency filter using v levels of Roaring Bitmaps to efficiently eliminate sequencing errors. - Added frequency filtering logic with v levels - Integrated the RoaringBitmap and bitset libraries - Added usage examples and documentation - Implemented the k-mer iterator for memory-efficient use - Optimized for the skewed distributions typical of sequencing This change makes it possible to filter k-mers by minimum frequency with optimal memory usage and a single pass over the data.	2026-02-04 21:21:10 +01:00
Eric Coissac	1a1adb83ac	Add error marker support for k-mers with enhanced documentation This commit introduces error marker functionality for k-mers with odd lengths up to 31. The top 2 bits of each k-mer are now reserved for error coding (0-3), allowing for error detection and correction capabilities. Key changes include: - Added constants KmerErrorMask and KmerSequenceMask for bit manipulation - Implemented SetKmerError, GetKmerError, and ClearKmerError functions - Updated EncodeKmers, ExtractSuperKmers, EncodeNormalizedKmers functions to enforce k ≤ 31 - Enhanced ReverseComplement to preserve error bits during reverse complement operations - Added comprehensive tests for error marker functionality including edge cases and integration tests The maximum k-mer size is now capped at 31 to accommodate the error bits, ensuring that k-mers with odd lengths ≤ 31 utilize only 62 bits of the 64-bit uint64, leaving the top 2 bits available for error coding.	2026-02-04 16:21:47 +01:00
Eric Coissac	05de9ca58e	Add SuperKmer extraction functionality This commit introduces the ExtractSuperKmers function which identifies maximal subsequences where all consecutive k-mers share the same minimizer. It includes: - SuperKmer struct to represent the maximal subsequences - dequeItem struct for tracking minimizers in a sliding window - Efficient algorithm using monotone deque for O(1) amortized minimizer tracking - Comprehensive parameter validation - Support for buffer reuse for performance optimization - Extensive test cases covering basic functionality, edge cases, and performance benchmarks The implementation uses simultaneous forward/reverse m-mer encoding for O(1) canonical m-mer computation and maintains a monotone deque to track minimizers efficiently.	2026-02-04 16:04:06 +01:00
Eric Coissac	500144051a	Add jj Makefile targets and k-mer encoding utilities Add new Makefile targets for jj operations (jjnew, jjpush, jjfetch) to streamline commit workflow. Introduce k-mer encoding utilities in pkg/obikmer: - EncodeKmers: converts DNA sequences to encoded k-mers - ReverseComplement: computes reverse complement of k-mers - NormalizeKmer: returns canonical form of k-mers - EncodeNormalizedKmers: encodes sequences with normalized k-mers Add comprehensive tests for k-mer encoding functions including edge cases, buffer reuse, and performance benchmarks. Document k-mer index design for large genomes, covering: - Use cases and objectives - Volume estimations - Distance metrics (Jaccard, Sørensen-Dice, Bray-Curtis) - Indexing options (Bloom filters, sorted sets, MPHF) - Optimization techniques (k-2-mer indexing) - MinHash for distance acceleration - Recommended architecture for presence/absence and counting queries	2026-02-04 14:27:10 +01:00
coissac	740f66b4c7	Merge pull request #71 from metabarcoding/push-onwzsyuooozn Implement unique filtering based on sequence and categories	2026-01-14 19:19:27 +01:00
Eric Coissac	b49aba9c09	Implement unique filtering based on sequence and categories Adds unique filtering that takes both the sequence and the categories into account. - Modified the ISequenceChunk function to accept an optional unique classifier - Implemented on-disk unique processing using a composite classifier - Updated the classifier used for on-disk sorting - Fixed uniqueness key handling by using the classifier's code and value - Updated the commit number	2026-01-14 19:18:17 +01:00
coissac	52244cdb64	Merge pull request #70 from metabarcoding/push-kuwnszsxmxpn Refactor chunk processing and update version commit	2026-01-14 18:47:17 +01:00
Eric Coissac	0678181023	Refactor chunk processing and update version commit Optimize chunk processing by moving variable declarations inside the loop and update the commit hash in version.go to reflect the latest changes.	2026-01-14 18:46:04 +01:00
coissac	f55dd553c7	Merge pull request #68 from metabarcoding/push-rrulynolpprl Push rrulynolpprl	2026-01-14 17:44:36 +01:00
coissac	4a383ac6c9	Merge branch 'master' into push-rrulynolpprl	2025-12-18 14:12:56 +01:00
Eric Coissac	371e702423	obiannotate --cut bug	2025-12-18 14:11:11 +01:00
Eric Coissac	ac0d3f3fe4	Update obiuniq for very large dataset	2025-12-18 14:11:11 +01:00
Eric Coissac	547135c747	End of obilowmask	2025-12-03 11:49:07 +01:00
coissac	f4a919732e	Merge pull request #65 from metabarcoding/push-yurwulsmpxkq End of obilowmask	2025-11-26 12:13:08 +01:00
Eric Coissac	e681666aaa	End of obilowmask	2025-11-26 11:14:56 +01:00
coissac	adf2486295	Merge pull request #64 from metabarcoding/push-yurwulsmpxkq End of obilowmask	2025-11-24 15:36:20 +01:00
Eric Coissac	272f5c9c35	End of obilowmask	2025-11-24 15:27:38 +01:00
coissac	c1b9503ca6	Merge pull request #63 from metabarcoding/push-vypwrurrsxuk obicsv bug with stat on value map fields	2025-11-21 14:04:34 +01:00
Eric Coissac	86e60aedd0	obicsv bug with stat on value map fields	2025-11-21 14:03:31 +01:00
coissac	961abcea7b	Merge pull request #61 from metabarcoding/push-mvxssvnysyxn Push mvxssvnysyxn	2025-11-21 13:25:19 +01:00
Eric Coissac	57c65f9d50	obimatrix bug	2025-11-21 13:24:24 +01:00
Eric Coissac	e65b2a5efe	obimatrix bugs	2025-11-21 13:24:06 +01:00
coissac	3e5f3f76b0	Merge pull request #60 from metabarcoding/push-qpnzxskwpoxo Push qpnzxskwpoxo	2025-11-18 15:35:41 +01:00
Eric Coissac	ccc827afd3	finalise obilowmask	2025-11-18 15:33:08 +01:00
Eric Coissac	cef29005a5	debug url reading	2025-11-18 15:30:20 +01:00
Eric Coissac	4603d7973e	implementation of obilowmask	2025-11-18 15:30:20 +01:00
coissac	8bc47c13d3	Merge pull request #58 from metabarcoding/push-vxkqkkrokwuz debug obimultiplex	2025-11-06 15:44:31 +01:00
Eric Coissac	07cdd6f758	debug obimultiplex bug option obimultiplex	2025-11-06 15:43:13 +01:00
coissac	432da366e2	Merge pull request #57 from metabarcoding/push-ywktmvpvtvmv debug taxonomy core dump	2025-11-05 19:07:41 +01:00
Eric Coissac	2d7dc7d09d	debug taxonomy core dump	2025-11-05 19:01:15 +01:00
coissac	5e12ed5400	Merge pull request #56 from metabarcoding/push-tnrvpwvqtzyo update install script	2025-11-04 18:11:21 +01:00
Eric Coissac	7500ee1d15	update install script	2025-11-04 18:09:15 +01:00
coissac	5a1d66bf06	Merge pull request #53 from metabarcoding/push-skmxzrzulvtq Push skmxzrzulvtq	2025-10-28 14:27:19 +01:00
Eric Coissac	0844dcc607	bug obimatrix	2025-10-28 13:57:31 +01:00
Eric Coissac	7f4ebe757e	Bug obiuniq - don't clean the chunks	2025-10-28 13:50:22 +01:00
coissac	5150947e23	Merge pull request #51 from metabarcoding/push-urtwmwktsrru Push urtwmwktsrru	2025-10-20 17:41:33 +02:00
Eric Coissac	d17a9520b9	work on obiclean chimera detection	2025-10-20 17:29:47 +02:00
Eric Coissac	29bf4ce871	add a feature to obimatrix adding obicsv option to obimatrix	2025-10-20 16:34:58 +02:00
coissac	d7ed9d343e	Update install_obitools.sh for missing directory	2025-10-15 08:32:06 +02:00
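
Several of the commits above describe 2-bit k-mer encoding, canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement), and error markers stored in the top 2 bits of a uint64. The following is a minimal sketch of that idea, not the obikmer code itself: the function names, their signatures, and the base mapping A=0, C=1, G=2, T=3 are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical constants: the top 2 bits carry an error count (0-3),
// the low 62 bits carry up to 31 bases at 2 bits each.
const (
	errorShift   = 62
	errorMask    = uint64(3) << errorShift
	sequenceMask = ^errorMask
)

// encodeKmer packs an unambiguous DNA word into 2 bits per base.
// Assumed mapping: A=0, C=1, G=2, T=3 (case-insensitive).
func encodeKmer(seq string) (uint64, error) {
	if len(seq) < 1 || len(seq) > 31 {
		return 0, errors.New("k must be between 1 and 31")
	}
	var code uint64
	for i := 0; i < len(seq); i++ {
		var v uint64
		switch seq[i] {
		case 'A', 'a':
			v = 0
		case 'C', 'c':
			v = 1
		case 'G', 'g':
			v = 2
		case 'T', 't':
			v = 3
		default:
			return 0, fmt.Errorf("ambiguous or invalid base %q at position %d", seq[i], i)
		}
		code = code<<2 | v
	}
	return code, nil
}

// reverseComplement reverses the base order and complements each base
// (with this mapping, complement(v) = 3 - v). Error bits are preserved.
func reverseComplement(code uint64, k int) uint64 {
	errBits := code & errorMask
	seq := code & sequenceMask
	var rc uint64
	for i := 0; i < k; i++ {
		rc = rc<<2 | (3 - (seq & 3))
		seq >>= 2
	}
	return rc | errBits
}

// canonicalKmer returns whichever of the k-mer and its reverse complement
// has the smaller sequence part. For a fixed k, numeric order on the 2-bit
// codes matches lexicographic order on the bases.
func canonicalKmer(code uint64, k int) uint64 {
	rc := reverseComplement(code, k)
	if rc&sequenceMask < code&sequenceMask {
		return rc
	}
	return code
}

// setKmerError stores an error count (only its low 2 bits) in the top 2 bits.
func setKmerError(code uint64, n int) uint64 {
	return (code & sequenceMask) | uint64(n&3)<<errorShift
}

// getKmerError reads the error count back from the top 2 bits.
func getKmerError(code uint64) int {
	return int(code >> errorShift)
}

func main() {
	fwd, _ := encodeKmer("CGT")
	canon := canonicalKmer(fwd, 3)
	marked := setKmerError(canon, 2)
	fmt.Println(fwd, canon, getKmerError(marked))
}
```

Capping k at 31 leaves the two highest bits free, which is why the sequence and the error marker can share one uint64 without interfering.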

637 Commits
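
Commit 28162ac36f describes a frequency filter built from v levels of Roaring Bitmaps that finds k-mers above a minimum frequency in a single pass. The sketch below illustrates the level idea only: it swaps the Roaring Bitmaps for plain Go maps to stay dependency-free, and every type and method name in it is hypothetical rather than taken from obitools4.

```go
package main

import "fmt"

// frequencyFilter holds v sets. levels[i] contains every k-mer that has
// been observed at least i+1 times. Maps stand in for Roaring Bitmaps here.
type frequencyFilter struct {
	levels []map[uint64]struct{}
}

// newFrequencyFilter creates a filter able to count up to v occurrences.
func newFrequencyFilter(v int) *frequencyFilter {
	if v < 1 {
		v = 1
	}
	levels := make([]map[uint64]struct{}, v)
	for i := range levels {
		levels[i] = make(map[uint64]struct{})
	}
	return &frequencyFilter{levels: levels}
}

// add records one observation: the k-mer is inserted into the first level
// that does not contain it yet. Once it is in all v levels the count
// saturates and further observations change nothing.
func (f *frequencyFilter) add(kmer uint64) {
	for _, level := range f.levels {
		if _, seen := level[kmer]; !seen {
			level[kmer] = struct{}{}
			return
		}
	}
}

// atLeast reports whether kmer was observed at least n times.
// Counts above v cannot be distinguished, so n > v always yields false.
func (f *frequencyFilter) atLeast(kmer uint64, n int) bool {
	if n < 1 {
		return true
	}
	if n > len(f.levels) {
		return false
	}
	_, ok := f.levels[n-1][kmer]
	return ok
}

// countAtLeast returns how many distinct k-mers reached frequency n.
func (f *frequencyFilter) countAtLeast(n int) int {
	if n < 1 || n > len(f.levels) {
		return 0
	}
	return len(f.levels[n-1])
}

func main() {
	f := newFrequencyFilter(3)
	for _, kmer := range []uint64{6, 27, 6, 6, 42, 42} {
		f.add(kmer)
	}
	fmt.Println(f.atLeast(6, 3), f.atLeast(42, 3), f.countAtLeast(2))
}
```

Because rare k-mers (typically sequencing errors) only ever reach the first level, the higher levels stay small, which suits the skewed distributions mentioned in the commit message.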