Commit Graph

188 Commits

Author SHA1 Message Date
Eric Coissac
7f0133a196 Bump version to 4.4.8
Update version from 4.4.7 to 4.4.8 in version.txt and _Version variable.
2026-02-06 09:08:35 +01:00
Eric Coissac
7a7db703f1 Bump version to 4.4.7
Update version from 4.4.6 to 4.4.7 in version.txt and pkg/obioptions/version.go
2026-02-05 18:10:45 +01:00
Eric Coissac
d7f615108f Bump version to 4.4.6
Update version from 4.4.5 to 4.4.6 in version.txt and pkg/obioptions/version.go
2026-02-05 18:02:30 +01:00
Eric Coissac
71574f240b Update version and add CI tests
Update version to 4.4.5 and add a test job in the release workflow to ensure tests pass before creating a release.
2026-02-05 18:02:28 +01:00
Eric Coissac
02ab683fa0 Bump version to 4.4.4
Update version from 4.4.3 to 4.4.4 in version.txt and pkg/obioptions/version.go
2026-02-05 17:42:01 +01:00
Eric Coissac
e3c41fc11b Add Jaccard distance and similarity computations for KmerSet and KmerSetGroup
This commit introduces Jaccard distance and similarity methods for KmerSet and KmerSetGroup.

For KmerSet:
- Added JaccardDistance method to compute the Jaccard distance between two KmerSets
- Added JaccardSimilarity method to compute the Jaccard similarity between two KmerSets

For KmerSetGroup:
- Added JaccardDistanceMatrix method to compute a pairwise Jaccard distance matrix
- Added JaccardSimilarityMatrix method to compute a pairwise Jaccard similarity matrix

Also includes:
- New DistMatrix implementation in pkg/obidist for storing and computing distance/similarity matrices
- Updated version handling with bump-version target in Makefile
- Added tests for all new methods
2026-02-05 17:39:23 +01:00
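The Jaccard measures listed above can be sketched in Go as follows. This is a minimal illustration, not the obikmer code. It makes three assumptions. A KmerSet behaves like a set of uint64-encoded k-mers, modelled here by a hypothetical map-based `kmerSet` type. Two empty sets count as identical. The hypothetical `jaccardDistanceMatrix` stands in for the pairwise matrix described for KmerSetGroup. Similarity is |A ∩ B| / |A ∪ B| and distance is its complement.

```go
package main

// kmerSet is a hypothetical stand-in for a KmerSet: a set of encoded k-mers.
type kmerSet map[uint64]struct{}

func newKmerSet(kmers ...uint64) kmerSet {
	s := make(kmerSet, len(kmers))
	for _, k := range kmers {
		s[k] = struct{}{}
	}
	return s
}

// jaccardSimilarity returns |A ∩ B| / |A ∪ B|.
// Two empty sets are treated as identical (similarity 1); this is an assumption.
func jaccardSimilarity(a, b kmerSet) float64 {
	if len(a) > len(b) {
		a, b = b, a // iterate over the smaller set
	}
	inter := 0
	for k := range a {
		if _, ok := b[k]; ok {
			inter++
		}
	}
	union := len(a) + len(b) - inter
	if union == 0 {
		return 1
	}
	return float64(inter) / float64(union)
}

// jaccardDistance is the complement of the similarity.
func jaccardDistance(a, b kmerSet) float64 {
	return 1 - jaccardSimilarity(a, b)
}

// jaccardDistanceMatrix fills a symmetric n×n matrix of pairwise distances.
// Only the upper triangle is computed and then mirrored; the diagonal stays 0.
func jaccardDistanceMatrix(sets []kmerSet) [][]float64 {
	n := len(sets)
	m := make([][]float64, n)
	for i := range m {
		m[i] = make([]float64, n)
	}
	for i := 0; i < n; i++ {
		for j := i + 1; j < n; j++ {
			d := jaccardDistance(sets[i], sets[j])
			m[i][j], m[j][i] = d, d
		}
	}
	return m
}
```

Because Jaccard distance is symmetric, the matrix needs only n(n-1)/2 set comparisons rather than n².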
Eric Coissac
aa2e94dd6f Refactor k-mer normalization functions and add quorum operations
This commit refactors the k-mer normalization functions, renaming them from 'NormalizeKmer' to 'CanonicalKmer' to better reflect their purpose of returning canonical k-mers. It also introduces new quorum operations (AtLeast, AtMost, Exactly) for k-mer set groups, along with comprehensive tests and benchmarks. The version commit hash has also been updated.
2026-02-05 17:11:34 +01:00
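A quorum over a group of k-mer sets selects k-mers by how many member sets contain them. The sketch below assumes the group is a plain slice of map-based sets, using hypothetical `kmerSet` and `kmerSetGroup` types. The method names echo the commit message, but their signatures here are illustrative, not the obikmer API.

```go
package main

// Hypothetical stand-ins: a k-mer set is a set of encoded k-mers and a group
// is an ordered list of such sets.
type kmerSet map[uint64]struct{}
type kmerSetGroup []kmerSet

// counts returns, for every k-mer, the number of sets of the group containing it.
func (g kmerSetGroup) counts() map[uint64]int {
	c := make(map[uint64]int)
	for _, s := range g {
		for k := range s {
			c[k]++
		}
	}
	return c
}

// quorum keeps the k-mers whose membership count satisfies keep.
func (g kmerSetGroup) quorum(keep func(n int) bool) kmerSet {
	out := make(kmerSet)
	for k, n := range g.counts() {
		if keep(n) {
			out[k] = struct{}{}
		}
	}
	return out
}

// AtLeast keeps k-mers present in at least q sets.
func (g kmerSetGroup) AtLeast(q int) kmerSet { return g.quorum(func(n int) bool { return n >= q }) }

// AtMost keeps k-mers present in at most q sets (and in at least one).
func (g kmerSetGroup) AtMost(q int) kmerSet { return g.quorum(func(n int) bool { return n <= q }) }

// Exactly keeps k-mers present in exactly q sets.
func (g kmerSetGroup) Exactly(q int) kmerSet { return g.quorum(func(n int) bool { return n == q }) }
```

With this definition, `AtLeast(1)` is the union of the group and `AtLeast(len(g))` is its intersection. `AtMost` only ever returns k-mers present in at least one set.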
Eric Coissac
12ca62b06a Complete persistence implementation for FrequencyFilter
Add save and load functionality for FrequencyFilter, using the underlying KmerSetGroup.

- New Save() method to write the filter to a directory, with metadata formatting
- New LoadFrequencyFilter() method to load a filter from a directory
- Metadata initialization when the filter is created
- Optimization of the KmerSetGroup Union() and Intersect() methods
- Commit hash update
2026-02-05 16:26:10 +01:00
Eric Coissac
09ac15a76b Refactor k-mer encoding functions to use 'canonical' terminology
This commit refactors all k-mer encoding and normalization functions to consistently use 'canonical' instead of 'normalized' terminology. This includes renaming functions like EncodeNormalizedKmer to EncodeCanonicalKmer, IterNormalizedKmers to IterCanonicalKmers, and NormalizeKmer to CanonicalKmer. The change aligns the API with biological conventions where 'canonical' refers to the lexicographically smallest representation of a k-mer and its reverse complement. All related documentation and examples have been updated accordingly. The commit also updates the version file with a new commit hash.
2026-02-05 16:14:35 +01:00
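A canonical k-mer, as defined above, is the lexicographically smaller of a k-mer and its reverse complement. It can be computed as sketched below. The sketch assumes a 2-bit encoding A=0, C=1, G=2, T=3, with the first base in the most significant bits. Under that assumption, numeric order equals lexicographic order for k-mers of equal length, and complementing a base is `3 - code`. The function names are hypothetical.

```go
package main

// Assumed 2-bit encoding: A=0, C=1, G=2, T=3, first base most significant.

// reverseComplement returns the reverse complement of an encoded k-mer of length k.
func reverseComplement(kmer uint64, k int) uint64 {
	var rc uint64
	for i := 0; i < k; i++ {
		rc = (rc << 2) | (3 - (kmer & 3)) // take the last base, complement it, append it
		kmer >>= 2
	}
	return rc
}

// canonicalKmer returns the smaller of a k-mer and its reverse complement,
// which with this encoding is also the lexicographically smaller one.
func canonicalKmer(kmer uint64, k int) uint64 {
	if rc := reverseComplement(kmer, k); rc < kmer {
		return rc
	}
	return kmer
}
```

For example, ACG encodes to 6 and its reverse complement CGT to 27, so both strands map to the canonical value 6.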
Eric Coissac
16f72e6305 refactoring of obikmer 2026-02-05 16:05:48 +01:00
Eric Coissac
6c6c369ee2 Add k-mer encoding and decoding functions with normalized k-mer support
This commit introduces new functions for encoding and decoding k-mers, including support for normalized k-mers. It also updates the frequency filter and k-mer set implementations to use the new encoding functions, providing zero-allocation encoding for better performance. The commit hash has been updated to reflect the latest changes.
2026-02-05 15:51:52 +01:00
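Zero-allocation encoding is commonly obtained with a rolling 2-bit code. Each new base shifts the previous code left by two bits, and the result is masked back to 2k bits. The sketch below illustrates that idea together with a matching decoder. The function names, the lowercase output alphabet and the choice to skip windows containing non-ACGT bytes are assumptions, not the behaviour of the obikmer functions.

```go
package main

// baseCode maps a nucleotide to its assumed 2-bit code (A=0, C=1, G=2, T=3);
// ok is false for anything other than A, C, G or T, in either case.
func baseCode(b byte) (code uint64, ok bool) {
	switch b | 0x20 { // fold ASCII letters to lower case
	case 'a':
		return 0, true
	case 'c':
		return 1, true
	case 'g':
		return 2, true
	case 't':
		return 3, true
	}
	return 0, false
}

// decodeKmer turns an encoded k-mer of length k back into a lowercase string.
func decodeKmer(kmer uint64, k int) string {
	const bases = "acgt"
	buf := make([]byte, k)
	for i := k - 1; i >= 0; i-- {
		buf[i] = bases[kmer&3]
		kmer >>= 2
	}
	return string(buf)
}

// forEachKmer slides a window of size k (1 <= k <= 32) along seq and calls fn
// with each encoded k-mer. The code is updated in O(1) per base and nothing is
// allocated. Windows containing a non-ACGT byte are skipped.
func forEachKmer(seq []byte, k int, fn func(kmer uint64)) {
	// For k == 32 the shift yields 0 in Go, and 0 - 1 wraps to all ones.
	mask := uint64(1)<<(2*uint(k)) - 1
	var kmer uint64
	valid := 0 // consecutive valid bases ending at the current position
	for _, b := range seq {
		c, ok := baseCode(b)
		if !ok {
			valid = 0
			continue
		}
		kmer = ((kmer << 2) | c) & mask
		valid++
		if valid >= k {
			fn(kmer)
		}
	}
}
```

The stale bits left in `kmer` after an ambiguous base are harmless. They are shifted out by the k valid bases required before the next callback.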
Eric Coissac
c5dd477675 Refactor KmerSet and FrequencyFilter to use immutable K parameter and consistent Copy/Clone methods
This commit refactors the KmerSet and related structures to use an immutable K parameter and introduces consistent Copy methods instead of Clone. It also adds attribute API support for KmerSet and KmerSetGroup, and updates persistence logic to handle IDs and metadata correctly.
2026-02-05 15:32:36 +01:00
Eric Coissac
afcb43b352 Add user metadata management to KmerSet and KmerSetGroup
This change adds the ability to store and persist user metadata in the KmerSet and KmerSetGroup structures. The changes include adding a Metadata field to KmerSet and KmerSetGroup and updating the cloning and persistence methods to handle this metadata. This makes it possible to keep additional information attached to k-mer sets while maintaining compatibility with existing operations.
2026-02-05 15:02:36 +01:00
Eric Coissac
b26b76cbf8 Add TOML persistence support for KmerSet and KmerSetGroup
This commit adds support for saving and loading KmerSet and KmerSetGroup structures using TOML, YAML, and JSON formats for metadata. It includes:

- Added github.com/pelletier/go-toml/v2 dependency
- Implemented Save and Load methods for KmerSet and KmerSetGroup
- Added metadata persistence with support for multiple formats (TOML, YAML, JSON)
- Added helper functions for format detection and metadata handling
- Updated version commit hash
2026-02-05 14:57:22 +01:00
Eric Coissac
00dcd78e84 Refactor k-mer encoding and frequency filtering with KmerSet
This commit refactors the k-mer encoding logic to handle ambiguous bases more consistently and introduces a KmerSet type for better management of k-mer collections. The frequency filter now works with KmerSet instead of roaring bitmaps directly, and the API has been updated to support level-based frequency queries. Additionally, the commit updates the version and commit hash.
2026-02-05 14:41:59 +01:00
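One way to answer level-based frequency queries with plain k-mer sets is to keep one set per level. Level i holds the k-mers seen at least i+1 times, and each new occurrence promotes a k-mer to the first level that does not yet contain it. The sketch below assumes that layout with hypothetical map-based types. It is not necessarily how FrequencyFilter organises its KmerSet levels.

```go
package main

type kmerSet map[uint64]struct{}

// frequencyFilter keeps one set per level: levels[i] holds the k-mers seen
// at least i+1 times. Counts beyond len(levels) are not tracked.
type frequencyFilter struct {
	levels []kmerSet
}

func newFrequencyFilter(minFreq int) *frequencyFilter {
	f := &frequencyFilter{levels: make([]kmerSet, minFreq)}
	for i := range f.levels {
		f.levels[i] = make(kmerSet)
	}
	return f
}

// add records one more occurrence of kmer by inserting it into the first
// level that does not contain it yet; saturated k-mers are left unchanged.
func (f *frequencyFilter) add(kmer uint64) {
	for _, lvl := range f.levels {
		if _, seen := lvl[kmer]; !seen {
			lvl[kmer] = struct{}{}
			return
		}
	}
}

// atLeast returns the k-mers seen at least n times (1 <= n <= number of levels).
// The returned set is internal and must be treated as read-only.
func (f *frequencyFilter) atLeast(n int) kmerSet { return f.levels[n-1] }
```

Memory grows with the number of levels, which is why counting stops at the requested minimum frequency.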
Eric Coissac
60f27c1dc8 Add error handling for ambiguous bases in k-mer encoding
This commit introduces error handling for ambiguous DNA bases (N, R, Y, W, S, K, M, B, D, H, V) in k-mer encoding. It adds new functions IterNormalizedKmersWithErrors and EncodeNormalizedKmersWithErrors that track and encode the number of ambiguous bases in each k-mer using error markers in the top 2 bits. The commit also updates the version string to reflect the latest changes.
2026-02-04 21:45:08 +01:00
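The error-marker scheme can be sketched as follows. It assumes k ≤ 31, so a 2-bit-encoded k-mer fits in the low 62 bits and the top 2 bits are free. Since 2 bits can only hold 0 to 3, the count saturates at 3. Encoding ambiguous bases as A in the low bits is an assumption made for illustration, and so are the function names.

```go
package main

// With k <= 31 the k-mer uses at most 62 bits; the top 2 bits hold the marker.
const (
	errorShift = 62
	errorMask  = uint64(3) << errorShift
)

// withErrors stores n (saturated at 3) in the top 2 bits of kmer.
func withErrors(kmer uint64, n int) uint64 {
	if n > 3 {
		n = 3
	}
	return (kmer &^ errorMask) | uint64(n)<<errorShift
}

// errorsOf reads the marker; stripErrors removes it.
func errorsOf(kmer uint64) int       { return int(kmer >> errorShift) }
func stripErrors(kmer uint64) uint64 { return kmer &^ errorMask }

// isAmbiguous reports whether b is one of the IUPAC codes N, R, Y, W, S, K, M, B, D, H, V.
func isAmbiguous(b byte) bool {
	switch b | 0x20 {
	case 'n', 'r', 'y', 'w', 's', 'k', 'm', 'b', 'd', 'h', 'v':
		return true
	}
	return false
}

// encodeKmerWithErrors encodes seq (len <= 31) with A=0, C=1, G=2, T=3.
// Any other byte is encoded as A; only the IUPAC ambiguity codes are counted.
func encodeKmerWithErrors(seq []byte) uint64 {
	var kmer uint64
	errs := 0
	for _, b := range seq {
		var c uint64
		switch b | 0x20 {
		case 'c':
			c = 1
		case 'g':
			c = 2
		case 't':
			c = 3
		default:
			if isAmbiguous(b) {
				errs++
			}
		}
		kmer = kmer<<2 | c
	}
	return withErrors(kmer, errs)
}
```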
Eric Coissac
b49aba9c09 Implement unique filtering based on sequence and categories
Add a unique-filtering feature that takes both the sequence and the categories into account.

- Modify the ISequenceChunk function to accept an optional unique classifier
- Implement on-disk unique processing using a composite classifier
- Update the classifier used for on-disk sorting
- Fix the handling of uniqueness keys by using the classifier's code and value
- Update the commit number
2026-01-14 19:18:17 +01:00
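Deduplicating on sequence and categories together amounts to keying each record by a composite of both. The in-memory sketch below assumes a hypothetical `record` type whose category values are listed in a fixed order, and uses a NUL byte as separator. It does not reproduce the on-disk, classifier-based processing described in the commit.

```go
package main

import "strings"

// record is a hypothetical minimal sequence record.
type record struct {
	seq        string
	categories []string // values of the classifying attributes, in a fixed order
}

// compositeKey combines sequence and category values; "\x00" is used as a
// separator on the assumption that it never occurs in the data.
func compositeKey(r record) string {
	return r.seq + "\x00" + strings.Join(r.categories, "\x00")
}

// uniq merges records sharing the same sequence and categories and returns
// the unique records in first-seen order together with their counts.
func uniq(records []record) ([]record, []int) {
	index := make(map[string]int)
	var out []record
	var counts []int
	for _, r := range records {
		k := compositeKey(r)
		if i, ok := index[k]; ok {
			counts[i]++
			continue
		}
		index[k] = len(out)
		out = append(out, r)
		counts = append(counts, 1)
	}
	return out, counts
}
```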
Eric Coissac
0678181023 Refactor chunk processing and update version commit
Optimize chunk processing by moving variable declarations inside the loop and update the commit hash in version.go to reflect the latest changes.
2026-01-14 18:46:04 +01:00
Eric Coissac
ac0d3f3fe4 Update obiuniq for very large datasets 2025-12-18 14:11:11 +01:00
Eric Coissac
547135c747 End of obilowmask 2025-12-03 11:49:07 +01:00
Eric Coissac
86e60aedd0 obicsv bug with stat on value map fields 2025-11-21 14:03:31 +01:00
Eric Coissac
e65b2a5efe obimatrix bugs 2025-11-21 13:24:06 +01:00
Eric Coissac
ccc827afd3 finalise obilowmask 2025-11-18 15:33:08 +01:00
Eric Coissac
4603d7973e Implementation of obilowmask 2025-11-18 15:30:20 +01:00
Eric Coissac
2d7dc7d09d debug taxonomy core dump 2025-11-05 19:01:15 +01:00
Eric Coissac
0844dcc607 bug obimatrix 2025-10-28 13:57:31 +01:00
Eric Coissac
7f4ebe757e Bug obiuniq - don't clean the chunks 2025-10-28 13:50:22 +01:00
Eric Coissac
d17a9520b9 work on obiclean chimera detection 2025-10-20 17:29:47 +02:00
Eric Coissac
29bf4ce871 add a feature to obimatrix: the obicsv option is now available in obimatrix 2025-10-20 16:34:58 +02:00
Eric Coissac
82b6bb1ab6 correct a bug in func (worker SeqWorker) ChainWorkers(next SeqWorker) SeqWorker 2025-08-11 15:09:49 +02:00
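The commit message does not say what the bug was. For reference, the sketch below shows the composition semantics a `ChainWorkers`-style function would be expected to have: apply the first worker, feed each of its outputs to the next one, and stop on the first error. It uses a hypothetical `seqWorker` that maps one string to zero or more strings, and it is not the obiseq code.

```go
package main

// seqWorker is a hypothetical simplification of a sequence worker: it maps
// one sequence to zero or more sequences.
type seqWorker func(seq string) ([]string, error)

// chainWorkers returns a worker that applies w, then feeds every output of w
// to next and concatenates the results. A nil worker acts as the identity.
func (w seqWorker) chainWorkers(next seqWorker) seqWorker {
	if w == nil {
		return next
	}
	if next == nil {
		return w
	}
	return func(seq string) ([]string, error) {
		first, err := w(seq)
		if err != nil {
			return nil, err
		}
		var out []string
		for _, s := range first {
			res, err := next(s)
			if err != nil {
				return nil, err
			}
			out = append(out, res...)
		}
		return out, nil
	}
}
```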
Eric Coissac
6d204f6281 Patch the fastq detector 2025-08-08 10:23:03 -04:00
Eric Coissac
7a6d552450 Changes to be committed:
	modified:   pkg/obioptions/version.go
2025-08-07 17:01:48 -04:00
Eric Coissac
730d448fc3 Allow running on a single CPU (this should now work) 2025-08-06 16:09:25 -04:00
Eric Coissac
04f3af3e60 Rename some functions 2025-08-06 15:54:50 -04:00
Eric Coissac
ed28d3fb5b Adds a --u-to-t option 2025-07-07 15:35:26 +02:00
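A U-to-T conversion of this kind can be shown in a few lines. The hypothetical `uToT` below rewrites uracil as thymine in place and preserves case. The commit does not say how the real `--u-to-t` option handles case.

```go
package main

// uToT rewrites RNA uracil (u/U) as thymine (t/T) in place, preserving case.
func uToT(seq []byte) {
	for i, b := range seq {
		switch b {
		case 'u':
			seq[i] = 't'
		case 'U':
			seq[i] = 'T'
		}
	}
}
```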
Eric Coissac
43b285587e Debug on taxonomy extraction and CSV conversion 2025-07-07 15:29:40 +02:00
Eric Coissac
235a7e202a Update obisummary to account for the new obiseq.StatsOnValues type 2025-06-19 17:21:30 +02:00
Eric Coissac
27fa984a63 Patch obimatrix according to the new type obiseq.StatsOnValues 2025-06-19 16:51:53 +02:00
Eric Coissac
add9d89ccc Patch the Min and Max values of the expression language 2025-06-19 16:43:26 +02:00
Eric Coissac
9965370d85 Manage a lock on StatsOnValues 2025-06-17 16:46:11 +02:00
Eric Coissac
8a2bb1fe82 Changes to be committed:
	modified:   pkg/obioptions/version.go
	modified:   pkg/obiseq/merge.go
2025-06-17 12:11:35 +02:00
Eric Coissac
efc3f3af29 Patch a concurrent access problem 2025-06-17 12:05:42 +02:00
Eric Coissac
1c6ab1c559 Changes to be committed:
	modified:   pkg/obingslibrary/multimatch.go
	modified:   pkg/obioptions/version.go
2025-06-17 09:06:42 +02:00
Eric Coissac
38dcd98d4a Patch the genbank parser automata 2025-06-17 08:52:45 +02:00
Eric Coissac
7b23985693 Add _ to the characters allowed in taxids 2025-06-06 14:37:57 +02:00
Eric Coissac
d31e677304 Patch a bug in obitag 2025-06-04 14:47:28 +02:00
Eric Coissac
6cb7a5a352 Changes to be committed:
	modified:   cmd/obitools/obitag/main.go
	modified:   cmd/obitools/obitaxonomy/main.go
	modified:   pkg/obiformats/csvtaxdump_read.go
	modified:   pkg/obiformats/ecopcr_read.go
	modified:   pkg/obiformats/ncbitaxdump_read.go
	modified:   pkg/obiformats/ncbitaxdump_readtar.go
	modified:   pkg/obiformats/newick_write.go
	modified:   pkg/obiformats/options.go
	modified:   pkg/obiformats/taxonomy_read.go
	modified:   pkg/obiformats/universal_read.go
	modified:   pkg/obiiter/extract_taxonomy.go
	modified:   pkg/obioptions/options.go
	modified:   pkg/obioptions/version.go
	new file:   pkg/obiphylo/tree.go
	modified:   pkg/obiseq/biosequenceslice.go
	modified:   pkg/obiseq/taxonomy_methods.go
	modified:   pkg/obitax/taxonomy.go
	modified:   pkg/obitax/taxonset.go
	modified:   pkg/obitools/obiconvert/sequence_reader.go
	modified:   pkg/obitools/obitag/obitag.go
	modified:   pkg/obitools/obitaxonomy/obitaxonomy.go
	modified:   pkg/obitools/obitaxonomy/options.go
	deleted:    sample/.DS_Store
2025-06-04 09:48:10 +02:00
Eric Coissac
3424d3057f Changes to be committed:
	modified:   pkg/obiformats/ngsfilter_read.go
	modified:   pkg/obioptions/version.go
	modified:   pkg/obiutils/mimetypes.go
2025-05-14 14:53:25 +02:00
Eric Coissac
f9324dd8f4 add min and max to the obitools expression language 2025-05-13 16:03:03 +02:00
Eric Coissac
f1b9ac4a13 Update the expression language 2025-05-07 20:45:05 +02:00