Commit Graph

244 Commits

Author SHA1 Message Date
Eric Coissac 00dcd78e84 Refactor k-mer encoding and frequency filtering with KmerSet
This commit refactors the k-mer encoding logic to handle ambiguous bases more consistently and introduces a KmerSet type for better management of k-mer collections. The frequency filter now works with KmerSet instead of roaring bitmaps directly, and the API has been updated to support level-based frequency queries. Additionally, the commit updates the version and commit hash.
2026-02-05 14:41:59 +01:00
Eric Coissac 60f27c1dc8 Add error handling for ambiguous bases in k-mer encoding
This commit introduces error handling for ambiguous DNA bases (N, R, Y, W, S, K, M, B, D, H, V) in k-mer encoding. It adds new functions IterNormalizedKmersWithErrors and EncodeNormalizedKmersWithErrors that track and encode the number of ambiguous bases in each k-mer using error markers in the top 2 bits. The commit also updates the version string to reflect the latest changes.
2026-02-04 21:45:08 +01:00
Eric Coissac b49aba9c09 Implémentation du filtrage unique basé sur séquence et catégories
Ajout d'une fonctionnalité pour le filtrage unique qui prend en compte à la fois la séquence et les catégories.

- Modification de la fonction ISequenceChunk pour accepter un classifieur unique optionnel
- Implémentation du traitement unique sur disque en utilisant un classifieur composite
- Mise à jour du classifieur utilisé pour le tri sur disque
- Correction de la gestion des clés de unicité en utilisant le code et la valeur du classifieur
- Mise à jour du numéro de commit
2026-01-14 19:18:17 +01:00
Eric Coissac 0678181023 Refactor chunk processing and update version commit
Optimize chunk processing by moving variable declarations inside the loop and update the commit hash in version.go to reflect the latest changes.
2026-01-14 18:46:04 +01:00
Eric Coissac ac0d3f3fe4 Update obiuniq for very large dataset 2025-12-18 14:11:11 +01:00
Eric Coissac 547135c747 End of obilowmask 2025-12-03 11:49:07 +01:00
Eric Coissac 86e60aedd0 obicsv bug with stat on value map fields 2025-11-21 14:03:31 +01:00
Eric Coissac e65b2a5efe obimatrix bugs 2025-11-21 13:24:06 +01:00
Eric Coissac ccc827afd3 finalise obilowmask 2025-11-18 15:33:08 +01:00
Eric Coissac 4603d7973e implementation de obilowmask 2025-11-18 15:30:20 +01:00
Eric Coissac 2d7dc7d09d debug taxonomy core dump 2025-11-05 19:01:15 +01:00
Eric Coissac 0844dcc607 bug obimatrix 2025-10-28 13:57:31 +01:00
Eric Coissac 7f4ebe757e Bug obiuniq - don't clean the chunks 2025-10-28 13:50:22 +01:00
Eric Coissac d17a9520b9 work on obiclean chimera detection 2025-10-20 17:29:47 +02:00
Eric Coissac 29bf4ce871 add a feature to obimatrix adding obicsv option to obimatrix 2025-10-20 16:34:58 +02:00
Eric Coissac 82b6bb1ab6 correct a bug in func (worker SeqWorker) ChainWorkers(next SeqWorker) SeqWorker 2025-08-11 15:09:49 +02:00
Eric Coissac 6d204f6281 Patch the fastq detector 2025-08-08 10:23:03 -04:00
Eric Coissac 7a6d552450 Changes to be committed:
modified:   pkg/obioptions/version.go
2025-08-07 17:01:48 -04:00
Eric Coissac 730d448fc3 Allows for only one cpu and it should work 2025-08-06 16:09:25 -04:00
Eric Coissac 04f3af3e60 some renaming of functions 2025-08-06 15:54:50 -04:00
Eric Coissac ed28d3fb5b Adds a --u-to-t option 2025-07-07 15:35:26 +02:00
Eric Coissac 43b285587e Debug on taxonomy extraction and CSV conversion 2025-07-07 15:29:40 +02:00
Eric Coissac 235a7e202a Update obisummary to account new obiseq.StatsOnValues type 2025-06-19 17:21:30 +02:00
Eric Coissac 27fa984a63 Patch obimatrix accoring to the new type obiseq.StatsOnValues 2025-06-19 16:51:53 +02:00
Eric Coissac add9d89ccc Patch the Min and Max values of the expression language 2025-06-19 16:43:26 +02:00
Eric Coissac 9965370d85 Manage a lock on StatsOnValues 2025-06-17 16:46:11 +02:00
Eric Coissac 8a2bb1fe82 Changes to be committed:
modified:   pkg/obioptions/version.go
	modified:   pkg/obiseq/merge.go
2025-06-17 12:11:35 +02:00
Eric Coissac efc3f3af29 Patch a concurrent access problem 2025-06-17 12:05:42 +02:00
Eric Coissac 1c6ab1c559 Changes to be committed:
modified:   pkg/obingslibrary/multimatch.go
	modified:   pkg/obioptions/version.go
2025-06-17 09:06:42 +02:00
Eric Coissac 38dcd98d4a Patch the genbank parser automata 2025-06-17 08:52:45 +02:00
Eric Coissac 7b23985693 Add _ to allowed in taxid 2025-06-06 14:37:57 +02:00
Eric Coissac d31e677304 Patch a bug in obitag 2025-06-04 14:47:28 +02:00
Eric Coissac 6cb7a5a352 Changes to be committed:
modified:   cmd/obitools/obitag/main.go
	modified:   cmd/obitools/obitaxonomy/main.go
	modified:   pkg/obiformats/csvtaxdump_read.go
	modified:   pkg/obiformats/ecopcr_read.go
	modified:   pkg/obiformats/ncbitaxdump_read.go
	modified:   pkg/obiformats/ncbitaxdump_readtar.go
	modified:   pkg/obiformats/newick_write.go
	modified:   pkg/obiformats/options.go
	modified:   pkg/obiformats/taxonomy_read.go
	modified:   pkg/obiformats/universal_read.go
	modified:   pkg/obiiter/extract_taxonomy.go
	modified:   pkg/obioptions/options.go
	modified:   pkg/obioptions/version.go
	new file:   pkg/obiphylo/tree.go
	modified:   pkg/obiseq/biosequenceslice.go
	modified:   pkg/obiseq/taxonomy_methods.go
	modified:   pkg/obitax/taxonomy.go
	modified:   pkg/obitax/taxonset.go
	modified:   pkg/obitools/obiconvert/sequence_reader.go
	modified:   pkg/obitools/obitag/obitag.go
	modified:   pkg/obitools/obitaxonomy/obitaxonomy.go
	modified:   pkg/obitools/obitaxonomy/options.go
	deleted:    sample/.DS_Store
2025-06-04 09:48:10 +02:00
Eric Coissac 3424d3057f Changes to be committed:
modified:   pkg/obiformats/ngsfilter_read.go
	modified:   pkg/obioptions/version.go
	modified:   pkg/obiutils/mimetypes.go
2025-05-14 14:53:25 +02:00
Eric Coissac f9324dd8f4 add min and max to the obitools expression language 2025-05-13 16:03:03 +02:00
Eric Coissac f1b9ac4a13 Update the expression language 2025-05-07 20:45:05 +02:00
Eric Coissac e065e2963b Update the install script 2025-05-01 11:45:46 +02:00
Eric Coissac 13ff892ac9 Patch type mismatch in apat C library 2025-04-23 16:14:10 +02:00
Eric Coissac c0ecaf90ab Add the --number option to obiannotate 2025-04-22 18:35:51 +02:00
Eric Coissac a57cfda675 Make the replace function of the eval language accepting regex 2025-04-10 15:17:15 +02:00
Eric Coissac 0aec5ba4df change the tests according to the corrections in obipairing 2025-04-04 17:10:17 +02:00
Eric Coissac 67e5b6ef24 Changes to be committed:
modified:   pkg/obioptions/version.go
2025-04-04 17:02:45 +02:00
Eric Coissac 3b1aa2869e Changes to be committed:
modified:   pkg/obioptions/version.go
2025-04-04 17:01:20 +02:00
Eric Coissac 7542e33010 Several bugs dicoverd during the doc writing 2025-04-04 16:59:27 +02:00
Eric Coissac 03b5ce9397 Patch a bug in obitag when some reference sequences have taxid absent from the taxonomy 2025-03-27 16:45:02 +01:00
Eric Coissac 2d52322876 Patch a bug in the obi2 annotation parser on map indexed by integers 2025-03-27 14:54:13 +01:00
Eric Coissac fd80249b85 Patch a bug in obitag when a taxon from the reference library is unknown in the taxonomy 2025-03-27 14:28:15 +01:00
Eric Coissac 5a3705b6bb Adds the --silent-warning options to the obitools commands and removes the --pared-with option from some of the obitols commands. 2025-03-25 16:44:46 +01:00
Eric Coissac 2ab6f67d58 Add a progress bar to chimera detection 2025-03-25 08:37:27 +01:00
Eric Coissac 8b379d30da Adds the --newick-output option to the obitaxonomy command 2025-03-14 14:24:12 +01:00