37 Commits

Author SHA1 Message Date
Eric Coissac
761e0dbed3 Implement a rope-based GenBank parser to reduce memory usage
Add a rope-based GenBank parser to reduce memory usage (RSS) and heap allocations.

- Add `gbRopeScanner` to read lines without heap allocation
- Implement `GenbankChunkParserRope`, which works on the rope instead of calling `Pack()`
- Modify `_ParseGenbankFile` and `ReadGenbank` to use the new parser
- Expected RSS reduction from 57 GB to ~128 MB × workers
- Keep the old parser for compatibility and tests

Significant reduction in allocations (~50M) and sys time, with comparable or better user time.
2026-03-10 15:35:36 +01:00
Eric Coissac
f78543ee75 Refactor k-mer index building to use disk-based KmerSetGroupBuilder
Refactor k-mer index building to use the new disk-based KmerSetGroupBuilder instead of the old KmerSet and FrequencyFilter approaches. This change introduces a more efficient and scalable approach to building k-mer indices by using partitioned disk storage with streaming operations.

- Replace BuildKmerIndex and BuildFrequencyFilterIndex with KmerSetGroupBuilder
- Add support for frequency filtering via WithMinFrequency option
- Remove deprecated k-mer set persistence methods
- Update CLI to use new builder approach
- Add new disk-based k-mer operations (union, intersect, difference, quorum)
- Introduce KDI (K-mer Delta Index) file format for efficient storage
- Add K-way merge operations for combining sorted k-mer streams
- Update documentation and examples to reflect new API

This refactoring provides better memory usage, faster operations on large datasets, and more flexible k-mer set operations.
2026-02-10 06:49:31 +01:00
Eric Coissac
c0ae49ef92 Add obilowmask_ref to the .gitignore file
Add the obilowmask_ref file to .gitignore so that Git does not track it.
2026-02-08 19:31:12 +01:00
Eric Coissac
db98ddb241 Fix super k-mer minimizer bijection and add validation test
This commit addresses a bug in the super k-mer implementation where the minimizer bijection property was not properly enforced. The fix ensures that:

1. All k-mers within a super k-mer share the same minimizer
2. Identical super k-mer sequences have the same minimizer

The changes include:

- Fixing the super k-mer iteration logic to properly validate the minimizer bijection property
- Adding a comprehensive test suite (TestSuperKmerMinimizerBijection) that validates the intrinsic property of super k-mers
- Updating the .gitignore file to properly track relevant files

This resolves issues where the same sequence could be associated with different minimizers, violating the super k-mer definition.
2026-02-08 13:47:33 +01:00
Eric Coissac
4603d7973e Implementation of obilowmask 2025-11-18 15:30:20 +01:00
Eric Coissac
04f3af3e60 some renaming of functions 2025-08-06 15:54:50 -04:00
Eric Coissac
286e27d6ba patch the scienctific_name tag name to "scientific_name" 2025-03-05 14:22:12 +01:00
Eric Coissac
6245d7f684 Changes to be committed:
modified:   .gitignore
2025-02-24 15:47:45 +01:00
Eric Coissac
15a058cf63 with all the sample files for tests 2025-02-19 15:27:38 +01:00
Eric Coissac
f2e81adf95 Changes to be committed:
modified:   .gitignore
	deleted:    xxx.csv
2025-02-05 19:28:19 +01:00
Eric Coissac
0a567f621c small changes 2025-01-24 18:12:37 +01:00
Eric Coissac
d066bb6878 Changes to be committed:
modified:   .gitignore
	modified:   cmd/test/main.go
	modified:   pkg/obioptions/version.go
2025-01-09 07:24:41 +01:00
Eric Coissac
ccd3b06532 Merge branch 'master' into taxonomy 2024-12-20 20:06:57 +01:00
Eric Coissac
7884a74f9c Patch a bug in obitagpcr 2024-11-18 21:10:47 +01:00
Eric Coissac
36327c79c8 Changes to be committed:
modified:   .gitignore
	new file:   pkg/obitax/default_taxonomy.go
	modified:   pkg/obitax/taxon.go
	modified:   pkg/obitax/taxonnode.go
	modified:   pkg/obitax/taxonomy.go
	modified:   pkg/obitax/taxonset.go
	modified:   pkg/obitax/taxonslice.go
	modified:   pkg/obitools/obifind/iterator.go
	modified:   pkg/obitools/obifind/options.go
2024-11-16 10:01:49 +01:00
4127ddb26f .gitignore
Former-commit-id: e1dcb41970f7a5405005cda8a1bbd90798e8020d
2024-02-27 07:29:14 +01:00
f2f7b4574e update the geometric obitag
Former-commit-id: acd8fe1c8c1cf443098432d818397b0b5d02df33
2024-01-17 23:38:51 +01:00
6fca03227a Archive cleaning
Former-commit-id: ded6d9cb43e3ecdf6eb6965e73580ce30ab986c5
2024-01-04 14:27:33 +01:00
5b57139450 Reduce doc size
Former-commit-id: 6f92f375e9cf92159e769ce562071bb56a871819
2024-01-04 14:17:44 +01:00
c2533667b2 Tag a Fatal bug release 4.0.5
Former-commit-id: 10b27c6d3867756d3159ef22eefd75db3fab84d0
2023-08-29 18:32:00 +02:00
446ba06c63 edited .gitignore
Former-commit-id: 7090da08a2acd8d73d3c9e3aced387862bcc9822
2023-03-28 21:31:31 +07:00
d4b185b716 Adds some new dependencies
Former-commit-id: 2f31ea6f852651e1ffca1d9ce78b17bddd26f2bb
2023-03-07 11:12:39 +07:00
6c5fc8f65b Save change in various files
Former-commit-id: 428f8ee77c584b79cc2ef45eef2902c3e0754c77
2023-02-23 23:45:41 +01:00
f56363a100 Patch an embl/genbank parser error 2023-02-16 13:30:42 +01:00
3a151fc0a0 updated gitignore 2023-01-31 23:14:33 +01:00
f4daa7f97f Modify the gitignore 2022-11-17 12:05:11 +01:00
a71e65963b Modify the .gitignore 2022-08-21 17:53:51 +02:00
a18745a34d Modify the gitignore 2022-02-24 12:15:09 +01:00
f5b278f5ec edit .gitignore 2022-02-24 07:09:35 +01:00
3586ecc483 second version of obidistribute and a first buggy version of obiuniq 2022-02-15 00:47:02 +01:00
b931321ba1 Adds the hash option to obidistribute 2022-02-14 09:12:57 +01:00
b193c3edfe adds the obifind command to gitignore 2022-02-07 11:52:56 +01:00
e9cdfd7e03 Make the subseq method deal with qualities 2022-02-01 18:49:32 +01:00
703eb62819 Adds elements to .gitignore 2022-01-18 13:10:56 +01:00
a0ba77792a Adds fasta and fastq file to the main gitignore file 2022-01-14 15:20:11 +01:00
bfa724dca3 adds the bin directory to the gitignore file 2022-01-14 15:18:36 +01:00
b9b9c0f179 Patch module name from oa2 to obitools 2022-01-13 23:43:01 +01:00