Commit Graph

159 Commits

Author SHA1 Message Date
Eric Coissac 761e0dbed3 Implémentation d'un parseur GenBank utilisant rope pour réduire l'usage de mémoire
Ajout d'un parseur GenBank basé sur rope pour réduire l'usage de mémoire (RSS) et les allocations heap.

- Ajout de `gbRopeScanner` pour lire les lignes sans allocation heap
- Implémentation de `GenbankChunkParserRope` qui utilise rope au lieu de `Pack()`
- Modification de `_ParseGenbankFile` et `ReadGenbank` pour utiliser le nouveau parseur
- Réduction du RSS attendue de 57 GB à ~128 MB × workers
- Conservation de l'ancien parseur pour compatibilité et tests

Réduction significative des allocations (~50M) et temps sys, avec un temps user comparable ou meilleur.
2026-03-10 15:35:36 +01:00
Eric Coissac a7ea47624b Optimisation du parsing des grandes séquences
Implémente une optimisation du parsing des grandes séquences en évitant l'allocation de mémoire inutile lors de la fusion des chunks. Ajoute un support pour le parsing direct de la structure rope, ce qui permet de réduire les allocations et d'améliorer les performances lors du traitement de fichiers GenBank/EMBL et FASTA/FASTQ de plusieurs Gbp. Les parseurs sont mis à jour pour utiliser la rope non-packée et le nouveau mécanisme d'écriture in-place pour les séquences GenBank.
2026-03-10 14:20:21 +01:00
Eric Coissac c57e788459 Fix GenBank parsing and add release notes script
This commit fixes an issue in the GenBank parser where empty parts were being included in the parsed data. It also introduces a new script `release_notes.sh` to automate the generation of GitHub-compatible release notes for OBITools4 versions, including support for LLM summarization and various output modes.
2026-02-20 11:37:51 +01:00
Eric Coissac cef29005a5 debug url reading 2025-11-18 15:30:20 +01:00
Eric Coissac 6d204f6281 Patch the fastq detector 2025-08-08 10:23:03 -04:00
Eric Coissac 43b285587e Debug on taxonomy extraction and CSV conversion 2025-07-07 15:29:40 +02:00
Eric Coissac 8d53d253d4 Add a reading option on readers to convet U to T 2025-07-07 15:29:07 +02:00
Eric Coissac 9965370d85 Manage a lock on StatsOnValues 2025-06-17 16:46:11 +02:00
Eric Coissac 38dcd98d4a Patch the genbank parser automata 2025-06-17 08:52:45 +02:00
Eric Coissac 6cb7a5a352 Changes to be committed:
modified:   cmd/obitools/obitag/main.go
	modified:   cmd/obitools/obitaxonomy/main.go
	modified:   pkg/obiformats/csvtaxdump_read.go
	modified:   pkg/obiformats/ecopcr_read.go
	modified:   pkg/obiformats/ncbitaxdump_read.go
	modified:   pkg/obiformats/ncbitaxdump_readtar.go
	modified:   pkg/obiformats/newick_write.go
	modified:   pkg/obiformats/options.go
	modified:   pkg/obiformats/taxonomy_read.go
	modified:   pkg/obiformats/universal_read.go
	modified:   pkg/obiiter/extract_taxonomy.go
	modified:   pkg/obioptions/options.go
	modified:   pkg/obioptions/version.go
	new file:   pkg/obiphylo/tree.go
	modified:   pkg/obiseq/biosequenceslice.go
	modified:   pkg/obiseq/taxonomy_methods.go
	modified:   pkg/obitax/taxonomy.go
	modified:   pkg/obitax/taxonset.go
	modified:   pkg/obitools/obiconvert/sequence_reader.go
	modified:   pkg/obitools/obitag/obitag.go
	modified:   pkg/obitools/obitaxonomy/obitaxonomy.go
	modified:   pkg/obitools/obitaxonomy/options.go
	deleted:    sample/.DS_Store
2025-06-04 09:48:10 +02:00
Eric Coissac 3424d3057f Changes to be committed:
modified:   pkg/obiformats/ngsfilter_read.go
	modified:   pkg/obioptions/version.go
	modified:   pkg/obiutils/mimetypes.go
2025-05-14 14:53:25 +02:00
Eric Coissac 2d52322876 Patch a bug in the obi2 annotation parser on map indexed by integers 2025-03-27 14:54:13 +01:00
Eric Coissac 5a3705b6bb Adds the --silent-warning options to the obitools commands and removes the --pared-with option from some of the obitols commands. 2025-03-25 16:44:46 +01:00
Eric Coissac 8448783499 Make sequence files recognized as a taxonomy 2025-03-14 14:22:22 +01:00
Eric Coissac 3a1cf4fe97 Accelerate the speed of very long fasta sequences, and more generaly of every format 2025-03-12 13:29:41 +01:00
Eric Coissac 0339e4dffa Patch size limite of the filetype guesser 2025-03-08 07:34:02 +01:00
Eric Coissac 0067152c2b Patch the production of the ratio file 2025-02-27 10:19:39 +01:00
Eric Coissac 4774438644 Changes to be committed:
modified:   pkg/obiformats/universal_read.go
	modified:   pkg/obioptions/version.go
	modified:   pkg/obiseq/taxonomy_methods.go
2025-02-12 08:40:38 +01:00
Eric Coissac 6a8061cc4f Add managment of the taxonomy alias politic 2025-02-10 14:05:47 +01:00
Eric Coissac 0df082da06 Adds possibility to extract a taxonomy from taxonomic path included in sequence files 2025-01-30 11:18:21 +01:00
Eric Coissac 7c4042df6b introduce obidefault 2025-01-27 17:12:45 +01:00
Eric Coissac 9acb4a85a8 Refactoring of the default values 2025-01-24 18:09:59 +01:00
Eric Coissac 3137c1f841 Adds the ability to read gzip-tar file for the taxonomy dump 2025-01-24 11:47:59 +01:00
Eric Coissac 4fe0db63ff Patch CSV reader to use the new taxonomy system 2024-12-20 21:30:00 +01:00
Eric Coissac ccd3b06532 Merge branch 'master' into taxonomy 2024-12-20 20:06:57 +01:00
Eric Coissac 5d0f996625 Patch a small bug on json write 2024-12-20 19:42:03 +01:00
Eric Coissac 795df34d1a Changes to be committed:
modified:   cmd/obitools/obitag/main.go
	modified:   cmd/obitools/obitag2/main.go
	modified:   go.mod
	modified:   go.sum
	modified:   pkg/obiformats/ncbitaxdump/read.go
	modified:   pkg/obioptions/version.go
	modified:   pkg/obiseq/attributes.go
	modified:   pkg/obiseq/taxonomy_lca.go
	modified:   pkg/obiseq/taxonomy_methods.go
	modified:   pkg/obiseq/taxonomy_predicate.go
	modified:   pkg/obitax/inner.go
	modified:   pkg/obitax/lca.go
	new file:   pkg/obitax/taxid.go
	modified:   pkg/obitax/taxon.go
	modified:   pkg/obitax/taxonomy.go
	modified:   pkg/obitax/taxonslice.go
	modified:   pkg/obitools/obicleandb/obicleandb.go
	modified:   pkg/obitools/obigrep/options.go
	modified:   pkg/obitools/obilandmark/obilandmark.go
	modified:   pkg/obitools/obilandmark/options.go
	modified:   pkg/obitools/obirefidx/famlilyindexing.go
	modified:   pkg/obitools/obirefidx/geomindexing.go
	modified:   pkg/obitools/obirefidx/obirefidx.go
	modified:   pkg/obitools/obirefidx/options.go
	modified:   pkg/obitools/obitag/obigeomtag.go
	modified:   pkg/obitools/obitag/obitag.go
	modified:   pkg/obitools/obitag/options.go
	modified:   pkg/obiutils/strings.go
2024-12-19 13:36:59 +01:00
Eric Coissac f41a6fbb60 Patch a small bug on json write 2024-11-29 18:39:18 +01:00
Eric Coissac 00b0edc15a refactoring of the file chunck writing 2024-11-29 18:15:03 +01:00
Eric Coissac 40fb4e9767 reduce the memory impact of obiuniq. 2024-11-27 13:30:16 +01:00
Eric Coissac 3d06978808 a functional new version of obifind 2024-11-24 19:33:24 +01:00
Eric Coissac 7633fc4d23 update documentation 2024-11-16 06:00:27 +01:00
Eric Coissac 03f4e88a17 Fisrt functional version 2024-11-14 19:10:23 +01:00
Eric Coissac 3e00d39d47 In obimultiplex, patch a bug when no tag are associated to a primer. 2024-10-22 14:12:20 +02:00
Eric Coissac 9e8a7fd9be Patch a bug in fastq reader 2024-10-20 16:07:43 +02:00
Eric Coissac 05bf2bfd6c Add option related to agrep match on obigrep and obiannotate 2024-09-09 16:52:13 +02:00
Eric Coissac 65ae82622e correction of several small bugs 2024-09-03 06:08:07 -03:00
Eric Coissac 31bfc88eb9 Patch a bug on writing to stdout, and add clearer error on openning data files 2024-08-13 09:45:28 +02:00
Eric Coissac 3f57935328 Adjust the size of the genbank and embl buffer size 2024-08-05 11:32:37 +02:00
Eric Coissac 886b5d9a96 Optimize memory for readers and writers 2024-08-05 10:48:28 +02:00
Eric Coissac 1b1cd41fd3 Add some code refactoring from the blackboard branch 2024-08-02 12:35:46 +02:00
Eric Coissac 67665a6b40 Xprize update
Former-commit-id: d38919a897961e4d40da3b844057c3fb94fdb6d7
2024-07-25 18:09:03 -04:00
Eric Coissac bd855c4965 Adds CSV as an input format
Former-commit-id: a365bb6947064adc2709d66df05fa54c6fe47fad
2024-07-03 21:04:27 +02:00
Eric Coissac 154753de90 Same bug but for fastq sequence writing only
Former-commit-id: 86d208fe66828da9943c559df80ff095b07eaf7a
2024-06-26 18:46:47 +02:00
Eric Coissac e40d0bfbe7 Debug fasta and fastq writer when the first sequence is hudge
Former-commit-id: d208ff838abb7e19e117067f6243298492d60f14
2024-06-26 18:39:42 +02:00
Eric Coissac c1f03cb1f6 Switch to faster json library go-json and sonic
Former-commit-id: ab9b4723f1dcf79fe5c073fff4d86f4f6969edfd
2024-06-23 00:36:08 +02:00
Eric Coissac 93f9dcb95f Reducing memory allocation events
Former-commit-id: c94e79ba116464504580fc397270ead154063971
2024-06-22 22:32:31 +02:00
Eric Coissac e6b87ecd02 Reduce memory allocation events
Former-commit-id: fbdb2afc857b02adc2593e2278d3bd838e99b0b2
2024-06-22 21:01:53 +02:00
Eric Coissac 54a138196c Patch a bug in fasta and fastq reading
Former-commit-id: bcaa264b4c4a7c67617eb909b199176bf09913db
2024-06-21 14:28:57 +02:00
Eric Coissac 818ce87bab Patch some bugs in writing files
Former-commit-id: 612868a281dc0ecf4e6c5776973735e5c71bd517
2024-06-19 13:15:30 +02:00