obitools4

mirror of https://github.com/metabarcoding/obitools4.git synced 2026-03-25 05:20:52 +00:00

Author	SHA1	Message	Date
Eric Coissac	1342c83db6	Use NewBioSequenceOwning to avoid unnecessary sequence copying Replace NewBioSequence with NewBioSequenceOwning in genbank_read.go to take ownership of sequence slices without copying, improving performance. Update biosequence.go to add the new TakeSequence method and NewBioSequenceOwning constructor.	2026-03-10 15:51:35 +01:00
Eric Coissac	761e0dbed3	Implement a rope-based GenBank parser to reduce memory usage Add a rope-based GenBank parser to reduce memory usage (RSS) and heap allocations. - Add `gbRopeScanner` to read lines without heap allocation - Implement `GenbankChunkParserRope`, which uses the rope instead of `Pack()` - Modify `_ParseGenbankFile` and `ReadGenbank` to use the new parser - Expected RSS reduction from 57 GB to ~128 MB × workers - Keep the old parser for compatibility and tests Significant reduction in allocations (~50M) and sys time, with comparable or better user time.	2026-03-10 15:35:36 +01:00
Eric Coissac	a7ea47624b	Optimize parsing of large sequences Implements an optimization of large-sequence parsing by avoiding unnecessary memory allocation when merging chunks. Adds support for parsing the rope structure directly, which reduces allocations and improves performance when processing GenBank/EMBL and FASTA/FASTQ files of several Gbp. The parsers are updated to use the unpacked rope and the new in-place write mechanism for GenBank sequences.	2026-03-10 14:20:21 +01:00
Eric Coissac	c57e788459	Fix GenBank parsing and add release notes script This commit fixes an issue in the GenBank parser where empty parts were being included in the parsed data. It also introduces a new script `release_notes.sh` to automate the generation of GitHub-compatible release notes for OBITools4 versions, including support for LLM summarization and various output modes.	2026-02-20 11:37:51 +01:00
Eric Coissac	8d53d253d4	Add a reading option on readers to convert U to T	2025-07-07 15:29:07 +02:00
Eric Coissac	38dcd98d4a	Patch the genbank parser automata	2025-06-17 08:52:45 +02:00
Eric Coissac	3a1cf4fe97	Speed up reading of very long fasta sequences, and more generally of every format	2025-03-12 13:29:41 +01:00
Eric Coissac	3137c1f841	Adds the ability to read gzip-tar file for the taxonomy dump	2025-01-24 11:47:59 +01:00
Eric Coissac	00b0edc15a	Refactoring of the file chunk writing	2024-11-29 18:15:03 +01:00
Eric Coissac	3f57935328	Adjust the size of the genbank and embl buffer size	2024-08-05 11:32:37 +02:00
Eric Coissac	886b5d9a96	Optimize memory for readers and writers	2024-08-05 10:48:28 +02:00
Eric Coissac	1b1cd41fd3	Add some code refactoring from the blackboard branch	2024-08-02 12:35:46 +02:00
Eric Coissac	e40d0bfbe7	Debug fasta and fastq writer when the first sequence is huge Former-commit-id: d208ff838abb7e19e117067f6243298492d60f14	2024-06-26 18:39:42 +02:00
Eric Coissac	e6b87ecd02	Reduce memory allocation events Former-commit-id: fbdb2afc857b02adc2593e2278d3bd838e99b0b2	2024-06-22 21:01:53 +02:00
Eric Coissac	5b98393a68	Refactor sequence file reading Former-commit-id: 3dcb96e68da648d72bb585da047e3496427d7851	2024-05-01 00:50:23 +02:00
Eric Coissac	8a2bbd1c3b	Patch a bug in the genbank reader when reading CONTIG entries Former-commit-id: dfe1433fbb68a79d59a3ee45e7e5b58c1599dad6	2024-03-11 10:53:25 +01:00
Eric Coissac	c9fe6f6ebf	Make some corrections to the genbank/embl parser Former-commit-id: fb2ebb351f61d78432bb9648d0a509b6557651a2	2024-02-27 07:28:56 +01:00
Eric Coissac	1542ce4c63	Work on EMBL and Genbank parser efficiency Former-commit-id: 309cc9ce4eea4c8085d7d4451a66a81710532f07	2024-02-20 13:23:07 +01:00
Eric Coissac	54ec09037e	Patch a genbank & embl parsing error. Former-commit-id: 060e80f42d176e6982e63d1128993fbcb4ad395f	2024-02-16 15:20:37 +01:00
Eric Coissac	8d77cc4133	Change path of the obitools pkg Former-commit-id: 311cbf8df3b990b393c6f4885d62e74564423b65	2023-11-29 12:14:37 +01:00
Eric Coissac	6a6a6f6f2c	Correctly handle empty files Former-commit-id: d166aa352ce4bf32739ddc2f7d1c9967918822fd	2023-10-16 15:34:06 +02:00
Eric Coissac	62b57f4ede	A go implementation of the fasta reader Former-commit-id: 603592c4761fb0722e9e0501d78de1bd3ba238fa	2023-09-01 09:30:12 +02:00
Eric Coissac	e7b9ba3f30	Limit allocation during genbank parsing Former-commit-id: eee3c1fa7ffb79943109ee32dbf21e78bf11b14f	2023-03-28 22:42:58 +07:00
Eric Coissac	988ae79989	Optimize memory allocation of the apat algorithms Former-commit-id: 5010c5a666b322715b3b81c1078d325e1f647ede	2023-03-28 19:37:05 +07:00
Eric Coissac	bc82422bc5	Reduce redundant calls to bytes.ToLower and replace the last call with a homemade version doing the conversion in place Former-commit-id: d9ea22f649d97be352f8dbb37acc1495df830118	2023-03-28 11:43:04 +07:00
Eric Coissac	a33e471b39	First attempt for obiconsensus... The graph traversing algorithm is too simple Former-commit-id: 0456e6c7fd55d6d0fcf9856c40386b976b912cba	2023-03-27 19:51:10 +07:00
Eric Coissac	b3922c3896	Produce a less weird crash on non-existing files Former-commit-id: 74bb27bd53c685be530632994bd2ba24c1f362e1	2023-03-07 17:34:25 +07:00
Eric Coissac	d88de15cdc	Refactor code to remove buffer size options. And some other changes... Former-commit-id: 10b57cc1a27446ade3c444217341e9651e89cdce	2023-03-07 11:12:13 +07:00
Eric Coissac	8458c0cd8b	Patch a bug in the genbank reader for sequences longer than 10kb.	2023-02-17 10:54:03 +01:00
Eric Coissac	f97f92df72	rename the iterator class	2023-01-22 22:04:17 +01:00
Eric Coissac	20b16c0ba1	Force sequence reading to produce lowercase sequences. Adds two columns to the obiclean ratio csv file	2022-11-22 15:06:09 +01:00
Eric Coissac	09fc426b67	Refactoring related to iterators	2022-11-16 17:13:03 +01:00
Eric Coissac	6f853da9df	Remove single sequence iterators. Only batch iterators persist	2022-11-16 10:58:59 +01:00
Eric Coissac	5a81525e46	Adds a genbank parser	2022-08-23 11:04:57 +02:00

34 Commits