41 Commits

Author SHA1 Message Date
Eric Coissac
a7ea47624b Optimize parsing of large sequences
Implements an optimization of large-sequence parsing that avoids unnecessary memory allocation when merging chunks. Adds support for parsing the rope structure directly, which reduces allocations and improves performance when processing GenBank/EMBL and FASTA/FASTQ files of several Gbp. The parsers are updated to use the unpacked rope and the new in-place writing mechanism for GenBank sequences.
2026-03-10 14:20:21 +01:00
Eric Coissac
8d53d253d4 Add a reading option on readers to convert U to T 2025-07-07 15:29:07 +02:00
Eric Coissac
3a1cf4fe97 Accelerate the reading of very long fasta sequences, and more generally of every format 2025-03-12 13:29:41 +01:00
Eric Coissac
3137c1f841 Adds the ability to read gzip-tar file for the taxonomy dump 2025-01-24 11:47:59 +01:00
Eric Coissac
00b0edc15a Refactoring of the file chunk writing 2024-11-29 18:15:03 +01:00
Eric Coissac
3f57935328 Adjust the genbank and embl buffer size 2024-08-05 11:32:37 +02:00
Eric Coissac
886b5d9a96 Optimize memory for readers and writers 2024-08-05 10:48:28 +02:00
Eric Coissac
1b1cd41fd3 Add some code refactoring from the blackboard branch 2024-08-02 12:35:46 +02:00
Eric Coissac
e6b87ecd02 Reduce memory allocation events
Former-commit-id: fbdb2afc857b02adc2593e2278d3bd838e99b0b2
2024-06-22 21:01:53 +02:00
Eric Coissac
55ce36f329 Update of obipcr and homogenization of logging
Former-commit-id: 46abf47c19ace5248042c02cf1f81d9f6c12eb10
2024-05-16 15:18:30 +02:00
Eric Coissac
5b98393a68 Refactor sequence file reading
Former-commit-id: 3dcb96e68da648d72bb585da047e3496427d7851
2024-05-01 00:50:23 +02:00
c9fe6f6ebf Make some corrections to the genbank/embl parser
Former-commit-id: fb2ebb351f61d78432bb9648d0a509b6557651a2
2024-02-27 07:28:56 +01:00
1542ce4c63 Work on EMBL and Genbank parser efficiency
Former-commit-id: 309cc9ce4eea4c8085d7d4451a66a81710532f07
2024-02-20 13:23:07 +01:00
54ec09037e Patch a genbank & embl parsing error.
Former-commit-id: 060e80f42d176e6982e63d1128993fbcb4ad395f
2024-02-16 15:20:37 +01:00
6cc5c44a32 Correct the bug in embl reader
Former-commit-id: 579d397ca16e8c4cf2f8ba01e503e62b2fffa06f
2024-01-31 15:50:14 +01:00
23758b00f6 Patch a bug in the embl reader and add some doc
Former-commit-id: 9b5f75fb14bcc3043da1647055279987a295d271
2024-01-31 15:43:02 +01:00
8d77cc4133 Change path of the obitools pkg
Former-commit-id: 311cbf8df3b990b393c6f4885d62e74564423b65
2023-11-29 12:14:37 +01:00
8f96517f3c small changes
Former-commit-id: 1fee30445f03ff627dab1c335e75c3f278621f6e
2023-11-07 21:20:45 +02:00
6a6a6f6f2c Correctly handle empty files
Former-commit-id: d166aa352ce4bf32739ddc2f7d1c9967918822fd
2023-10-16 15:34:06 +02:00
e8c55a2b6b optimize sequence readers and patch a bug in the format guesser
Former-commit-id: 9dce1e96c57ae9a88c26fac5c8e1bdcdc2c0c7a5
2023-10-13 21:52:57 +02:00
62b57f4ede A go implementation of the fasta reader
Former-commit-id: 603592c4761fb0722e9e0501d78de1bd3ba238fa
2023-09-01 09:30:12 +02:00
3f69fa41d6 Patch a bug for multiple amplicons per sequence.
Former-commit-id: b252d2de8e1a85d65c2951aa1958ee038e35741d
2023-03-31 15:10:25 +02:00
bc82422bc5 Reduce redundant calls to bytes.ToLower and replace the last call with a homemade version doing the conversion in place
Former-commit-id: d9ea22f649d97be352f8dbb37acc1495df830118
2023-03-28 11:43:04 +07:00
a33e471b39 First attempt for obiconsensus... The graph traversing algorithm is too simple
Former-commit-id: 0456e6c7fd55d6d0fcf9856c40386b976b912cba
2023-03-27 19:51:10 +07:00
d88de15cdc Refactor code to remove buffer size options. And some other changes...
Former-commit-id: 10b57cc1a27446ade3c444217341e9651e89cdce
2023-03-07 11:12:13 +07:00
8458c0cd8b Patch a bug in the genbank reader for sequences longer than 10kb. 2023-02-17 10:54:03 +01:00
f56363a100 Patch an embl/genbank parser error 2023-02-16 13:30:42 +01:00
f97f92df72 rename the iterator class 2023-01-22 22:04:17 +01:00
20b16c0ba1 Force sequence reading to produce lowercase sequences.
Adds two columns to the obiclean ratio csv file
2022-11-22 15:06:09 +01:00
09fc426b67 Refactoring related to iterators 2022-11-16 17:13:03 +01:00
6f853da9df Remove single sequence iterators. Only batch iterators persist 2022-11-16 10:58:59 +01:00
9677f531c4 Improved performance and ability to read very long sequences. 2022-08-21 13:38:13 +02:00
abcf02e488 Start to use leveled log 2022-02-24 12:14:52 +01:00
eaf65fbcce Some code refactoring, a new and more memory-efficient version of obiuniq, and a first makefile to build obitools 2022-02-24 07:08:40 +01:00
2e7c1834b0 Big change in the data model, and a first version of obiuniq 2022-02-21 19:00:23 +01:00
4551df08b1 Adds a reader for NGS filter files and change some API for the apat library 2022-01-18 13:09:32 +01:00
576a9f4d2d A global version of a Slice pool 2022-01-16 00:21:42 +01:00
ef66ca4972 Code refactoring 2022-01-14 17:32:12 +01:00
ff40222902 Code refactoring 2022-01-14 16:10:19 +01:00
b9b9c0f179 Patch module name from oa2 to obitools 2022-01-13 23:43:01 +01:00
f53bf1b804 First commit 2022-01-13 23:27:39 +01:00