Eric Coissac
a7ea47624b
Optimize parsing of large sequences
...
Implements an optimization for parsing large sequences by avoiding unnecessary memory allocation when merging chunks. Adds support for parsing the rope structure directly, which reduces allocations and improves performance when processing GenBank/EMBL and FASTA/FASTQ files of several Gbp. The parsers are updated to use the unpacked rope and the new in-place writing mechanism for GenBank sequences.
2026-03-10 14:20:21 +01:00
Eric Coissac
8d53d253d4
Add a reading option on readers to convert U to T
2025-07-07 15:29:07 +02:00
Eric Coissac
3a1cf4fe97
Speed up the reading of very long fasta sequences, and more generally of every format
2025-03-12 13:29:41 +01:00
Eric Coissac
3137c1f841
Adds the ability to read a gzip-tar file for the taxonomy dump
2025-01-24 11:47:59 +01:00
Eric Coissac
00b0edc15a
refactoring of the file chunk writing
2024-11-29 18:15:03 +01:00
Eric Coissac
3f57935328
Adjust the genbank and embl buffer size
2024-08-05 11:32:37 +02:00
Eric Coissac
886b5d9a96
Optimize memory for readers and writers
2024-08-05 10:48:28 +02:00
Eric Coissac
1b1cd41fd3
Add some code refactoring from the blackboard branch
2024-08-02 12:35:46 +02:00
Eric Coissac
e6b87ecd02
Reduce memory allocation events
...
Former-commit-id: fbdb2afc857b02adc2593e2278d3bd838e99b0b2
2024-06-22 21:01:53 +02:00
Eric Coissac
55ce36f329
Update of obipcr and homogenization of logging
...
Former-commit-id: 46abf47c19ace5248042c02cf1f81d9f6c12eb10
2024-05-16 15:18:30 +02:00
Eric Coissac
5b98393a68
Refactor sequence file reading
...
Former-commit-id: 3dcb96e68da648d72bb585da047e3496427d7851
2024-05-01 00:50:23 +02:00
c9fe6f6ebf
Make some corrections to the genbank/embl parser
...
Former-commit-id: fb2ebb351f61d78432bb9648d0a509b6557651a2
2024-02-27 07:28:56 +01:00
1542ce4c63
Work on EMBL and Genbank parser efficiency
...
Former-commit-id: 309cc9ce4eea4c8085d7d4451a66a81710532f07
2024-02-20 13:23:07 +01:00
54ec09037e
Patch a genbank & embl parsing error.
...
Former-commit-id: 060e80f42d176e6982e63d1128993fbcb4ad395f
2024-02-16 15:20:37 +01:00
6cc5c44a32
Correct the bug in embl reader
...
Former-commit-id: 579d397ca16e8c4cf2f8ba01e503e62b2fffa06f
2024-01-31 15:50:14 +01:00
23758b00f6
Patch a bug in the embl reader and adds some doc
...
Former-commit-id: 9b5f75fb14bcc3043da1647055279987a295d271
2024-01-31 15:43:02 +01:00
8d77cc4133
Change path of the obitools pkg
...
Former-commit-id: 311cbf8df3b990b393c6f4885d62e74564423b65
2023-11-29 12:14:37 +01:00
8f96517f3c
small changes
...
Former-commit-id: 1fee30445f03ff627dab1c335e75c3f278621f6e
2023-11-07 21:20:45 +02:00
6a6a6f6f2c
Correctly handle empty files
...
Former-commit-id: d166aa352ce4bf32739ddc2f7d1c9967918822fd
2023-10-16 15:34:06 +02:00
e8c55a2b6b
optimize sequence readers and patch a bug in the format guesser
...
Former-commit-id: 9dce1e96c57ae9a88c26fac5c8e1bdcdc2c0c7a5
2023-10-13 21:52:57 +02:00
62b57f4ede
A go implementation of the fasta reader
...
Former-commit-id: 603592c4761fb0722e9e0501d78de1bd3ba238fa
2023-09-01 09:30:12 +02:00
3f69fa41d6
Patch a bug for multiple amplicon per sequence.
...
Former-commit-id: b252d2de8e1a85d65c2951aa1958ee038e35741d
2023-03-31 15:10:25 +02:00
bc82422bc5
Reduce redundant calls to bytes.ToLower and replace the last call with a homemade version doing the conversion in place
...
Former-commit-id: d9ea22f649d97be352f8dbb37acc1495df830118
2023-03-28 11:43:04 +07:00
a33e471b39
First attempt for obiconsensus... The graph traversing algorithm is too simple
...
Former-commit-id: 0456e6c7fd55d6d0fcf9856c40386b976b912cba
2023-03-27 19:51:10 +07:00
d88de15cdc
Refactoring code to remove buffer size options. And some other changes...
...
Former-commit-id: 10b57cc1a27446ade3c444217341e9651e89cdce
2023-03-07 11:12:13 +07:00
8458c0cd8b
Patch a bug in the genbank reader for sequences longer than 10kb.
2023-02-17 10:54:03 +01:00
f56363a100
Patch an embl/genbank parser error
2023-02-16 13:30:42 +01:00
f97f92df72
rename the iterator class
2023-01-22 22:04:17 +01:00
20b16c0ba1
Force sequence reading to produce lowercase sequences.
...
Adds two columns to the obiclean ratio csv file
2022-11-22 15:06:09 +01:00
09fc426b67
Refactoring related to iterators
2022-11-16 17:13:03 +01:00
6f853da9df
Remove single sequence iterators. Only batch iterators remain
2022-11-16 10:58:59 +01:00
9677f531c4
Improved performance and ability to read very long sequences.
2022-08-21 13:38:13 +02:00
abcf02e488
Start to use leveled log
2022-02-24 12:14:52 +01:00
eaf65fbcce
Some code refactoring, a new version of obiuniq more efficient in memory and a first make file allowing to build obitools
2022-02-24 07:08:40 +01:00
2e7c1834b0
Big change in the data model, and a first version of obiuniq
2022-02-21 19:00:23 +01:00
4551df08b1
Adds a reader for NGS filter files and change some API for the apat library
2022-01-18 13:09:32 +01:00
576a9f4d2d
A global version of a Slice pool
2022-01-16 00:21:42 +01:00
ef66ca4972
Code refactoring
2022-01-14 17:32:12 +01:00
ff40222902
Code refactoring
2022-01-14 16:10:19 +01:00
b9b9c0f179
Patch module name from oa2 to obitools
2022-01-13 23:43:01 +01:00
f53bf1b804
First commit
2022-01-13 23:27:39 +01:00