Eric Coissac
0580611031
Implement canonical superkmers and clean up GenBank parsing
...
Add the IterCanonicalSuperKmers function in superkmer_iter.go to implement canonical superkmers as specified in the architecture document.
Fixes in genbank_read.go:
- Clean data lines with strings.TrimSpace
- Increase the number of parts extracted with SplitN to 7
- Start the loop at index 1 instead of 0 to skip the empty first element
Create the Canonical-superkmers.md file to document the implementation.
2026-02-19 18:30:54 +01:00
Eric Coissac
8d53d253d4
Add a reading option on readers to convert U to T
2025-07-07 15:29:07 +02:00
Eric Coissac
38dcd98d4a
Patch the genbank parser automata
2025-06-17 08:52:45 +02:00
Eric Coissac
3a1cf4fe97
Speed up the reading of very long fasta sequences, and more generally of every format
2025-03-12 13:29:41 +01:00
Eric Coissac
3137c1f841
Adds the ability to read a gzip-tar file for the taxonomy dump
2025-01-24 11:47:59 +01:00
Eric Coissac
00b0edc15a
Refactoring of the file chunk writing
2024-11-29 18:15:03 +01:00
Eric Coissac
3f57935328
Adjust the genbank and embl buffer size
2024-08-05 11:32:37 +02:00
Eric Coissac
886b5d9a96
Optimize memory for readers and writers
2024-08-05 10:48:28 +02:00
Eric Coissac
1b1cd41fd3
Add some code refactoring from the blackboard branch
2024-08-02 12:35:46 +02:00
Eric Coissac
e40d0bfbe7
Debug fasta and fastq writer when the first sequence is huge
...
Former-commit-id: d208ff838abb7e19e117067f6243298492d60f14
2024-06-26 18:39:42 +02:00
Eric Coissac
e6b87ecd02
Reduce memory allocation events
...
Former-commit-id: fbdb2afc857b02adc2593e2278d3bd838e99b0b2
2024-06-22 21:01:53 +02:00
Eric Coissac
5b98393a68
Refactor sequence file reading
...
Former-commit-id: 3dcb96e68da648d72bb585da047e3496427d7851
2024-05-01 00:50:23 +02:00
8a2bbd1c3b
Patch a bug in the genbank reader when reading CONTIG entries
...
Former-commit-id: dfe1433fbb68a79d59a3ee45e7e5b58c1599dad6
2024-03-11 10:53:25 +01:00
c9fe6f6ebf
Make some corrections to the genbank/embl parser
...
Former-commit-id: fb2ebb351f61d78432bb9648d0a509b6557651a2
2024-02-27 07:28:56 +01:00
1542ce4c63
Work on EMBL and Genbank parser efficiency
...
Former-commit-id: 309cc9ce4eea4c8085d7d4451a66a81710532f07
2024-02-20 13:23:07 +01:00
54ec09037e
Patch a genbank & embl parsing error.
...
Former-commit-id: 060e80f42d176e6982e63d1128993fbcb4ad395f
2024-02-16 15:20:37 +01:00
8d77cc4133
Change path of the obitools pkg
...
Former-commit-id: 311cbf8df3b990b393c6f4885d62e74564423b65
2023-11-29 12:14:37 +01:00
6a6a6f6f2c
Correctly handle empty files
...
Former-commit-id: d166aa352ce4bf32739ddc2f7d1c9967918822fd
2023-10-16 15:34:06 +02:00
62b57f4ede
A go implementation of the fasta reader
...
Former-commit-id: 603592c4761fb0722e9e0501d78de1bd3ba238fa
2023-09-01 09:30:12 +02:00
e7b9ba3f30
Limit allocation during genbank parsing
...
Former-commit-id: eee3c1fa7ffb79943109ee32dbf21e78bf11b14f
2023-03-28 22:42:58 +07:00
988ae79989
Optimize memory allocation of the apat algorithms
...
Former-commit-id: 5010c5a666b322715b3b81c1078d325e1f647ede
2023-03-28 19:37:05 +07:00
bc82422bc5
Reduce redundant calls to bytes.ToLower and replace the last call with a homemade version that does the conversion in place
...
Former-commit-id: d9ea22f649d97be352f8dbb37acc1495df830118
2023-03-28 11:43:04 +07:00
a33e471b39
First attempt for obiconsensus... The graph traversing algorithm is too simple
...
Former-commit-id: 0456e6c7fd55d6d0fcf9856c40386b976b912cba
2023-03-27 19:51:10 +07:00
b3922c3896
Produce less weird crashes on non-existent files
...
Former-commit-id: 74bb27bd53c685be530632994bd2ba24c1f362e1
2023-03-07 17:34:25 +07:00
d88de15cdc
Refactor code to remove buffer size options. And some other changes...
...
Former-commit-id: 10b57cc1a27446ade3c444217341e9651e89cdce
2023-03-07 11:12:13 +07:00
8458c0cd8b
Patch a bug in the genbank reader for sequences longer than 10 kb.
2023-02-17 10:54:03 +01:00
f97f92df72
rename the iterator class
2023-01-22 22:04:17 +01:00
20b16c0ba1
Force sequence reading to produce lowercase sequences.
...
Adds two columns to the obiclean ratio csv file
2022-11-22 15:06:09 +01:00
09fc426b67
Refactoring related to iterators
2022-11-16 17:13:03 +01:00
6f853da9df
Remove single sequence iterators. Only batch iterators persist
2022-11-16 10:58:59 +01:00
5a81525e46
Adds a genbank parser
2022-08-23 11:04:57 +02:00