Commit Graph

27 Commits

Author SHA1 Message Date
Eric Coissac
3d2e205722 Refactor rope scanner and add FASTQ rope parser
This commit refactors the rope scanner implementation by renaming gbRopeScanner to ropeScanner and extracting the common functionality into a new file. It also introduces a new FastqChunkParserRope function that parses FASTQ chunks directly from a rope without Pack(), enabling more efficient memory usage. The existing parsers are updated to use the new rope-based parser when available. The BioSequence type is enhanced with a TakeQualities method for more efficient quality data handling.
2026-03-10 16:47:03 +01:00
Eric Coissac
a7ea47624b Optimisation du parsing des grandes séquences
Implémente une optimisation du parsing des grandes séquences en évitant l'allocation de mémoire inutile lors de la fusion des chunks. Ajoute un support pour le parsing direct de la structure rope, ce qui permet de réduire les allocations et d'améliorer les performances lors du traitement de fichiers GenBank/EMBL et FASTA/FASTQ de plusieurs Gbp. Les parseurs sont mis à jour pour utiliser la rope non-packée et le nouveau mécanisme d'écriture in-place pour les séquences GenBank.
2026-03-10 14:20:21 +01:00
Eric Coissac
8d53d253d4 Add a reading option on readers to convet U to T 2025-07-07 15:29:07 +02:00
Eric Coissac
5a3705b6bb Adds the --silent-warning options to the obitools commands and removes the --pared-with option from some of the obitols commands. 2025-03-25 16:44:46 +01:00
Eric Coissac
3a1cf4fe97 Accelerate the speed of very long fasta sequences, and more generaly of every format 2025-03-12 13:29:41 +01:00
Eric Coissac
9acb4a85a8 Refactoring of the default values 2025-01-24 18:09:59 +01:00
Eric Coissac
3137c1f841 Adds the ability to read gzip-tar file for the taxonomy dump 2025-01-24 11:47:59 +01:00
Eric Coissac
ccd3b06532 Merge branch 'master' into taxonomy 2024-12-20 20:06:57 +01:00
Eric Coissac
00b0edc15a refactoring of the file chunck writing 2024-11-29 18:15:03 +01:00
Eric Coissac
40fb4e9767 reduce the memory impact of obiuniq. 2024-11-27 13:30:16 +01:00
Eric Coissac
9e8a7fd9be Patch a bug in fastq reader 2024-10-20 16:07:43 +02:00
Eric Coissac
886b5d9a96 Optimize memory for readers and writers 2024-08-05 10:48:28 +02:00
Eric Coissac
1b1cd41fd3 Add some code refactoring from the blackboard branch 2024-08-02 12:35:46 +02:00
Eric Coissac
e6b87ecd02 Reduce memory allocation events
Former-commit-id: fbdb2afc857b02adc2593e2278d3bd838e99b0b2
2024-06-22 21:01:53 +02:00
Eric Coissac
54a138196c Patch a bug in fasta and fastq reading
Former-commit-id: bcaa264b4c4a7c67617eb909b199176bf09913db
2024-06-21 14:28:57 +02:00
Eric Coissac
aa42df326a Correct a bug in the fastq reader affecting the quality of the last record of each chunk
Former-commit-id: b842d60af9c2f1f971946d99999d13cfc15793b3
2024-06-04 11:57:16 +02:00
Eric Coissac
dd9307a4cd Swich to the system min and max functions and remove the version from obiutils
Former-commit-id: 8c4558921b0d0c266b070f16e83813de6e6d4a0f
2024-05-30 08:27:24 +02:00
Eric Coissac
55ce36f329 Update of obipcr and homogenization of logging
Former-commit-id: 46abf47c19ace5248042c02cf1f81d9f6c12eb10
2024-05-16 15:18:30 +02:00
Eric Coissac
9e63013bc2 Correction on obiformat of bug leading to partial parsing and add godocs
Former-commit-id: b27105355f1a330eedf6eaa72c8ac94f06806c28
2024-05-07 10:54:12 +02:00
Eric Coissac
5b98393a68 Refactor sequence file reading
Former-commit-id: 3dcb96e68da648d72bb585da047e3496427d7851
2024-05-01 00:50:23 +02:00
8d77cc4133 Change path of the obitools pkg
Former-commit-id: 311cbf8df3b990b393c6f4885d62e74564423b65
2023-11-29 12:14:37 +01:00
677775bf04 Patch a bug in the fasq reader allowing to read only lower case sequence files
Former-commit-id: f19f3812f48b215cdc5d0fdb8a39f0cd4bb5289b
2023-11-10 09:10:54 +01:00
61c30f9b6a Patch rev complement and first implementation of --auto in obicsv
Former-commit-id: f3020e81283b1073c4d1c2d2ff0887e3998e6764
2023-11-07 09:37:07 +02:00
e8c55a2b6b optimize sequence readers and patch a bug in the format guesser
Former-commit-id: 9dce1e96c57ae9a88c26fac5c8e1bdcdc2c0c7a5
2023-10-13 21:52:57 +02:00
157c26cdc7 Patch a bug in the fasta and fastq readers
Former-commit-id: 4998f157a90a6b077124d87d4a5cde0dd075d1ce
2023-10-13 14:21:27 +02:00
d23a911080 Change the way sequence definition are managed. They are now when present stored as an attribute
Former-commit-id: 6e618377c05b42937d2eace3c9668390980ab68c
2023-10-05 07:21:12 +02:00
5c30ec354f Go implementation of fasta and fastq parser
Former-commit-id: 3f4fd355c169afbef2d5fef1f5e407aabb71d031
2023-09-03 19:16:37 +02:00