Eric Coissac
623116ab13
Add rope-based FASTA parsing and improve sequence handling
Introduce FastaChunkParserRope for direct rope-based FASTA parsing, enhance sequence extraction with whitespace skipping and U->T conversion, and update parser logic to support both rope and raw data sources.
- Added extractFastaSeq function to scan sequence bytes directly from rope
- Implemented FastaChunkParserRope for rope-based parsing
- Modified _ParseFastaFile to use rope when available
- Updated sequence handling to support U->T conversion
- Fixed line ending detection for FASTA parsing
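The commit message does not show the body of extractFastaSeq. The sketch below illustrates one way to extract sequence bytes directly from a rope while skipping whitespace and converting U to T. It assumes a rope made of linked byte chunks (`ropeNode` is hypothetical). The hypothetical `scanFastaSeq` starts just after a header line and collects sequence bytes into a single buffer. It stops before a `>` that opens a line, even when the sequence straddles a chunk boundary. This is not the obitools implementation.

```go
package main

import "fmt"

// ropeNode is a hypothetical singly linked rope of byte chunks, standing in
// for the chunk rope mentioned in the commit message.
type ropeNode struct {
	data []byte
	next *ropeNode
}

// scanFastaSeq (hypothetical) copies sequence bytes starting at node/pos,
// skipping spaces, tabs and line endings, and optionally converting U/u to
// T/t. It stops before a '>' that begins a line and returns the node and
// offset of that byte, or a nil node when the rope is exhausted.
func scanFastaSeq(node *ropeNode, pos int, uToT bool) ([]byte, *ropeNode, int) {
	seq := make([]byte, 0, 1024)
	atLineStart := true
	for node != nil {
		for ; pos < len(node.data); pos++ {
			c := node.data[pos]
			switch {
			case c == '\n' || c == '\r':
				atLineStart = true // both LF and CRLF endings open a new line
				continue
			case c == ' ' || c == '\t':
				atLineStart = false
				continue
			case c == '>' && atLineStart:
				return seq, node, pos // next record header found
			}
			atLineStart = false
			if uToT {
				switch c {
				case 'U':
					c = 'T'
				case 'u':
					c = 't'
				}
			}
			seq = append(seq, c)
		}
		node, pos = node.next, 0 // continue in the next chunk
	}
	return seq, nil, 0
}

func main() {
	// The first sequence is split across two chunks.
	c2 := &ropeNode{data: []byte("GGU\n>seq2\nAC\n")}
	c1 := &ropeNode{data: []byte("ACGU uu\nAC"), next: c2}
	seq, node, pos := scanFastaSeq(c1, 0, true)
	fmt.Println(string(seq), node == c2, pos) // prints: ACGTttACGGT true 4
}
```

Scanning node by node avoids concatenating the chunks before parsing. Only the cleaned sequence is copied.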
2026-03-10 16:34:33 +01:00
Eric Coissac
a7ea47624b
Optimize the parsing of large sequences
Implements an optimization of large-sequence parsing that avoids unnecessary memory allocation when merging chunks. Adds support for parsing the rope structure directly, which reduces allocations and improves performance when processing multi-Gbp GenBank/EMBL and FASTA/FASTQ files. The parsers are updated to use the non-packed rope and the new in-place writing mechanism for GenBank sequences.
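The commit does not describe how the in-place writing for GenBank sequences works. The sketch below assumes it means compacting the nucleotides of an ORIGIN block inside the buffer that already holds the raw lines. In that reading, position numbers, spaces and newlines are dropped without allocating a second buffer. The function name `compactGenBankSeq` is hypothetical, and keeping only ASCII letters is an assumption made for illustration.

```go
package main

import "fmt"

// compactGenBankSeq (hypothetical) filters raw GenBank ORIGIN lines in place,
// keeping only ASCII letters. The write index w never overtakes the read
// index r, so overwriting raw[w] never destroys a byte still to be read.
func compactGenBankSeq(raw []byte) []byte {
	w := 0
	for r := 0; r < len(raw); r++ {
		c := raw[r]
		if (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') {
			raw[w] = c
			w++
		}
	}
	return raw[:w] // shares the backing array of raw
}

func main() {
	raw := []byte("        1 gatcctccat atacaacggt\n       21 atctcc\n")
	seq := compactGenBankSeq(raw)
	fmt.Println(string(seq), len(seq)) // prints: gatcctccatatacaacggtatctcc 26
}
```

Because the result is a reslice of the input, the cost is one pass over the bytes with no extra allocation. This matters for multi-Gbp records.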
2026-03-10 14:20:21 +01:00
Eric Coissac
8d53d253d4
Add a reading option on readers to convert U to T
2025-07-07 15:29:07 +02:00
Eric Coissac
5a3705b6bb
Adds the --silent-warning option to the obitools commands and removes the --paired-with option from some of the obitools commands.
2025-03-25 16:44:46 +01:00
Eric Coissac
3a1cf4fe97
Speed up the reading of very long FASTA sequences and, more generally, of every format
2025-03-12 13:29:41 +01:00
Eric Coissac
3137c1f841
Adds the ability to read gzipped tar files for the taxonomy dump
2025-01-24 11:47:59 +01:00
Eric Coissac
00b0edc15a
Refactor the file chunk writing
2024-11-29 18:15:03 +01:00
Eric Coissac
886b5d9a96
Optimize memory for readers and writers
2024-08-05 10:48:28 +02:00
Eric Coissac
1b1cd41fd3
Add some code refactoring from the blackboard branch
2024-08-02 12:35:46 +02:00
Eric Coissac
e6b87ecd02
Reduce memory allocation events
Former-commit-id: fbdb2afc857b02adc2593e2278d3bd838e99b0b2
2024-06-22 21:01:53 +02:00
Eric Coissac
54a138196c
Patch a bug in fasta and fastq reading
Former-commit-id: bcaa264b4c4a7c67617eb909b199176bf09913db
2024-06-21 14:28:57 +02:00
Eric Coissac
aa42df326a
Correct a bug in the fastq reader affecting the quality of the last record of each chunk
Former-commit-id: b842d60af9c2f1f971946d99999d13cfc15793b3
2024-06-04 11:57:16 +02:00
Eric Coissac
3e1d9a41ec
Adds more format checking during fasta parsing
Former-commit-id: fbc3d9c923936287a591f01f9401b710b584aa14
2024-05-30 18:12:06 +02:00
Eric Coissac
98b3bc2a8c
Patch a bug in the reading of the last sequence of each chunk in the FASTA reader
Former-commit-id: eacf64112582befa4751f66352999a28abf349f7
2024-05-27 10:17:17 +02:00
Eric Coissac
55ce36f329
Update of obipcr and homogenization of logging
Former-commit-id: 46abf47c19ace5248042c02cf1f81d9f6c12eb10
2024-05-16 15:18:30 +02:00
Eric Coissac
9e63013bc2
Fix a bug in obiformat leading to partial parsing, and add godocs
Former-commit-id: b27105355f1a330eedf6eaa72c8ac94f06806c28
2024-05-07 10:54:12 +02:00
Eric Coissac
5b98393a68
Refactor sequence file reading
Former-commit-id: 3dcb96e68da648d72bb585da047e3496427d7851
2024-05-01 00:50:23 +02:00
8d77cc4133
Change path of the obitools pkg
Former-commit-id: 311cbf8df3b990b393c6f4885d62e74564423b65
2023-11-29 12:14:37 +01:00
61c30f9b6a
Patch the reverse complement and add a first implementation of --auto in obicsv
Former-commit-id: f3020e81283b1073c4d1c2d2ff0887e3998e6764
2023-11-07 09:37:07 +02:00
6a6a6f6f2c
Correctly handle empty files
Former-commit-id: d166aa352ce4bf32739ddc2f7d1c9967918822fd
2023-10-16 15:34:06 +02:00
e8c55a2b6b
optimize sequence readers and patch a bug in the format guesser
Former-commit-id: 9dce1e96c57ae9a88c26fac5c8e1bdcdc2c0c7a5
2023-10-13 21:52:57 +02:00
157c26cdc7
Patch a bug in the fasta and fastq readers
Former-commit-id: 4998f157a90a6b077124d87d4a5cde0dd075d1ce
2023-10-13 14:21:27 +02:00
d23a911080
Change the way sequence definitions are managed: when present, they are now stored as an attribute
Former-commit-id: 6e618377c05b42937d2eace3c9668390980ab68c
2023-10-05 07:21:12 +02:00
5c30ec354f
Go implementation of the FASTA and FASTQ parsers
Former-commit-id: 3f4fd355c169afbef2d5fef1f5e407aabb71d031
2023-09-03 19:16:37 +02:00
6d1ac60c48
Convert first nucleotide of sequence to lower case in fasta reader
Former-commit-id: 856bb3a39a4f1143a34b1f8b4d8d12b0151c0c3e
2023-09-01 09:40:02 +02:00
62b57f4ede
A Go implementation of the FASTA reader
Former-commit-id: 603592c4761fb0722e9e0501d78de1bd3ba238fa
2023-09-01 09:30:12 +02:00