Eric Coissac
1342c83db6
Use NewBioSequenceOwning to avoid unnecessary sequence copying
Replace NewBioSequence with NewBioSequenceOwning in genbank_read.go to take ownership of sequence slices without copying, improving performance. Update biosequence.go to add the new TakeSequence method and NewBioSequenceOwning constructor.
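A minimal Go sketch of the difference between a copying constructor and an owning one. `Seq`, `NewSeqCopy` and `NewSeqOwning` are hypothetical stand-ins, not the actual `obiseq` API. The owning constructor stores the caller's slice directly, which saves one allocation and one copy per sequence. The trade-off is that the caller must stop using that slice afterwards.

```go
package main

import "fmt"

// Seq is a hypothetical stand-in for a biological sequence type;
// it is not the obiseq.BioSequence API.
type Seq struct {
	id  string
	seq []byte
}

// NewSeqCopy duplicates the residues, so the caller may keep reusing
// its buffer afterwards (costs one allocation plus a copy).
func NewSeqCopy(id string, residues []byte) *Seq {
	s := make([]byte, len(residues))
	copy(s, residues)
	return &Seq{id: id, seq: s}
}

// NewSeqOwning stores the slice as-is: no allocation, no copy.
// The caller hands over ownership and must not modify residues later.
func NewSeqOwning(id string, residues []byte) *Seq {
	return &Seq{id: id, seq: residues}
}

func main() {
	buf := []byte("acgt")
	copied := NewSeqCopy("s1", buf)
	owned := NewSeqOwning("s2", buf)

	buf[0] = 'n'                    // mutate the caller's buffer...
	fmt.Println(string(copied.seq)) // ...the copy is unaffected: acgt
	fmt.Println(string(owned.seq))  // ...the owner sees it: ncgt
}
```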
2026-03-10 15:51:35 +01:00
Eric Coissac
b246025907
Optimize Fasta batch formatting
Optimize FormatFastaBatch to pre-allocate buffer and write sequences directly without intermediate strings, improving performance and memory usage.
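A sketch of the pre-allocation idea, assuming a hypothetical `Record` type and a fixed line width (neither is taken from the obitools4 source). The exact output size is computed first, so the single output buffer never grows and no per-record string is built.

```go
package main

import "fmt"

// Record is a hypothetical minimal FASTA record, not the obitools4 type.
type Record struct {
	ID  string
	Seq []byte
}

// fastaSize returns the exact number of bytes a record occupies when its
// sequence is wrapped at width residues per line.
func fastaSize(r Record, width int) int {
	header := 1 + len(r.ID) + 1 // '>' + identifier + '\n'
	lines := (len(r.Seq) + width - 1) / width
	return header + len(r.Seq) + lines // residues + one '\n' per line
}

// FormatFastaBatch writes every record into one buffer whose capacity is
// computed up front, so append never reallocates and no intermediate
// string is built per record.
func FormatFastaBatch(recs []Record, width int) []byte {
	total := 0
	for _, r := range recs {
		total += fastaSize(r, width)
	}
	buf := make([]byte, 0, total)
	for _, r := range recs {
		buf = append(buf, '>')
		buf = append(buf, r.ID...)
		buf = append(buf, '\n')
		for i := 0; i < len(r.Seq); i += width {
			end := i + width
			if end > len(r.Seq) {
				end = len(r.Seq)
			}
			buf = append(buf, r.Seq[i:end]...)
			buf = append(buf, '\n')
		}
	}
	return buf
}

func main() {
	out := FormatFastaBatch([]Record{
		{ID: "s1", Seq: []byte("acgtacgtac")},
		{ID: "s2", Seq: []byte("gg")},
	}, 4)
	fmt.Print(string(out))
	fmt.Println(len(out) == cap(out)) // true: the buffer was sized exactly
}
```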
2026-03-10 15:43:59 +01:00
Eric Coissac
761e0dbed3
Implement a GenBank parser using a rope to reduce memory usage
Add a rope-based GenBank parser to reduce memory usage (RSS) and heap allocations.
- Add `gbRopeScanner` to read lines without heap allocation
- Implement `GenbankChunkParserRope`, which uses the rope instead of `Pack()`
- Modify `_ParseGenbankFile` and `ReadGenbank` to use the new parser
- Expected RSS reduction from 57 GB to ~128 MB × workers
- Keep the old parser for compatibility and testing
Significant reduction in allocations (~50M) and sys time, with comparable or better user time.
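To illustrate how a rope-based scanner can avoid packing chunks into one large slice, here is a minimal sketch. `ropeNode` and `ropeScanner` are hypothetical and do not reproduce `gbRopeScanner`. A line contained in a single chunk is returned as a zero-copy sub-slice. Only a line straddling two chunks is assembled, in a scratch buffer reused across calls.

```go
package main

import (
	"bytes"
	"fmt"
)

// ropeNode is a hypothetical rope: a singly linked list of byte chunks
// kept as they came off the reader, never packed into one big slice.
type ropeNode struct {
	data []byte
	next *ropeNode
}

// ropeScanner yields the lines of a rope one at a time.
type ropeScanner struct {
	node    *ropeNode
	pos     int    // read offset inside node.data
	scratch []byte // reused only for lines that span chunks
}

// Next returns the next line without its '\n', or false at the end.
// The returned slice is only valid until the following call.
func (s *ropeScanner) Next() ([]byte, bool) {
	s.scratch = s.scratch[:0]
	spanned := false
	for s.node != nil {
		chunk := s.node.data[s.pos:]
		if i := bytes.IndexByte(chunk, '\n'); i >= 0 {
			s.pos += i + 1
			if !spanned {
				return chunk[:i], true // fast path: zero-copy sub-slice
			}
			s.scratch = append(s.scratch, chunk[:i]...)
			return s.scratch, true
		}
		if len(chunk) > 0 { // line continues in the next chunk
			s.scratch = append(s.scratch, chunk...)
			spanned = true
		}
		s.node, s.pos = s.node.next, 0
	}
	if spanned {
		return s.scratch, true // last line without a trailing '\n'
	}
	return nil, false
}

func main() {
	rope := &ropeNode{[]byte("LOCUS  x\nORI"),
		&ropeNode{[]byte("GIN\n   1 acgt"),
			&ropeNode{[]byte("\n//\n"), nil}}}
	sc := &ropeScanner{node: rope}
	for line, ok := sc.Next(); ok; line, ok = sc.Next() {
		fmt.Printf("%q\n", line)
	}
}
```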
2026-03-10 15:35:36 +01:00
Eric Coissac
a7ea47624b
Optimize parsing of large sequences
Implements an optimization of large-sequence parsing that avoids unnecessary memory allocation when merging chunks. Adds support for parsing the rope structure directly, which reduces allocations and improves performance when processing multi-Gbp GenBank/EMBL and FASTA/FASTQ files. The parsers are updated to use the unpacked rope and the new in-place writing mechanism for GenBank sequences.
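One way to write a GenBank sequence in place is sketched below, assuming the ORIGIN block has already been gathered into a byte slice. `compactSequenceInPlace` is a hypothetical helper, not the parser's actual function. Position numbers, spaces and newlines are dropped by writing each residue over the front of the same buffer, so no second buffer is allocated.

```go
package main

import "fmt"

// compactSequenceInPlace is a hypothetical helper: it keeps only the ASCII
// letters of GenBank ORIGIN lines, dropping position numbers, spaces and
// newlines, by writing each kept byte over the front of the same buffer.
// The write index never passes the read index, so this is safe.
func compactSequenceInPlace(buf []byte) []byte {
	w := 0
	for _, c := range buf {
		if (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') {
			buf[w] = c
			w++
		}
	}
	return buf[:w] // shares the original backing array
}

func main() {
	origin := []byte("        1 acgtacgtac gtacgt\n       17 acgt\n")
	seq := compactSequenceInPlace(origin)
	fmt.Println(string(seq))           // acgtacgtacgtacgtacgt
	fmt.Println(&seq[0] == &origin[0]) // true: no new allocation
}
```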
2026-03-10 14:20:21 +01:00
Eric Coissac
c57e788459
Fix GenBank parsing and add release notes script
This commit fixes an issue in the GenBank parser where empty parts were being included in the parsed data. It also introduces a new script `release_notes.sh` to automate the generation of GitHub-compatible release notes for OBITools4 versions, including support for LLM summarization and various output modes.
2026-02-20 11:37:51 +01:00
Eric Coissac
cef29005a5
debug url reading
2025-11-18 15:30:20 +01:00
Eric Coissac
6d204f6281
Patch the fastq detector
2025-08-08 10:23:03 -04:00
Eric Coissac
43b285587e
Debug on taxonomy extraction and CSV conversion
2025-07-07 15:29:40 +02:00
Eric Coissac
8d53d253d4
Add a reader option to convert U to T
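A minimal sketch of what a U-to-T reading option can do to each sequence, assuming residues are held in a byte slice. `readOptions` and `normalizeResidues` are hypothetical names, not the obitools4 reader API.

```go
package main

import "fmt"

// readOptions is a hypothetical reader configuration.
type readOptions struct {
	uToT bool // convert RNA uracil to DNA thymine while reading
}

// normalizeResidues rewrites the sequence in place according to opts.
// With uToT set, 'u'/'U' become 't'/'T'; other bytes are left untouched.
func normalizeResidues(seq []byte, opts readOptions) {
	if !opts.uToT {
		return
	}
	for i, c := range seq {
		switch c {
		case 'u':
			seq[i] = 't'
		case 'U':
			seq[i] = 'T'
		}
	}
}

func main() {
	s := []byte("acguUacgu")
	normalizeResidues(s, readOptions{uToT: true})
	fmt.Println(string(s)) // acgtTacgt
}
```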
2025-07-07 15:29:07 +02:00
Eric Coissac
9965370d85
Manage a lock on StatsOnValues
2025-06-17 16:46:11 +02:00
Eric Coissac
38dcd98d4a
Patch the GenBank parser automaton
2025-06-17 08:52:45 +02:00
Eric Coissac
6cb7a5a352
Changes to be committed:
modified: cmd/obitools/obitag/main.go
modified: cmd/obitools/obitaxonomy/main.go
modified: pkg/obiformats/csvtaxdump_read.go
modified: pkg/obiformats/ecopcr_read.go
modified: pkg/obiformats/ncbitaxdump_read.go
modified: pkg/obiformats/ncbitaxdump_readtar.go
modified: pkg/obiformats/newick_write.go
modified: pkg/obiformats/options.go
modified: pkg/obiformats/taxonomy_read.go
modified: pkg/obiformats/universal_read.go
modified: pkg/obiiter/extract_taxonomy.go
modified: pkg/obioptions/options.go
modified: pkg/obioptions/version.go
new file: pkg/obiphylo/tree.go
modified: pkg/obiseq/biosequenceslice.go
modified: pkg/obiseq/taxonomy_methods.go
modified: pkg/obitax/taxonomy.go
modified: pkg/obitax/taxonset.go
modified: pkg/obitools/obiconvert/sequence_reader.go
modified: pkg/obitools/obitag/obitag.go
modified: pkg/obitools/obitaxonomy/obitaxonomy.go
modified: pkg/obitools/obitaxonomy/options.go
deleted: sample/.DS_Store
2025-06-04 09:48:10 +02:00
Eric Coissac
3424d3057f
Changes to be committed:
modified: pkg/obiformats/ngsfilter_read.go
modified: pkg/obioptions/version.go
modified: pkg/obiutils/mimetypes.go
2025-05-14 14:53:25 +02:00
Eric Coissac
2d52322876
Patch a bug in the obi2 annotation parser on map indexed by integers
2025-03-27 14:54:13 +01:00
Eric Coissac
5a3705b6bb
Adds the --silent-warning option to the obitools commands and removes the --paired-with option from some of them.
2025-03-25 16:44:46 +01:00
Eric Coissac
8448783499
Allow sequence files to be recognized as a taxonomy
2025-03-14 14:22:22 +01:00
Eric Coissac
3a1cf4fe97
Speed up the processing of very long FASTA sequences, and more generally of every format
2025-03-12 13:29:41 +01:00
Eric Coissac
0339e4dffa
Patch the size limit of the file type guesser
2025-03-08 07:34:02 +01:00
Eric Coissac
0067152c2b
Patch the production of the ratio file
2025-02-27 10:19:39 +01:00
Eric Coissac
4774438644
Changes to be committed:
modified: pkg/obiformats/universal_read.go
modified: pkg/obioptions/version.go
modified: pkg/obiseq/taxonomy_methods.go
2025-02-12 08:40:38 +01:00
Eric Coissac
6a8061cc4f
Add management of the taxonomy alias policy
2025-02-10 14:05:47 +01:00
Eric Coissac
0df082da06
Adds the possibility to extract a taxonomy from taxonomic paths included in sequence files
2025-01-30 11:18:21 +01:00
Eric Coissac
7c4042df6b
introduce obidefault
2025-01-27 17:12:45 +01:00
Eric Coissac
9acb4a85a8
Refactoring of the default values
2025-01-24 18:09:59 +01:00
Eric Coissac
3137c1f841
Adds the ability to read gzip-tar file for the taxonomy dump
2025-01-24 11:47:59 +01:00
Eric Coissac
4fe0db63ff
Patch CSV reader to use the new taxonomy system
2024-12-20 21:30:00 +01:00
Eric Coissac
ccd3b06532
Merge branch 'master' into taxonomy
2024-12-20 20:06:57 +01:00
Eric Coissac
5d0f996625
Patch a small bug on json write
2024-12-20 19:42:03 +01:00
Eric Coissac
795df34d1a
Changes to be committed:
modified: cmd/obitools/obitag/main.go
modified: cmd/obitools/obitag2/main.go
modified: go.mod
modified: go.sum
modified: pkg/obiformats/ncbitaxdump/read.go
modified: pkg/obioptions/version.go
modified: pkg/obiseq/attributes.go
modified: pkg/obiseq/taxonomy_lca.go
modified: pkg/obiseq/taxonomy_methods.go
modified: pkg/obiseq/taxonomy_predicate.go
modified: pkg/obitax/inner.go
modified: pkg/obitax/lca.go
new file: pkg/obitax/taxid.go
modified: pkg/obitax/taxon.go
modified: pkg/obitax/taxonomy.go
modified: pkg/obitax/taxonslice.go
modified: pkg/obitools/obicleandb/obicleandb.go
modified: pkg/obitools/obigrep/options.go
modified: pkg/obitools/obilandmark/obilandmark.go
modified: pkg/obitools/obilandmark/options.go
modified: pkg/obitools/obirefidx/famlilyindexing.go
modified: pkg/obitools/obirefidx/geomindexing.go
modified: pkg/obitools/obirefidx/obirefidx.go
modified: pkg/obitools/obirefidx/options.go
modified: pkg/obitools/obitag/obigeomtag.go
modified: pkg/obitools/obitag/obitag.go
modified: pkg/obitools/obitag/options.go
modified: pkg/obiutils/strings.go
2024-12-19 13:36:59 +01:00
Eric Coissac
f41a6fbb60
Patch a small bug on json write
2024-11-29 18:39:18 +01:00
Eric Coissac
00b0edc15a
Refactoring of the file chunk writing
2024-11-29 18:15:03 +01:00
Eric Coissac
40fb4e9767
reduce the memory impact of obiuniq.
2024-11-27 13:30:16 +01:00
Eric Coissac
3d06978808
a functional new version of obifind
2024-11-24 19:33:24 +01:00
Eric Coissac
7633fc4d23
update documentation
2024-11-16 06:00:27 +01:00
Eric Coissac
03f4e88a17
First functional version
2024-11-14 19:10:23 +01:00
Eric Coissac
3e00d39d47
In obimultiplex, patch a bug when no tags are associated with a primer.
2024-10-22 14:12:20 +02:00
Eric Coissac
9e8a7fd9be
Patch a bug in fastq reader
2024-10-20 16:07:43 +02:00
Eric Coissac
05bf2bfd6c
Add options related to agrep matching in obigrep and obiannotate
2024-09-09 16:52:13 +02:00
Eric Coissac
65ae82622e
correction of several small bugs
2024-09-03 06:08:07 -03:00
Eric Coissac
31bfc88eb9
Patch a bug when writing to stdout, and add clearer errors when opening data files
2024-08-13 09:45:28 +02:00
Eric Coissac
3f57935328
Adjust the GenBank and EMBL buffer sizes
2024-08-05 11:32:37 +02:00
Eric Coissac
886b5d9a96
Optimize memory for readers and writers
2024-08-05 10:48:28 +02:00
Eric Coissac
1b1cd41fd3
Add some code refactoring from the blackboard branch
2024-08-02 12:35:46 +02:00
Eric Coissac
67665a6b40
Xprize update
Former-commit-id: d38919a897961e4d40da3b844057c3fb94fdb6d7
2024-07-25 18:09:03 -04:00
Eric Coissac
bd855c4965
Adds CSV as an input format
Former-commit-id: a365bb6947064adc2709d66df05fa54c6fe47fad
2024-07-03 21:04:27 +02:00
Eric Coissac
154753de90
Same bug but for fastq sequence writing only
Former-commit-id: 86d208fe66828da9943c559df80ff095b07eaf7a
2024-06-26 18:46:47 +02:00
Eric Coissac
e40d0bfbe7
Debug the FASTA and FASTQ writers when the first sequence is huge
Former-commit-id: d208ff838abb7e19e117067f6243298492d60f14
2024-06-26 18:39:42 +02:00
Eric Coissac
c1f03cb1f6
Switch to the faster JSON libraries go-json and sonic
Former-commit-id: ab9b4723f1dcf79fe5c073fff4d86f4f6969edfd
2024-06-23 00:36:08 +02:00
Eric Coissac
93f9dcb95f
Reducing memory allocation events
Former-commit-id: c94e79ba116464504580fc397270ead154063971
2024-06-22 22:32:31 +02:00
Eric Coissac
e6b87ecd02
Reduce memory allocation events
Former-commit-id: fbdb2afc857b02adc2593e2278d3bd838e99b0b2
2024-06-22 21:01:53 +02:00