Commit Graph

135 Commits

Author SHA1 Message Date
Eric Coissac a2b26712b2 refactor: replace fixed batch size with dynamic flushing based on count and memory
Replace the old fixed batch-size mechanism in Distribute with a dynamic strategy that flushes batches when either BatchSizeMax() sequences or BatchMem() bytes are reached per key. This aligns with the RebatchBySize strategy and removes the optional sizes parameter. Also update related code: simplify Lua wrapper to accept optional capacity, and fix buffer growth logic in worker.go using slices.Grow correctly. Remove unused BatchSize() usage from obidistribute.
2026-03-16 22:06:44 +01:00
Eric Coissac 40769bf827 Add memory-based batching support
Implement memory-aware batch sizing with --batch-mem CLI option, enabling adaptive batching based on estimated sequence memory footprint. Key changes:
- Added _BatchMem and related getters/setters in pkg/obidefault
- Implemented RebatchBySize() in pkg/obiter for memory-constrained batching
- Added BioSequence.MemorySize() for conservative memory estimation
- Integrated batch-mem option in pkg/obioptions with human-readable size parsing (e.g., 128K, 64M, 1G)
- Added obiutils.ParseMemSize/FormatMemSize for unit conversion
- Enhanced pool GC in pkg/obiseq/pool.go to trigger explicit GC for large slice discards
- Updated sequence_reader.go to apply memory-based rebatching when enabled
2026-03-13 14:54:21 +01:00
Eric Coissac 6ee8750635 Replace SplitInTwo with LeftSplitInTwo/RightSplitInTwo for precise splitting
Replace SplitInTwo calls with LeftSplitInTwo or RightSplitInTwo depending on the intended split direction. In fastseq_json_header.go, extract rank from suffix without splitting; in biosequenceslice.go and taxid.go, use LeftSplitInTwo to split from the left; add RightSplitInTwo utility function for splitting from the right.
2026-03-12 18:41:28 +01:00
Eric Coissac 3d2e205722 Refactor rope scanner and add FASTQ rope parser
This commit refactors the rope scanner implementation by renaming gbRopeScanner to ropeScanner and extracting the common functionality into a new file. It also introduces a new FastqChunkParserRope function that parses FASTQ chunks directly from a rope without Pack(), enabling more efficient memory usage. The existing parsers are updated to use the new rope-based parser when available. The BioSequence type is enhanced with a TakeQualities method for more efficient quality data handling.
2026-03-10 16:47:03 +01:00
Eric Coissac 1342c83db6 Use NewBioSequenceOwning to avoid unnecessary sequence copying
Replace NewBioSequence with NewBioSequenceOwning in genbank_read.go to take ownership of sequence slices without copying, improving performance. Update biosequence.go to add the new TakeSequence method and NewBioSequenceOwning constructor.
2026-03-10 15:51:35 +01:00
Eric Coissac ac0d3f3fe4 Update obiuniq for very large dataset 2025-12-18 14:11:11 +01:00
Eric Coissac 86e60aedd0 obicsv bug with stat on value map fields 2025-11-21 14:03:31 +01:00
Eric Coissac 4603d7973e implementation de obilowmask 2025-11-18 15:30:20 +01:00
Eric Coissac d17a9520b9 work on obiclean chimera detection 2025-10-20 17:29:47 +02:00
Eric Coissac add9d89ccc Patch the Min and Max values of the expression language 2025-06-19 16:43:26 +02:00
Eric Coissac 9965370d85 Manage a lock on StatsOnValues 2025-06-17 16:46:11 +02:00
Eric Coissac 8a2bb1fe82 Changes to be committed:
modified:   pkg/obioptions/version.go
	modified:   pkg/obiseq/merge.go
2025-06-17 12:11:35 +02:00
Eric Coissac efc3f3af29 Patch a concurrent access problem 2025-06-17 12:05:42 +02:00
Eric Coissac 6cb7a5a352 Changes to be committed:
modified:   cmd/obitools/obitag/main.go
	modified:   cmd/obitools/obitaxonomy/main.go
	modified:   pkg/obiformats/csvtaxdump_read.go
	modified:   pkg/obiformats/ecopcr_read.go
	modified:   pkg/obiformats/ncbitaxdump_read.go
	modified:   pkg/obiformats/ncbitaxdump_readtar.go
	modified:   pkg/obiformats/newick_write.go
	modified:   pkg/obiformats/options.go
	modified:   pkg/obiformats/taxonomy_read.go
	modified:   pkg/obiformats/universal_read.go
	modified:   pkg/obiiter/extract_taxonomy.go
	modified:   pkg/obioptions/options.go
	modified:   pkg/obioptions/version.go
	new file:   pkg/obiphylo/tree.go
	modified:   pkg/obiseq/biosequenceslice.go
	modified:   pkg/obiseq/taxonomy_methods.go
	modified:   pkg/obitax/taxonomy.go
	modified:   pkg/obitax/taxonset.go
	modified:   pkg/obitools/obiconvert/sequence_reader.go
	modified:   pkg/obitools/obitag/obitag.go
	modified:   pkg/obitools/obitaxonomy/obitaxonomy.go
	modified:   pkg/obitools/obitaxonomy/options.go
	deleted:    sample/.DS_Store
2025-06-04 09:48:10 +02:00
Eric Coissac f9324dd8f4 add min and max to the obitools expression language 2025-05-13 16:03:03 +02:00
Eric Coissac f1b9ac4a13 Update the expression language 2025-05-07 20:45:05 +02:00
Eric Coissac c0ecaf90ab Add the --number option to obiannotate 2025-04-22 18:35:51 +02:00
Eric Coissac a57cfda675 Make the replace function of the eval language accepting regex 2025-04-10 15:17:15 +02:00
Eric Coissac 5a3705b6bb Adds the --silent-warning options to the obitools commands and removes the --pared-with option from some of the obitols commands. 2025-03-25 16:44:46 +01:00
Eric Coissac f21f51ae62 Correct the logic of --update-taxid and --fail-on-taxonomy 2025-03-11 16:56:02 +01:00
Eric Coissac 3b5d4ba455 patch a bug in obiannotate 2025-03-11 16:35:38 +01:00
Eric Coissac 286e27d6ba patch the scienctific_name tag name to "scientific_name" 2025-03-05 14:22:12 +01:00
Eric Coissac 51b3e83d32 some cleaning 2025-02-24 11:31:49 +01:00
Eric Coissac 8671285d02 add the --min-sample-count option to obiclean. 2025-02-24 08:48:31 +01:00
Eric Coissac 4774438644 Changes to be committed:
modified:   pkg/obiformats/universal_read.go
	modified:   pkg/obioptions/version.go
	modified:   pkg/obiseq/taxonomy_methods.go
2025-02-12 08:40:38 +01:00
Eric Coissac 6a8061cc4f Add managment of the taxonomy alias politic 2025-02-10 14:05:47 +01:00
Eric Coissac 0df082da06 Adds possibility to extract a taxonomy from taxonomic path included in sequence files 2025-01-30 11:18:21 +01:00
Eric Coissac 9acb4a85a8 Refactoring of the default values 2025-01-24 18:09:59 +01:00
Eric Coissac ccd3b06532 Merge branch 'master' into taxonomy 2024-12-20 20:06:57 +01:00
Eric Coissac 5d0f996625 Patch a small bug on json write 2024-12-20 19:42:03 +01:00
Eric Coissac 795df34d1a Changes to be committed:
modified:   cmd/obitools/obitag/main.go
	modified:   cmd/obitools/obitag2/main.go
	modified:   go.mod
	modified:   go.sum
	modified:   pkg/obiformats/ncbitaxdump/read.go
	modified:   pkg/obioptions/version.go
	modified:   pkg/obiseq/attributes.go
	modified:   pkg/obiseq/taxonomy_lca.go
	modified:   pkg/obiseq/taxonomy_methods.go
	modified:   pkg/obiseq/taxonomy_predicate.go
	modified:   pkg/obitax/inner.go
	modified:   pkg/obitax/lca.go
	new file:   pkg/obitax/taxid.go
	modified:   pkg/obitax/taxon.go
	modified:   pkg/obitax/taxonomy.go
	modified:   pkg/obitax/taxonslice.go
	modified:   pkg/obitools/obicleandb/obicleandb.go
	modified:   pkg/obitools/obigrep/options.go
	modified:   pkg/obitools/obilandmark/obilandmark.go
	modified:   pkg/obitools/obilandmark/options.go
	modified:   pkg/obitools/obirefidx/famlilyindexing.go
	modified:   pkg/obitools/obirefidx/geomindexing.go
	modified:   pkg/obitools/obirefidx/obirefidx.go
	modified:   pkg/obitools/obirefidx/options.go
	modified:   pkg/obitools/obitag/obigeomtag.go
	modified:   pkg/obitools/obitag/obitag.go
	modified:   pkg/obitools/obitag/options.go
	modified:   pkg/obiutils/strings.go
2024-12-19 13:36:59 +01:00
Eric Coissac 00b0edc15a refactoring of the file chunck writing 2024-11-29 18:15:03 +01:00
Eric Coissac d29a56dcbf Changes to be committed:
modified:   Release-notes.md
	modified:   pkg/obialign/pairedendalign.go
	modified:   pkg/obilua/obiseq.go
	modified:   pkg/obioptions/version.go
	modified:   pkg/obiseq/biosequence.go
	modified:   pkg/obitools/obipairing/pairing.go
2024-11-27 09:56:22 +01:00
Eric Coissac 7884a74f9c Patch a bug in obitagpcr 2024-11-18 21:10:47 +01:00
Eric Coissac 03f4e88a17 Fisrt functional version 2024-11-14 19:10:23 +01:00
Eric Coissac 241f2286f2 remove the slice pool management 2024-09-24 16:31:30 +02:00
Eric Coissac 05bf2bfd6c Add option related to agrep match on obigrep and obiannotate 2024-09-09 16:52:13 +02:00
Eric Coissac 65ae82622e correction of several small bugs 2024-09-03 06:08:07 -03:00
Eric Coissac bdb96dda94 Adds the obimicrosat command 2024-08-05 15:31:20 +02:00
Eric Coissac 67665a6b40 Xprize update
Former-commit-id: d38919a897961e4d40da3b844057c3fb94fdb6d7
2024-07-25 18:09:03 -04:00
Eric Coissac 4e4fac491f Fisrt versin of the two levels indexing
Former-commit-id: 4d86483bc120e27cb6f5d2c216596d410274fc69
2024-07-12 15:17:48 +02:00
Eric Coissac c7ed47e110 first version of obidemerge, obijoin and a new filter for obicleandb but to be finnished
Former-commit-id: 8a1ed26e5548c30db75644c294d478ec4d753f19
2024-07-10 15:21:42 +02:00
Eric Coissac bd855c4965 Adds CSV as an input format
Former-commit-id: a365bb6947064adc2709d66df05fa54c6fe47fad
2024-07-03 21:04:27 +02:00
Eric Coissac fd663357b5 First version of obicleandb...
Former-commit-id: e60b61d015abbf029a555b51de99b4252c50ab59
2024-07-01 17:12:42 +02:00
Eric Coissac 93f9dcb95f Reducing memory allocation events
Former-commit-id: c94e79ba116464504580fc397270ead154063971
2024-06-22 22:32:31 +02:00
Eric Coissac e6b87ecd02 Reduce memory allocation events
Former-commit-id: fbdb2afc857b02adc2593e2278d3bd838e99b0b2
2024-06-22 21:01:53 +02:00
Eric Coissac 54a138196c Patch a bug in fasta and fastq reading
Former-commit-id: bcaa264b4c4a7c67617eb909b199176bf09913db
2024-06-21 14:28:57 +02:00
Eric Coissac 818ce87bab Patch some bugs in writing files
Former-commit-id: 612868a281dc0ecf4e6c5776973735e5c71bd517
2024-06-19 13:15:30 +02:00
Eric Coissac 65f5109957 Plenty of small bugs
Former-commit-id: 42c7fab7d65906c80ab4cd32da6867ff21842ea8
2024-06-04 16:49:12 +02:00
Eric Coissac aa42df326a Correct a bug in the fastq reader affecting the quality of the last record of each chunk
Former-commit-id: b842d60af9c2f1f971946d99999d13cfc15793b3
2024-06-04 11:57:16 +02:00